
How to remove hyperlinks from a PDF file with an API service?

Community Beginner, Jan 25, 2023


Hello,

Is there a way to remove all hyperlinks from a PDF document using one of the API services?

Thanks and regards

LEGEND, Jan 26, 2023


One thing to bear in mind (I have no specific answer for the API services): if a URL appears as visible text on the page (like http://something), it will keep working as a link no matter what you do. Such URLs work whether or not they are actual hyperlinks.

Community Beginner, Jan 26, 2023


That would be OK. I specifically want to remove any links or references to other pages in the document.
The background is that the extract algorithm can't handle these links.

Each ordinary word that carries a hyperlink to another page is emitted as a separate object in the generated JSON file, and the surrounding text no longer contains that word.

I think this is a bug, but a workaround would be to remove all links before extracting. This would be generally useful for the extract service, because neither the link itself nor its target is output in the JSON file. So at the moment there is no real added value in leaving the hyperlinks in the document.

Community Beginner, Jan 26, 2023


Example:

  1. The text on the page reads: "b)  derived from Table 4."

  2. "Table 4" is a hyperlink to the page that contains Table 4 in the document.

The JSON looks like this:

 

{
			"Bounds": [
				36.85040283203125,
				609.50439453125,
				47.90544128417969,
				670.8843994140625
			],
			"ClipBounds": [
				36.85040283203125,
				609.50439453125,
				47.90544128417969,
				670.8843994140625
			],
			"Font": {
				"alt_family_name": "Cambria",
				"embedded": true,
				"encoding": "Custom",
				"family_name": "Cambria",
				"font_type": "TrueType",
				"italic": false,
				"monospaced": false,
				"name": "AAAAAC+Cambria",
				"subset": true,
				"weight": 400
			},
			"HasClip": true,
			"Page": 25,
			"Path": "//Document/L[39]/LI/LBody/L/LI[2]/Lbl",
			"Text": "b) ",
			"TextSize": 11.0
		},
		{
			"Bounds": [
				121.38999938964844,
				609.50439453125,
				155.33999633789062,
				670.8843994140625
			],
			"ClipBounds": [
				121.38999938964844,
				609.50439453125,
				155.33999633789062,
				670.8843994140625
			],
			"Font": {
				"alt_family_name": "Cambria",
				"embedded": true,
				"encoding": "Custom",
				"family_name": "Cambria",
				"font_type": "TrueType",
				"italic": false,
				"monospaced": false,
				"name": "AAAAAC+Cambria",
				"subset": true,
				"weight": 400
			},
			"HasClip": true,
			"Page": 25,
			"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody/StyleSpan/Reference",
			"Text": "Table 4",
			"TextSize": 11.0
		},
		{
			"Bounds": [
				56.96940612792969,
				609.50439453125,
				160.34469604492188,
				670.8843994140625
			],
			"ClipBounds": [
				56.96940612792969,
				609.50439453125,
				160.34469604492188,
				670.8843994140625
			],
			"Font": {
				"alt_family_name": "Cambria",
				"embedded": true,
				"encoding": "Custom",
				"family_name": "Cambria",
				"font_type": "TrueType",
				"italic": false,
				"monospaced": false,
				"name": "AAAAAC+Cambria",
				"subset": true,
				"weight": 400
			},
			"HasClip": true,
			"Page": 25,
			"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody",
			"Text": "derived from . ",
			"TextSize": 11.0
		},
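As a stopgap on the JSON side, the split-out words can at least be located and matched back to the element they were cut from. The following is only a minimal sketch, assuming the extracted elements have already been loaded into a Python list of dicts with the "Path" and "Text" keys shown above. The helper name group_references_by_parent is hypothetical and not part of any Adobe SDK, and the rule "last path segment starts with Reference" is an assumption based on the single example in this post.

```python
from collections import defaultdict


def group_references_by_parent(elements):
    """Map each parent element's Path to the texts of the Reference
    elements that were split out of it (hypothetical helper)."""
    known_paths = {el["Path"] for el in elements if "Path" in el}
    grouped = defaultdict(list)
    for el in elements:
        segments = el.get("Path", "").split("/")
        # Assumption: linked words end in a "Reference" segment,
        # possibly indexed (for example "Reference[2]").
        if not segments[-1].startswith("Reference"):
            continue
        # Walk up the path until an ancestor is itself an extracted element.
        while len(segments) > 1:
            segments.pop()
            parent = "/".join(segments)
            if parent in known_paths:
                grouped[parent].append(el.get("Text", ""))
                break
    return dict(grouped)


# The three elements from the example, reduced to the keys the helper uses.
elements = [
    {"Path": "//Document/L[39]/LI/LBody/L/LI[2]/Lbl", "Text": "b) "},
    {"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody/StyleSpan/Reference",
     "Text": "Table 4"},
    {"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody",
     "Text": "derived from . "},
]
result = group_references_by_parent(elements)
print(result)
```

This only reports which linked words belong to which text element. Putting each word back at the right position inside the parent text would still need something extra, for example comparing the "Bounds" values, which this sketch does not attempt.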

 
