How to remove hyperlinks from a PDF file with an API service?

Community Beginner, Jan 25, 2023

Hello,

Is there a way to remove all hyperlinks from a PDF document using one of the API services?

Thanks and regards

LEGEND, Jan 26, 2023

One thing to bear in mind (I have no specific answer for the API services): if a hyperlink's text on the page is itself a URL (like http://something), it will keep working no matter what you do. Such URLs are clickable whether or not they are actual hyperlinks.

Community Beginner, Jan 26, 2023

That would be OK. I specifically want to remove any links/references to other pages in the document.
The background is that the extract algorithm can't handle these links.

Each ordinary word that carries a hyperlink to another page is emitted as a separate object in the generated JSON file, and the surrounding text no longer contains that word.

I think this is a bug, but a workaround would be to remove all links before extracting. This would be generally useful for the extract service, because neither the link itself nor the reference is output in the JSON file. So at the moment there is no real added value in leaving the hyperlinks in the document.

Community Beginner, Jan 26, 2023

Example:

  1. The text on the page reads: "b)  derived from Table 4."

  2. "Table 4" is a hyperlink to the page in the document that contains Table 4.

The JSON looks like this:

{
			"Bounds": [
				36.85040283203125,
				609.50439453125,
				47.90544128417969,
				670.8843994140625
			],
			"ClipBounds": [
				36.85040283203125,
				609.50439453125,
				47.90544128417969,
				670.8843994140625
			],
			"Font": {
				"alt_family_name": "Cambria",
				"embedded": true,
				"encoding": "Custom",
				"family_name": "Cambria",
				"font_type": "TrueType",
				"italic": false,
				"monospaced": false,
				"name": "AAAAAC+Cambria",
				"subset": true,
				"weight": 400
			},
			"HasClip": true,
			"Page": 25,
			"Path": "//Document/L[39]/LI/LBody/L/LI[2]/Lbl",
			"Text": "b) ",
			"TextSize": 11.0
		},
		{
			"Bounds": [
				121.38999938964844,
				609.50439453125,
				155.33999633789062,
				670.8843994140625
			],
			"ClipBounds": [
				121.38999938964844,
				609.50439453125,
				155.33999633789062,
				670.8843994140625
			],
			"Font": {
				"alt_family_name": "Cambria",
				"embedded": true,
				"encoding": "Custom",
				"family_name": "Cambria",
				"font_type": "TrueType",
				"italic": false,
				"monospaced": false,
				"name": "AAAAAC+Cambria",
				"subset": true,
				"weight": 400
			},
			"HasClip": true,
			"Page": 25,
			"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody/StyleSpan/Reference",
			"Text": "Table 4",
			"TextSize": 11.0
		},
		{
			"Bounds": [
				56.96940612792969,
				609.50439453125,
				160.34469604492188,
				670.8843994140625
			],
			"ClipBounds": [
				56.96940612792969,
				609.50439453125,
				160.34469604492188,
				670.8843994140625
			],
			"Font": {
				"alt_family_name": "Cambria",
				"embedded": true,
				"encoding": "Custom",
				"family_name": "Cambria",
				"font_type": "TrueType",
				"italic": false,
				"monospaced": false,
				"name": "AAAAAC+Cambria",
				"subset": true,
				"weight": 400
			},
			"HasClip": true,
			"Page": 25,
			"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody",
			"Text": "derived from . ",
			"TextSize": 11.0
		},
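
Until the links can be removed before extraction, one possible workaround is to repair the JSON afterwards. The following is only a sketch in Python, not an Adobe-provided method. The function names (find_parent, insertion_index, merge_references) are made up for illustration. It assumes that each element has a unique "Path", that the split-off word has a Path whose last segment starts with "Reference" (as in the output above), and that Bounds[0] and Bounds[2] are the left and right edges of the text. The insertion position is only estimated from the Bounds and snapped to the nearest word boundary, so it may be wrong for other documents.

```python
def find_parent(path, by_path):
    """Walk up the Path until an element with exactly that Path exists."""
    parts = path.split("/")
    for end in range(len(parts) - 1, 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in by_path:
            return by_path[candidate]
    return None


def insertion_index(parent, child):
    """Guess where the child's text belongs inside the parent's text.

    Heuristic (assumption): the child's left edge, relative to the parent's
    horizontal extent, approximates its character offset. The estimate is
    then snapped to the nearest word boundary of the parent text.
    """
    text = parent["Text"]
    p_left, p_right = parent["Bounds"][0], parent["Bounds"][2]
    width = p_right - p_left
    if width <= 0:
        return len(text)
    frac = min(max((child["Bounds"][0] - p_left) / width, 0.0), 1.0)
    estimate = frac * (len(text) + len(child["Text"]))
    # Word boundaries: start, end, and the position right after each space.
    boundaries = [0, len(text)]
    boundaries += [i + 1 for i, ch in enumerate(text) if ch.isspace()]
    return min(boundaries, key=lambda i: abs(i - estimate))


def merge_references(elements):
    """Return a new element list with Reference texts merged into parents."""
    # Copy the elements so the caller's data is not modified.
    by_path = {e["Path"]: dict(e) for e in elements
               if "Path" in e and "Text" in e}
    consumed = set()
    for e in elements:
        path = e.get("Path", "")
        last_segment = path.rsplit("/", 1)[-1]
        if "Text" not in e or not last_segment.startswith("Reference"):
            continue
        parent = find_parent(path, by_path)
        if parent is None:
            continue  # no enclosing text element found; leave it alone
        i = insertion_index(parent, e)
        parent["Text"] = parent["Text"][:i] + e["Text"] + parent["Text"][i:]
        consumed.add(path)
    return [by_path.get(e.get("Path"), e) for e in elements
            if e.get("Path") not in consumed]


# The three elements from the example above, reduced to the keys used here.
sample = [
    {"Path": "//Document/L[39]/LI/LBody/L/LI[2]/Lbl", "Text": "b) ",
     "Bounds": [36.85040283203125, 609.50439453125,
                47.90544128417969, 670.8843994140625]},
    {"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody/StyleSpan/Reference",
     "Text": "Table 4",
     "Bounds": [121.38999938964844, 609.50439453125,
                155.33999633789062, 670.8843994140625]},
    {"Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody",
     "Text": "derived from . ",
     "Bounds": [56.96940612792969, 609.50439453125,
                160.34469604492188, 670.8843994140625]},
]

result = merge_references(sample)
print("".join(e["Text"] for e in result).strip())
# prints: b) derived from Table 4.
```

With several references inside the same parent element the estimate gets less reliable, because each insertion changes the text length that the next estimate is based on.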

 
