Hello,
Is there a way to remove all hyperlinks from a PDF document using one of the API services?
Thanks and regards
One thing to bear in mind (I have no specific answer for the API services): if your hyperlinks appear on the page as URL text (like http://something), they will keep working no matter what you do. Such URLs are clickable whether or not they are actual hyperlinks.
That would be OK. I specifically want to remove any links/references to other pages in the document.
The background is that the extract algorithm can't handle these links.
Each ordinary word that carries a hyperlink to another page is emitted as a separate object in the generated JSON file, and the surrounding text no longer contains that word.
I think this is a bug, but a workaround would be to remove all links before extracting. I think this would be generally useful for the extract service, because neither the link itself nor the reference is output in the JSON file. So at the moment, leaving the hyperlinks in the document adds no real value.
Example:
"b) derived from Table 4."
The resulting JSON looks like this:
{
    "Bounds": [
        36.85040283203125,
        609.50439453125,
        47.90544128417969,
        670.8843994140625
    ],
    "ClipBounds": [
        36.85040283203125,
        609.50439453125,
        47.90544128417969,
        670.8843994140625
    ],
    "Font": {
        "alt_family_name": "Cambria",
        "embedded": true,
        "encoding": "Custom",
        "family_name": "Cambria",
        "font_type": "TrueType",
        "italic": false,
        "monospaced": false,
        "name": "AAAAAC+Cambria",
        "subset": true,
        "weight": 400
    },
    "HasClip": true,
    "Page": 25,
    "Path": "//Document/L[39]/LI/LBody/L/LI[2]/Lbl",
    "Text": "b) ",
    "TextSize": 11.0
},
{
    "Bounds": [
        121.38999938964844,
        609.50439453125,
        155.33999633789062,
        670.8843994140625
    ],
    "ClipBounds": [
        121.38999938964844,
        609.50439453125,
        155.33999633789062,
        670.8843994140625
    ],
    "Font": {
        "alt_family_name": "Cambria",
        "embedded": true,
        "encoding": "Custom",
        "family_name": "Cambria",
        "font_type": "TrueType",
        "italic": false,
        "monospaced": false,
        "name": "AAAAAC+Cambria",
        "subset": true,
        "weight": 400
    },
    "HasClip": true,
    "Page": 25,
    "Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody/StyleSpan/Reference",
    "Text": "Table 4",
    "TextSize": 11.0
},
{
    "Bounds": [
        56.96940612792969,
        609.50439453125,
        160.34469604492188,
        670.8843994140625
    ],
    "ClipBounds": [
        56.96940612792969,
        609.50439453125,
        160.34469604492188,
        670.8843994140625
    ],
    "Font": {
        "alt_family_name": "Cambria",
        "embedded": true,
        "encoding": "Custom",
        "family_name": "Cambria",
        "font_type": "TrueType",
        "italic": false,
        "monospaced": false,
        "name": "AAAAAC+Cambria",
        "subset": true,
        "weight": 400
    },
    "HasClip": true,
    "Page": 25,
    "Path": "//Document/L[39]/LI/LBody/L/LI[2]/LBody",
    "Text": "derived from . ",
    "TextSize": 11.0
},
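If removing the links up front is not possible, another option might be to repair the output after extraction. The following is only a rough sketch, not part of any Adobe SDK: `merge_references` and its helpers are hypothetical names, and it assumes that a linked word always has a path ending in `Reference`, that its parent element appears under a prefix of that path, that `Bounds[0]` is the left edge, and that the missing word leaves a gap (a space followed by punctuation or another space) in the parent's text. It will not cope with every layout.

```python
import re

# Assumed signature of the hole left by a linked word:
# whitespace followed by punctuation or more whitespace.
GAP = re.compile(r"\s(?=[\s.,;:!?)])")


def is_reference(path):
    """True if the last path segment is 'Reference' (optionally indexed)."""
    last = path.rsplit("/", 1)[-1]
    return re.sub(r"\[\d+\]$", "", last) == "Reference"


def find_parent(path, by_path):
    """Walk up the path until an element with that exact path is found."""
    while "/" in path:
        path = path.rsplit("/", 1)[0]
        if path in by_path:
            return by_path[path]
    return None


def merge_references(elements):
    """Fold Reference elements back into the text of their parent element."""
    out = [dict(el) for el in elements]  # shallow copies, input stays untouched
    by_path = {
        el["Path"]: el
        for el in out
        if "Text" in el and "Path" in el and not is_reference(el["Path"])
    }
    refs = [el for el in out if "Text" in el and is_reference(el.get("Path", ""))]
    # Fill gaps left to right so several links in one sentence keep their order.
    refs.sort(key=lambda el: (el.get("Page", 0), el.get("Bounds", [0])[0]))

    merged = set()
    for ref in refs:
        parent = find_parent(ref["Path"], by_path)
        gap = GAP.search(parent["Text"]) if parent else None
        if gap:  # only merge when both a parent and a plausible gap exist
            pos = gap.end()
            parent["Text"] = parent["Text"][:pos] + ref["Text"] + parent["Text"][pos:]
            merged.add(id(ref))
    return [el for el in out if id(el) not in merged]
```

Applied to the three elements above, this would drop the separate "Table 4" object and restore the parent's text to "derived from Table 4. ". References with no matching parent or no recognisable gap are left in the output unchanged.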