Does PDF Extract API support hyperlink extraction?
Hi Support Community,
I have a query related to the PDF Extract API.
Does it support extracting hyperlinks from PDF content? If yes, can you share a sample JSON output that includes a hyperlink?
Update: I tried it in the Extract API demo. It identified the hyperlink as a Reference element, but it did not provide the hyperlink's URL.
See the snapshot below:
We need to extract the hyperlink URLs as well, so can you please clarify this?
Yes. It supports hyperlink extraction. Try the attached file. You'll see the hyperlinks on the right that are represented like this...
{
    "Bounds": [
        427.7200012207031,
        355.3280029296875,
        524.5761413574219,
        377.3710021972656
    ],
    "Font": {
        "alt_family_name": "Clean",
        "embedded": true,
        "encoding": "Custom",
        "family_name": "Adobe Clean",
        "font_type": "Type1",
        "italic": false,
        "monospaced": false,
        "name": "FHQAMC+AdobeClean-Bold",
        "subset": true,
        "weight": 700
    },
    "HasClip": false,
    "Lang": "en",
    "Page": 0,
    "Path": "//Document/Aside/P[2]/Reference",
    "Text": "(<https://www.adobe.io/apis/documentcloud/dcsdk/>)Adobe Acrobat Services › (<https://www.adobe.io/apis/documentcloud/dcsdk/pdf-extract.html>)Adobe PDF Extract API › ",
    "TextSize": 9,
    "attributes": {
        "LineHeight": 11
    },
    "elementId": 15
},
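If it helps, here is a rough Python sketch of how the URLs could be pulled out of an element like the one above. It assumes the URL always appears inside the "Text" value as "(<URL>)" immediately before the link label. That is inferred only from this one sample, not from any documented format, so treat the pattern as a guess. The names `LINK_PATTERN` and `extract_links` are made up for illustration.

```python
import re

# Assumption: each link shows up in "Text" as "(<URL>)" followed by its label,
# as in the sample element above. This is not a documented format.
LINK_PATTERN = re.compile(r"\(<(https?://[^>\s]+)>\)([^(]*)")


def extract_links(elements):
    """Return (url, label, path) tuples for elements whose Path ends in a Reference node."""
    links = []
    for element in elements:
        path = element.get("Path", "")
        last_segment = path.rsplit("/", 1)[-1]
        # Matches "Reference" as well as indexed forms such as "Reference[2]".
        if not last_segment.startswith("Reference"):
            continue
        for url, label in LINK_PATTERN.findall(element.get("Text", "")):
            # Trim surrounding spaces and the breadcrumb separator (U+203A).
            links.append((url, label.strip(" \u203a"), path))
    return links


# Trimmed copy of the sample element, keeping only the fields the sketch reads.
sample_element = {
    "Path": "//Document/Aside/P[2]/Reference",
    "Text": "(<https://www.adobe.io/apis/documentcloud/dcsdk/>)Adobe Acrobat Services \u203a "
            "(<https://www.adobe.io/apis/documentcloud/dcsdk/pdf-extract.html>)Adobe PDF Extract API \u203a ",
}

for url, label, path in extract_links([sample_element]):
    print(label, "->", url)
```

To run it on real output, you would pass in the list of elements from the extracted JSON. A label that itself contains an opening parenthesis would be cut short by this pattern, and an element with no "(<URL>)" in its Text simply yields nothing, which is the case reported earlier in this thread.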
Thanks, Joel, for your response. Yes, I can see that hyperlinks are extracted properly in the sample PDF that you shared. However, I am not seeing the same behavior with my sample PDF. Can you please check the attached PDF?
Hi @Joel Geraci ,
Any update on the above?
No. Generally when I submit test files, I don't get updates.
Hi @Joel Geraci ,
When I tried it on my end, the hyperlinks were not extracted. Regarding your response, I am not clear on the following piece: "Generally when I submit test files, I don't get updates." Where did you submit the test files?
I want to understand why it was not able to extract hyperlinks from the sample PDF that I shared earlier.
"I want to understand why it was not able to extract hyperlinks from the sample PDF that I shared earlier."
It's an AI. We don't know why it does what it does; we just have to train it more when it gets things wrong.
Hi Joel, hope you are well. Is there any update on this since your post? Has the model been trained a little bit since? I'm experiencing the same issue. Thanks, Tim

