Does PDF Extract API support hyperlink extraction?

Report · Oct 06, 2023

Hi Support Community,
I have a question about the PDF Extract API.

Does it support extracting hyperlinks from PDF content? If yes, can you share a sample JSON output that includes a hyperlink?

Report · Oct 06, 2023

Update: I tried it in the Extract API Demo. It identified the hyperlink as a Reference, but it did not provide the hyperlink's URL.
Refer to the snapshot below:

We also need to extract the hyperlink URLs, so can you please clarify this?

Report · Oct 06, 2023

Yes. It supports hyperlink extraction. Try the attached file. You'll see the hyperlinks on the right that are represented like this...

{
    "Bounds": [
        427.7200012207031,
        355.3280029296875,
        524.5761413574219,
        377.3710021972656
    ],
    "Font": {
        "alt_family_name": "Clean",
        "embedded": true,
        "encoding": "Custom",
        "family_name": "Adobe Clean",
        "font_type": "Type1",
        "italic": false,
        "monospaced": false,
        "name": "FHQAMC+AdobeClean-Bold",
        "subset": true,
        "weight": 700
    },
    "HasClip": false,
    "Lang": "en",
    "Page": 0,
    "Path": "//Document/Aside/P[2]/Reference",
    "Text": "(<https://www.adobe.io/apis/documentcloud/dcsdk/>)Adobe Acrobat Services › (<https://www.adobe.io/apis/documentcloud/dcsdk/pdf-extract.html>)Adobe PDF Extract API › ",
    "TextSize": 9,
    "attributes": {
        "LineHeight": 11
    },
    "elementId": 15
},
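For anyone who wants to pull the URLs out of elements like the one above, here is a minimal Python sketch. It assumes the URLs always appear inside the "Text" value wrapped in parentheses, as in this sample, and that link elements have a "Path" ending in "Reference". The function name `extract_links` and the regular expressions are illustrative only and are not part of the Extract API.

```python
import re

# Assumption: URLs appear in "Text" as "(<url>)" or "(url)", as in the sample above.
URL_IN_PARENS = re.compile(r"\(<?(https?://[^\s<>()]+)>?\)")
# Assumption: link elements have a Path ending in "Reference" or "Reference[n]".
REFERENCE_PATH = re.compile(r"/Reference(\[\d+\])?$")


def extract_links(elements):
    """Return (path, url) pairs for every URL found in Reference elements."""
    links = []
    for element in elements:
        path = element.get("Path", "")
        if not REFERENCE_PATH.search(path):
            continue  # not a Reference element
        for url in URL_IN_PARENS.findall(element.get("Text", "")):
            links.append((path, url))
    return links


# The relevant fields of the element shown above.
sample = {
    "Path": "//Document/Aside/P[2]/Reference",
    "Text": "(<https://www.adobe.io/apis/documentcloud/dcsdk/>)Adobe Acrobat Services › "
            "(<https://www.adobe.io/apis/documentcloud/dcsdk/pdf-extract.html>)Adobe PDF Extract API › ",
}

for path, url in extract_links([sample]):
    print(path, url)
```

If a Reference element's "Text" contains only the link label and no URL, as reported in the demo earlier in this thread, this sketch returns nothing for that element.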

Report · Oct 08, 2023

Thanks for your response, Joel. Yes, I can see that the hyperlinks are extracted properly from the sample PDF you shared. However, I am not seeing the same behavior with my own sample PDF. Can you please check the attached PDF?

Report · Oct 15, 2023

Hi @Joel Geraci ,
Is there any update on the above?

Report · Oct 16, 2023

No. Generally when I submit test files, I don't get updates.

Report · Oct 17, 2023

Hi @Joel Geraci ,

When I tried on my end, the hyperlinks were not extracted. Regarding your response, I am not clear on the following: "Generally when I submit test files, I don't get updates." Where did you submit the test files?

I want to understand why it was not able to extract hyperlinks from the sample PDF that I shared earlier.

Report · Oct 18, 2023

"I want to understand why it was not able to extract hyperlinks from the sample PDF that I shared earlier."

It's an AI. We don't know why it does what it does; we just have to train it more when it gets things wrong.

Report · Mar 05, 2024

Hi Joel, hope you are well. Is there any update on this since your post? Has the model been trained a little bit since? I'm experiencing the same issue. Thanks, Tim
