
Adobe Extract API: hyperlink is a different element than the string it is in

New Here, Oct 15, 2021


Hello,

I am using the Document Extract REST API to parse PDFs that include hyperlinks. From my testing, it looks like the API parses each hyperlink as a separate element from the sentence that contains it. As far as I can tell, there is no indication of where the hyperlink was located in the string, so I cannot reassemble it: the extraction leaves no placeholder symbol and eats the trailing space.

Below is the content analyzer request I have been using. The documentation has given me no clue as to how to either get it to ignore hyperlinks or otherwise indicate where they were extracted from.


{
    "cpf:engine": {
      "repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
    },
    "cpf:inputs": {
      "documentIn": {
        "cpf:location": "InputFile0",
        "dc:format": "application/pdf"
      },
      "params": {
        "cpf:inline": {
          "elementsToExtract": [
            "text",
            "tables"
          ]
        }
      }
    },
    "cpf:outputs": {
      "elementsInfo": {
        "cpf:location": "jsonoutput",
        "dc:format": "application/json"
      },
      "elementsRenditions": {
        "cpf:location": "fileoutpart",
        "dc:format": "text/directory"
      }
    }
}

Any help would be appreciated!

TOPICS
PDF Extract API, PDF Services API

LEGEND, Oct 16, 2021


That is the design of PDF files. There is text on a page, and there are also hyperlinks identified by rectangles. There is no PDF connection between the text and the hyperlink. Working out which text forms a link requires comparing the position of each character with the position of each link rectangle.
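The position comparison described above could be sketched roughly as follows. This is a minimal illustration and not the Extract API's actual output format: the `Word` and `Link` classes, the `(left, bottom, right, top)` rectangle layout, and the 0.5 overlap threshold are all assumptions made for the example. It also works on whole words rather than individual characters to keep things short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical rectangle layout: (left, bottom, right, top) in page units.
Rect = Tuple[float, float, float, float]


@dataclass
class Word:
    text: str
    box: Rect  # bounding box of the word on the page


@dataclass
class Link:
    uri: str
    rect: Rect  # clickable rectangle of the link annotation


def overlap_fraction(box: Rect, rect: Rect) -> float:
    """Return the fraction of `box`'s area that lies inside `rect`."""
    left = max(box[0], rect[0])
    bottom = max(box[1], rect[1])
    right = min(box[2], rect[2])
    top = min(box[3], rect[3])
    if right <= left or top <= bottom:
        return 0.0  # the rectangles do not intersect
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area <= 0:
        return 0.0  # degenerate box, treat as not covered
    return (right - left) * (top - bottom) / area


def link_for_word(word: Word, links: List[Link], threshold: float = 0.5) -> Optional[str]:
    """Return the URI of the first link covering enough of the word, if any."""
    for link in links:
        if overlap_fraction(word.box, link.rect) >= threshold:
            return link.uri
    return None


def annotate(words: List[Word], links: List[Link], threshold: float = 0.5) -> str:
    """Rebuild the text in reading order, marking linked words as [text](uri)."""
    parts = []
    for word in words:
        uri = link_for_word(word, links, threshold)
        parts.append(f"[{word.text}]({uri})" if uri else word.text)
    return " ".join(parts)
```

Given words in reading order and the link rectangles for the same page, `annotate(words, links)` returns the sentence with each linked word wrapped in its URI. The threshold is there because link rectangles are often slightly larger or smaller than the glyphs they cover, so an exact containment test would likely miss matches.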

Community Expert, Oct 26, 2021


I'm reporting this as a bug. Hopefully they'll escalate it.
