Adobe Extract API: hyperlink is a different element than the string it is in

Question

Hello, I am using the Document Extract REST API to parse PDFs that include hyperlinks. From my testing, it looks like the API parses a hyperlink as a separate element from the sentence that contains it. As far as I can tell, there is no indication of where the hyperlink was located in the string that would let me reassemble it: the extraction leaves no placeholder symbol and eats the trailing space.

Below is the content analyzer request I have been using. The documentation has given me no clue how to either make the API ignore hyperlinks or otherwise indicate where they were extracted from.

{
    "cpf:engine": {
      "repo:assetId": "urn:aaid:cpf:58af6e2c-1f0c-400d-9188-078000185695"
    },
    "cpf:inputs": {
      "documentIn": {
        "cpf:location": "InputFile0",
        "dc:format": "application/pdf"
      },
      "params": {
        "cpf:inline": {
          "elementsToExtract": [
            "text",
            "tables"
          ]
        }
      }
    },
    "cpf:outputs": {
      "elementsInfo": {
        "cpf:location": "jsonoutput",
        "dc:format": "application/json"
      },
      "elementsRenditions": {
        "cpf:location": "fileoutpart",
        "dc:format": "text/directory"
      }
    }
}

Any help would be appreciated!

Joel Geraci · Answer

I'm reporting this as a bug. Hopefully they'll escalate it.
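
In the meantime, it may help to attach a small reproduction to the report. The following is only a sketch for inspecting the JSON output, not a fix: it assumes the output has a top-level "elements" list whose entries carry "Path" and "Text" keys, and that a hyperlink element's Path sits beneath the Path of the paragraph it came from. The sample data and both function names are hypothetical; real paths and text will differ.

```python
def list_text_elements(extract_output: dict) -> list[tuple[str, str]]:
    """Return (Path, Text) pairs for every element that carries text, in output order."""
    return [
        (el.get("Path", ""), el["Text"])
        for el in extract_output.get("elements", [])
        if "Text" in el
    ]


def nested_under_earlier(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Flag pairs whose Path sits beneath the Path of an earlier text element."""
    seen: list[str] = []
    flagged: list[tuple[str, str]] = []
    for path, text in pairs:
        if any(path.startswith(parent + "/") for parent in seen):
            flagged.append((path, text))
        seen.append(path)
    return flagged


# Hypothetical sample mimicking the behaviour described in the question:
# the link text is missing from the paragraph and appears as its own element.
sample = {
    "elements": [
        {"Path": "//Document/P", "Text": "See the for details."},
        {"Path": "//Document/P/Link", "Text": "user guide"},
        {"Path": "//Document/P[2]", "Text": "Next paragraph."},
    ]
}

pairs = list_text_elements(sample)
flagged = nested_under_earlier(pairs)

for path, text in flagged:
    print(f"split out of its parent: {path!r} -> {text!r}")
```

To run this against a real result, replace the sample with the parsed JSON file from the extraction output (loaded with json.load). This only shows which elements were split out; it cannot recover where in the parent string the link originally sat, which is the missing information described in the question.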
