Participant
January 5, 2023
Question

Adobe PDF Extract API: text results missing from the JSON output for a simple PDF

  • 1 reply
  • 696 views

I have a simple two-page PDF with one short text label in each of the four corners of every page. That should give 8 results in total, but I only get 5 elements when I examine the output JSON file.

How is this API missing such a simple test case?

If it can't extract information accurately from a basic example, how much confidence can I have for much larger, more complex PDFs?

 

Should have: top left, top right, bottom left, bottom right, top left 2, top right 2, bottom left 2, bottom right 2.
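
One way to see exactly which labels are absent, rather than only counting elements, is to compare the expected labels against the text found in the output JSON. Below is a minimal Python sketch. It assumes the output JSON has a top-level `elements` list whose text-bearing entries carry a `Text` string; that layout is an assumption here and should be checked against the actual file. The helper name `find_missing` and the sample data are made up for illustration and are not the real API output. Matching is exact after whitespace and case normalization, so a label that was merged with other text into one element is also reported as missing.

```python
import json

# The eight labels the test PDF should contain (taken from the post above).
EXPECTED = [
    "top left", "top right", "bottom left", "bottom right",
    "top left 2", "top right 2", "bottom left 2", "bottom right 2",
]


def normalize(text):
    """Collapse runs of whitespace and lowercase, so 'Top  Left ' == 'top left'."""
    return " ".join(text.split()).lower()


def find_missing(structured_data, expected=EXPECTED):
    """Return the expected labels that match no element's text exactly."""
    # Assumption: elements live in a top-level "elements" list and text
    # content is stored under a "Text" key; entries without text are skipped.
    found = {
        normalize(element["Text"])
        for element in structured_data.get("elements", [])
        if isinstance(element.get("Text"), str)
    }
    return [label for label in expected if normalize(label) not in found]


# Hypothetical sample input for illustration only (NOT real API output).
sample = {
    "elements": [
        {"Path": "//Document/P", "Page": 0, "Text": "Top Left "},
        {"Path": "//Document/P[2]", "Page": 1, "Text": "top right 2 "},
        {"Path": "//Document/Figure", "Page": 1},  # no "Text" key
    ]
}

for label in find_missing(sample):
    print("missing:", label)
```

To run this against a real result, load the file first, for example `find_missing(json.load(open("structuredData.json", encoding="utf-8")))`, where the file name is whatever the extraction output was saved as.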

 

 


    1 reply

    Joel Geraci
    Community Expert
    January 26, 2023

    Actually, it'd probably do a better job on a more complex PDF. It's been trained to look at the layout and categorize page elements. This simple layout is confusing to it. I think the text at the bottom is being recognized as a footer so it's being ignored by the AI. That said, I've alerted our team and sent them a link to this thread.

    Participant
    January 26, 2023

    Thanks, but I'm not impressed, especially if I need to use an Adobe API token, which is not entirely free.

    For those who need an alternative, try the PDF2JSON API, which is free to use without limits and passes the most basic test case I provided.