I have a simple 2-page PDF with a short piece of text in each corner, 4 per page. The output should contain 8 text elements in total, but I am getting only 5 when I examine the output JSON file.
How is this API missing such a simple test case?
If it can't extract information accurately from a basic example, how much confidence can I have in it for much larger, more complex PDFs?
The output should contain: top left, top right, bottom left, bottom right, top left 2, top right 2, bottom left 2, bottom right 2.
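For anyone who wants to reproduce the check, here is a minimal sketch of how the output JSON could be compared against the expected strings. It assumes the JSON has a top-level `elements` list whose entries carry a `Text` string; that layout is an assumption, so adjust the keys to match your actual output. The function names and the sample data are hypothetical and are not real API output.

```python
import json

# The eight strings the test PDF contains, as listed above.
EXPECTED = [
    "top left", "top right", "bottom left", "bottom right",
    "top left 2", "top right 2", "bottom left 2", "bottom right 2",
]


def find_missing(extract_output, expected):
    """Return the expected strings that never appear as an element's text.

    Assumption: extract_output is a dict with an "elements" list, and each
    text element stores its content under a "Text" key.
    """
    found = {
        element["Text"].strip()
        for element in extract_output.get("elements", [])
        if isinstance(element.get("Text"), str)
    }
    return [label for label in expected if label not in found]


def check_file(path, expected=EXPECTED):
    """Load an output JSON file from disk and report the missing strings."""
    with open(path, encoding="utf-8") as handle:
        return find_missing(json.load(handle), expected)


# Hand-made sample with five elements, only to show the function's behaviour.
sample = {
    "elements": [
        {"Text": "top left "},
        {"Text": "top right "},
        {"Text": "top left 2 "},
        {"Text": "top right 2 "},
        {"Text": "bottom left "},
    ]
}
missing = find_missing(sample, EXPECTED)
print(f"{len(EXPECTED) - len(missing)} of {len(EXPECTED)} found; missing: {missing}")
```

Matching is exact after trimming whitespace, so a string that was merged into a larger element would also be reported as missing.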
Actually, it'd probably do a better job on a more complex PDF. The API has been trained to look at the layout and categorize page elements, and this simple layout is confusing to it. I think the text at the bottom is being recognized as a footer, so the AI is ignoring it. That said, I've alerted our team and sent them a link to this thread.
Thanks, but I'm not impressed, especially since I need to use the Adobe API token, which is not entirely free.
For those who need an alternative, try the PDF2JSON API, which is free with unlimited use and passes the basic test case I provided.