Adobe PDF Extract API: text missing from JSON results on a simple PDF

Question

I have a simple two-page PDF file with a short piece of text in each corner, four per page. The output should contain 8 text elements in total, but I find only 5 when I examine the output JSON file.

How is this API missing such a simple test case?

If it can't extract information accurately from a basic example, how much confidence can I have in it for much larger, more complex PDFs?

The output should have: top left, top right, bottom left, bottom right, top left 2, top right 2, bottom left 2, bottom right 2.

Joel Geraci · Answer

Actually, it'd probably do a better job on a more complex PDF. It's been trained to look at the layout and categorize page elements. This simple layout is confusing to it. I think the text at the bottom is being recognized as a footer so it's being ignored by the AI. That said, I've alerted our team and sent them a link to this thread.
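
To see which strings were dropped, one option is to compare the text in the output JSON against the expected list. The following is a minimal sketch, not an official tool. It assumes the output has a top-level `elements` array whose entries carry `Text`, `Path`, and `Page` keys, and that the file is named `structuredData.json`; adjust both if your output differs. The helper names `extracted_texts` and `missing_strings` are made up for this example.

```python
import json
from pathlib import Path

# The eight strings the question expects to find.
EXPECTED = [
    "top left", "top right", "bottom left", "bottom right",
    "top left 2", "top right 2", "bottom left 2", "bottom right 2",
]


def extracted_texts(data):
    """Return (page, path, text) for every element that carries text.

    Assumes each entry in data["elements"] may have "Text", "Path",
    and "Page" keys; elements without "Text" are skipped.
    """
    rows = []
    for element in data.get("elements", []):
        text = element.get("Text")
        if text is None:
            continue
        rows.append((element.get("Page"), element.get("Path"), text.strip()))
    return rows


def missing_strings(data, expected):
    """Return the expected strings that match no extracted text exactly."""
    found = {text for _, _, text in extracted_texts(data)}
    return [s for s in expected if s not in found]


if __name__ == "__main__":
    # Assumed file name; change it to wherever your output JSON lives.
    path = Path("structuredData.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        for page, elem_path, text in extracted_texts(data):
            print(page, elem_path, repr(text))
        print("missing:", missing_strings(data, EXPECTED))
```

The comparison is an exact match after trimming whitespace, so if two corner strings were merged into a single element they will both be reported as missing. Printing each element's `Path` alongside its text shows how the remaining items were categorized.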
