PDF Extraction never the same format

Question

Hi, I just started trying your PDF extraction API for a client. It seems to be working fine, but I am having trouble getting the data I need: all the PDFs are visually identical, yet the paths, row counts and indexes are never the same for the same elements.

Here is an example:

For the first invoice PDF, when I try to get the first "Cat. No.":

Path : //Document/Sect/Table/TR[7]/TD[3]/P

Index : 68

Row count : 6

For the second invoice, first "Cat. No." :

Path : //Document/Sect/Table[2]/TR[2]/TD[3]/P

Index : 60

Row count : 5

Am I doing something wrong or is the API not precise enough?

Thank you for your help!

Joel Geraci · Answer

The paths are calculated on a document-by-document basis. There are probably differences that the AI sees that we don't. When you Extract with tables, what do the tables look like? The extracted tables should be consistent.
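Since the absolute paths differ per document, one possible workaround is to anchor on the label text rather than on a hard-coded path or index. The sketch below is only an illustration, not an official method: it assumes each extracted element is a dict with a "Path" key (as in the question) and a "Text" key (an assumption about the JSON output), and the helper names `cell_position` and `value_below` are made up for this example. It finds the cell whose text equals the label and returns the nearest non-empty cell below it in the same table and column.

```python
import re

# Pulls the table prefix, row and column out of a path such as
# //Document/Sect/Table[2]/TR[2]/TD[3]/P. A missing [n] is treated as 1.
CELL = re.compile(
    r"^(?P<table>.*/Table(?:\[\d+\])?)"
    r"/TR(?:\[(?P<row>\d+)\])?"
    r"/T[HD](?:\[(?P<col>\d+)\])?(?:/|$)"
)


def cell_position(path):
    """Return (table, row, col) for a table-cell path, or None otherwise."""
    m = CELL.match(path)
    if m is None:
        return None
    return m.group("table"), int(m.group("row") or 1), int(m.group("col") or 1)


def value_below(elements, label):
    """Find the cell whose text equals `label`, then return the text of the
    nearest non-empty cell below it in the same table and column."""
    cells = []
    for el in elements:
        pos = cell_position(el.get("Path", ""))
        text = (el.get("Text") or "").strip()  # "Text" key is an assumption
        if pos is not None and text:
            cells.append((pos, text))

    for (table, row, col), text in cells:
        if text != label:
            continue
        below = [
            (r, t)
            for (tb, r, c), t in cells
            if tb == table and c == col and r > row
        ]
        if below:
            return min(below, key=lambda item: item[0])[1]
    return None
```

Calling `value_below(elements, "Cat. No.")` on each invoice's element list would then not depend on whether the table is `Table` or `Table[2]`, or on which row the header lands in. If "Cat. No." is not a column header in your layout, the same idea applies with whatever nearby text is stable across invoices.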
