Participant
July 14, 2022
Question

PDF Extraction never the same format


Hi, I just started trying your PDF extraction API for a client. It seems to be working fine, but I am having trouble getting the data I need: although all the PDFs look exactly the same, the paths, rows and indexes are never the same for the same elements.

 

Here is an example:

 

For the first invoice PDF, if I am trying to get the first "Cat. No." :

Path : //Document/Sect/Table/TR[7]/TD[3]/P

Index : 68

Row count : 6 

 

For the second invoice, first "Cat. No." :

Path : //Document/Sect/Table[2]/TR[2]/TD[3]/P

Index : 60

Row count : 5

 

Am I doing something wrong or is the API not precise enough?

Thank you for your help!

 


1 reply

Joel Geraci
Community Expert
July 14, 2022

The paths are calculated on a document-by-document basis, so they are not guaranteed to match between files. There are probably differences that the AI sees that we don't. When you run Extract with table output enabled, what do the tables look like? The extracted tables should be consistent.
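One way to cope with shifting paths is to find the column by its header text instead of by a fixed path or index. The sketch below is only an illustration, not an official approach: it assumes the extraction result has been loaded as a list of dicts that each carry a "Path" and a "Text" key, that the header cell reads exactly "Cat. No.", and that elements arrive in reading order. The helper names parse_cell and column_values are made up for this example.

```python
import re

# Pulls the table, row and column out of a path such as
# //Document/Sect/Table[2]/TR[2]/TD[3]/P (a missing index is treated as 1).
CELL_RE = re.compile(
    r"^(?P<table>.*?/Table(?:\[\d+\])?)"
    r"/TR(?:\[(?P<row>\d+)\])?"
    r"/T[HD](?:\[(?P<col>\d+)\])?"
)


def parse_cell(path):
    """Return (table_path, row, col) for a table-cell path, else None."""
    m = CELL_RE.match(path)
    if m is None:
        return None
    return m.group("table"), int(m.group("row") or 1), int(m.group("col") or 1)


def column_values(elements, header_text):
    """Collect the text of every cell below the first cell matching header_text.

    Assumes `elements` is in reading order, so the header precedes its values.
    """
    header = None
    values = []
    for el in elements:
        cell = parse_cell(el.get("Path", ""))
        text = el.get("Text", "").strip()
        if cell is None or not text:
            continue
        if header is None:
            # Still looking for the header cell.
            if text == header_text:
                header = cell
            continue
        table, row, col = cell
        # Same table, same column, any later row.
        if table == header[0] and col == header[2] and row > header[1]:
            values.append(text)
    return values
```

With this, `column_values(elements, "Cat. No.")[0]` would give the first catalogue number whether the cell sits at Table/TR[7]/TD[3] in one invoice or Table[2]/TR[2]/TD[3] in another, as long as the header is found in the same table and column.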