
PDF extraction is never in the same format

New Here, Jul 14, 2022


Hi, I just started trying your PDF Extract API for a client. It seems to be working fine, but I am having trouble getting the data I need: all of the PDFs look exactly the same, yet the paths, rows, and indexes are never the same for the same elements.

Here is an example:

For the first invoice PDF, when I try to get the first "Cat. No.":

Path: //Document/Sect/Table/TR[7]/TD[3]/P
Index: 68
Row count: 6

For the second invoice, the first "Cat. No.":

Path: //Document/Sect/Table[2]/TR[2]/TD[3]/P
Index: 60
Row count: 5
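Comparing the two paths step by step shows exactly where they diverge. Below is a small Python sketch of that comparison. It assumes the path syntax shown above, where a step without `[n]` (such as `Table`) means the first element of that type; `path_steps` and `diff_paths` are illustrative helpers, not part of the PDF Extract API.

```python
import re
from itertools import zip_longest

# A path step is a tag name plus an optional 1-based index, e.g. "TR[7]".
# Tags such as "H1" contain digits, so digits are allowed after the first letter.
STEP = re.compile(r"^([A-Za-z][A-Za-z0-9_]*)(?:\[(\d+)\])?$")


def path_steps(path):
    """Split //Document/Sect/Table[2]/TR[2]/TD[3]/P into (tag, index) pairs.
    A step with no explicit [n] is treated as index 1 (an assumption)."""
    steps = []
    for part in path.split("/"):
        if not part:  # the leading "//" produces empty parts
            continue
        m = STEP.match(part)
        if m is None:
            raise ValueError(f"unexpected path step: {part!r}")
        steps.append((m.group(1), int(m.group(2) or 1)))
    return steps


def diff_paths(a, b):
    """Return (position, step_in_a, step_in_b) for every step that differs.
    If one path is longer, its extra steps are paired with None."""
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip_longest(path_steps(a), path_steps(b)))
        if x != y
    ]


first = "//Document/Sect/Table/TR[7]/TD[3]/P"
second = "//Document/Sect/Table[2]/TR[2]/TD[3]/P"
for pos, x, y in diff_paths(first, second):
    print(pos, x, y)
# 2 ('Table', 1) ('Table', 2)
# 3 ('TR', 7) ('TR', 2)
```

For these two invoices only the table index and the row index differ; the column (`TD[3]`) and everything above the table are identical, which is why a fixed path cannot address the same cell in both files.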

Am I doing something wrong, or is the API not precise enough?

Thank you for your help!

Topics: PDF Extract API

Community Expert, Jul 14, 2022

The paths are calculated on a document-by-document basis, so there are probably differences that the AI sees and we don't. When you extract with tables enabled, what do the exported tables look like? The extracted tables should be consistent.
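Because the paths are computed per document, one option is to stop hard-coding a path and instead find the column by its header text. The Python sketch below assumes the extraction output (structuredData.json) has been loaded into a list of element dicts that each carry a `Path` and, for text, a `Text` key. `path_steps` and `column_values` are hypothetical helpers, not part of the API, and the sample elements are made up to mirror the paths in the question. It also assumes the header cell and its values land in the same extracted table.

```python
import re
from collections import defaultdict

# Hypothetical helper: a path step is a tag plus an optional 1-based index, e.g. "TR[7]".
STEP = re.compile(r"^([A-Za-z][A-Za-z0-9_]*)(?:\[(\d+)\])?$")


def path_steps(path):
    """Split //Document/Sect/Table[2]/TR[2]/TD[3]/P into (tag, index) pairs.
    A step without [n] is treated as index 1 (an assumption)."""
    steps = []
    for part in path.split("/"):
        if not part:
            continue
        m = STEP.match(part)
        if m is None:
            raise ValueError(f"unexpected path step: {part!r}")
        steps.append((m.group(1), int(m.group(2) or 1)))
    return steps


def column_values(elements, header):
    """Return the cell texts below `header` in every table that contains that header."""
    # tables[table_prefix][row][col] -> list of text fragments in that cell
    tables = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for el in elements:
        text = el.get("Text")
        if not text or "Path" not in el:
            continue
        steps = path_steps(el["Path"])
        tags = [tag for tag, _ in steps]
        if "TR" not in tags:
            continue  # not inside a table row
        r = tags.index("TR")
        # Only keep text that sits in a TD/TH cell directly under the row.
        if r + 1 >= len(steps) or steps[r + 1][0] not in ("TD", "TH"):
            continue
        table_key = tuple(steps[:r])  # e.g. Document/Sect/Table[2]
        tables[table_key][steps[r][1]][steps[r + 1][1]].append(text.strip())

    wanted = header.strip().lower()
    values = []
    for rows in tables.values():
        for row_no in sorted(rows):
            col = next((c for c, parts in rows[row_no].items()
                        if " ".join(parts).lower() == wanted), None)
            if col is None:
                continue
            # Read the same column from every later row of this table.
            for later in sorted(k for k in rows if k > row_no):
                if col in rows[later]:
                    values.append(" ".join(rows[later][col]))
            break  # use the first matching header row only
    return values


# Made-up sample elements shaped like the paths in the question.
elements = [
    {"Path": "//Document/Sect/Table[2]/TR/TD[3]/P", "Text": "Cat. No. "},
    {"Path": "//Document/Sect/Table[2]/TR[2]/TD[3]/P", "Text": "A-100 "},
    {"Path": "//Document/Sect/Table[2]/TR[3]/TD[3]/P", "Text": "B-200"},
    {"Path": "//Document/Sect/P", "Text": "Invoice"},
]
print(column_values(elements, "Cat. No."))  # ['A-100', 'B-200']
```

With a real file, `elements` would presumably come from something like `json.load(f)["elements"]`, assuming that top-level key. If the header row and the data rows end up in separate tables, the grouping by `table_key` would need to be relaxed.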
