PDF Extract API (table)

January 23, 2024
4 replies
1180 views

I am using the extract API for PDF tables to export to an Excel file (pdfservices-node-sdk 3.2.0). In the same PDF file, there are 4 identical tables, but when exporting, only 2 tables are extracted. What factors could be affecting the export of tables? How can I adjust to achieve the best results?

Only the table highlighted in red is successfully exported to Excel.

maaz_1828

Participant

Can you please share the code that extracts the tables and saves them to an Excel file?

Anonymous

It's in the samples.

Tuân3456692039on (Author)

Participating Frequently

"There's a bit of confusion. More accurately, figures 2 and 4 have Excel table results, while figures 1 and 3 cannot generate an Excel file."

Joel Geraci

Community Expert

It's an AI. The code that does the page segmentation can be off when deciding whether something is a figure, a table, or text. Honestly, given the proximity to the drawings above the tables, I'd have thought that the bottom two tables would get read and not the top two.

Unfortunately, there are no "knobs" to turn to get better results, but with your permission, I can send your file to engineering to train for this sort of thing.