Parse Hard Table

May 28, 2018

Hello,

I am using Adobe Acrobat Pro DC to export financial PDF reports to HTML (or Word).

However, the tables are hard (in fact, extremely hard) to parse, and the toolkit fails to detect them correctly.

The PDFs are high quality; the problem is that the tables are artistically, carefully designed.

Is there something I can do, or some trick, to improve table parsing accuracy?

Thanks,  

TOPICS
Acrobat SDK and JavaScript

Adobe Employee,
Aug 22, 2018

Have you tried (a) selecting just the table and exporting the selection or (b) exporting to Excel instead of Word?

Aug 22, 2018

Yes, this is hard. Tools rely either on fuzzy logic or on letting users prespecify zones for extraction. Fuzzy logic looks for vertical and horizontal lines, for runs of whitespace, and for vertical alignment (left or right) in blocks. All any of us have to work with is what we see on the page; it's interesting how easily the human brain can make a table from tiny clues.
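
To make the "runs of whitespace" idea concrete, here is a minimal Python sketch. It is not how Acrobat or any particular tool works; it is only an illustration of the heuristic. It assumes you already have word bounding boxes from some PDF text extractor, in a hypothetical form: each row is a list of `(x0, x1, text)` tuples. The function names and the `min_gap` threshold are made up for this example.

```python
from typing import List, Tuple

# Hypothetical input shape: one word = (left x, right x, text).
Word = Tuple[float, float, str]


def find_column_gaps(rows: List[List[Word]],
                     min_gap: float = 5.0) -> List[Tuple[float, float]]:
    """Return x-intervals that no word in any row overlaps.

    These vertical "rivers" of whitespace are candidate column separators.
    """
    spans = sorted((x0, x1) for row in rows for (x0, x1, _) in row)
    gaps: List[Tuple[float, float]] = []
    if not spans:
        return gaps
    covered_until = spans[0][1]
    for x0, x1 in spans[1:]:
        # A word starting well past everything seen so far leaves a gap.
        if x0 - covered_until >= min_gap:
            gaps.append((covered_until, x0))
        covered_until = max(covered_until, x1)
    return gaps


def split_into_cells(rows: List[List[Word]],
                     gaps: List[Tuple[float, float]]) -> List[List[str]]:
    """Assign each word to a column using the midpoint of each gap."""
    boundaries = [(a + b) / 2 for a, b in gaps]
    table = []
    for row in rows:
        cells: List[List[str]] = [[] for _ in range(len(boundaries) + 1)]
        for x0, x1, text in sorted(row):
            center = (x0 + x1) / 2
            col = sum(center > b for b in boundaries)
            cells[col].append(text)
        table.append([" ".join(c) for c in cells])
    return table


if __name__ == "__main__":
    # Made-up sample: a label column and two right-aligned number columns.
    rows = [
        [(10, 40, "Revenue"), (100, 130, "1,200"), (180, 210, "1,350")],
        [(10, 30, "Net"), (33, 60, "income"), (110, 130, "310"), (190, 210, "402")],
    ]
    for line in split_into_cells(rows, find_column_gaps(rows)):
        print(line)
```

A heuristic like this breaks down quickly on decorative layouts: a heading that spans several columns closes every gap beneath it, and tight spacing between columns falls under the threshold. That is why prespecified extraction zones, or exporting only a selected table, are often more reliable.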
