Participant
May 28, 2018
Question

Parse Hard Table

Hello,

I am using Adobe Acrobat Pro DC to export financial PDF reports to HTML (or Word).

However, the tables are hard (in fact, extremely hard) to parse, and the tool fails to detect them correctly.

The PDFs are high quality; the problem is that the tables are artistically and carefully designed.

Is there something I can do, or some trick, to improve table parsing accuracy?

Thanks,

2 replies

Legend
August 22, 2018

Yes, this is hard. Tools either rely on fuzzy logic or on allowing users to prespecify zones for extraction. Fuzzy logic is going to look for vertical and horizontal lines, for runs of whitespace, for vertical alignment (left or right) in blocks. All any of us have to work with is what we see on the page; it's interesting how the human brain can so easily make a table from tiny clues.
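
To make the "runs of whitespace" idea concrete, here is a minimal Python sketch of one such heuristic. It is not how Acrobat or any particular tool works; it assumes the table has already been extracted as plain-text lines with a fixed-width layout, and the function names `find_column_gaps` and `split_into_cells` are made up for illustration. It looks for character positions that are blank in every row and treats sufficiently wide blank runs as column separators.

```python
def find_column_gaps(lines, min_gap=2):
    """Return (start, end) character ranges that are blank in every non-empty row.

    Assumption: lines come from a fixed-width text rendering of the table.
    Real PDF extraction would work with glyph coordinates instead.
    """
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(ln) for ln in rows)
    padded = [ln.ljust(width) for ln in rows]
    # A position counts as blank only if every row has a space there.
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    gaps, start = [], None
    # The trailing False is a sentinel that closes a run reaching the end.
    for i, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            # Ignore narrow runs and leading indentation (start == 0).
            if i - start >= min_gap and start > 0:
                gaps.append((start, i))
            start = None
    return gaps


def split_into_cells(lines, min_gap=2):
    """Split each non-empty row into cells at the detected whitespace gaps."""
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    gaps = find_column_gaps(rows, min_gap)
    starts = [0] + [end for _, end in gaps]
    ends = [start for start, _ in gaps] + [None]
    return [[row[s:e].strip() for s, e in zip(starts, ends)] for row in rows]


# Hypothetical sample data: a label column and two right-aligned numeric columns.
sample = [
    f"{name:<12}{a:>8}{b:>8}"
    for name, a, b in [
        ("Item", "2017", "2018"),
        ("Revenue", "1,200", "1,350"),
        ("Net income", "310", "402"),
    ]
]

for cells in split_into_cells(sample):
    print(cells)
```

Note that the single space inside "Net income" is not treated as a separator, because the other rows have text at that position. The same property is why this kind of heuristic breaks down on artistically designed tables: merged cells, wrapped labels, or a heading that spans several columns leave no position that is blank in every row.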

lrosenth
Adobe Employee
August 22, 2018

Have you tried (a) selecting just the table and exporting the selection or (b) exporting to Excel instead of Word?