Hello,
I'm using Adobe Acrobat Pro DC to export financial PDF reports to HTML (or Word).
However, the tables are hard (in fact, extremely hard) to parse, and the export fails to detect them correctly.
The PDFs are high quality; the problem is that the tables are artistically/carefully designed.
Is there something I can do, or some trick, to improve table parsing accuracy?
Thanks,
Have you tried (a) selecting just the table and exporting the selection or (b) exporting to Excel instead of Word?
Yes, this is hard. Tools either rely on fuzzy logic or on allowing users to prespecify zones for extraction. Fuzzy logic is going to look for vertical and horizontal lines, for runs of whitespace, for vertical alignment (left or right) in blocks. All any of us have to work with is what we see on the page; it's interesting how the human brain can so easily make a table from tiny clues.
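To make the "runs of whitespace" idea concrete, here is a minimal Python sketch. It is not how Acrobat or any particular tool works. It assumes the table has already been extracted as fixed-width text lines, and the function names (find_column_gaps, split_row) and the min_gap threshold are hypothetical, chosen only for illustration. It treats any run of character positions that is blank in every row as a column boundary.

```python
def find_column_gaps(lines, min_gap=2):
    """Return (start, end) character ranges that are blank in every non-empty row.

    Assumption: `lines` is fixed-width text, one table row per line.
    """
    rows = [ln.rstrip() for ln in lines if ln.strip()]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    # blank[i] is True when position i is a space in every row
    blank = [all(r[i] == " " for r in padded) for i in range(width)]

    gaps, start = [], None
    # The trailing False is a sentinel that closes any run still open at the end
    for i, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            # Ignore leading indentation and runs narrower than min_gap
            if start > 0 and i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def split_row(line, gaps):
    """Cut one row into cells at the detected gaps."""
    cells, prev = [], 0
    for start, end in gaps:
        cells.append(line[prev:start].strip())
        prev = end
    cells.append(line[prev:].strip())
    return cells
```

Calling find_column_gaps on all rows of a table and then split_row on each row gives one list of cells per row. This only covers the whitespace clue; a multi-word label like "Net income" survives because the space inside it is not blank in every row, but a table whose columns overlap horizontally would defeat it, which is where ruling lines and alignment checks come in.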