We are trying to extract a tagged table from a PDF together with its TR/TD structure and the cell properties (column/row span, background color, border, etc.). I have tried many PDF extraction tools and libraries, and all of them extract only the positions of text objects, not the structure of tagged tables. The only tool that extracts a table structure is Adobe's proprietary HTML converter, but the conversion is not 100% accurate (sometimes a table is rendered as plain text). Is Adobe restricting the extraction of TR and TD tags along with their properties? Clarification would be really helpful.
No, Adobe is not restricting the extraction of any data structure from a PDF document. In fact, PDF follows a standard that is no longer in Adobe's hands (it is now an ISO standard).
If a structure does not extract correctly, that may be because it was never created as such a structure. PDF documents can be quite complex, but at the end of the day the format was never designed to be converted back. It was designed to be an electronic copy of your print. That means your PDF file may contain data that looks like a table but is not one. And still, it's a correct PDF file.
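To make that concrete: in a tagged PDF, a table only exists for an extraction tool if the structure tree holds Table, TR and TH/TD elements, with spans stored as RowSpan/ColSpan attributes. Below is a rough Python sketch of that check. It is not tied to any real PDF library: the plain dicts are a hypothetical stand-in for whatever your library returns after parsing the structure tree, and the function names are made up for illustration. Only the key names (/S, /K, /A, /RowSpan, /ColSpan, /BackgroundColor) come from the ISO standard.

```python
# Hypothetical stand-in: plain dicts model PDF structure elements.
# /S = structure type, /K = kids, /A = attributes (names per ISO 32000-1).

def iter_kids(elem):
    """Yield child structure elements, skipping marked-content references."""
    kids = elem.get("/K", [])
    if not isinstance(kids, list):
        kids = [kids]  # /K may hold a single kid instead of an array
    for kid in kids:
        if isinstance(kid, dict) and "/S" in kid:
            yield kid


def cell_attrs(cell):
    """Collect span and background attributes of a TH/TD element."""
    attrs = cell.get("/A", {})
    if isinstance(attrs, dict):
        attrs = [attrs]  # /A may be one dictionary or an array of them
    merged = {}
    for entry in attrs:
        if isinstance(entry, dict):
            merged.update(entry)
    return {
        "type": cell["/S"],
        "row_span": merged.get("/RowSpan", 1),  # spans default to 1
        "col_span": merged.get("/ColSpan", 1),
        "background": merged.get("/BackgroundColor"),
    }


def collect_rows(table):
    """Return the rows of a Table element, looking inside THead/TBody/TFoot."""
    rows = []
    for kid in iter_kids(table):
        if kid["/S"] == "/TR":
            cells = [c for c in iter_kids(kid) if c["/S"] in ("/TH", "/TD")]
            rows.append([cell_attrs(c) for c in cells])
        elif kid["/S"] in ("/THead", "/TBody", "/TFoot"):
            rows.extend(collect_rows(kid))
    return rows


def find_tables(elem):
    """Return every tagged table below elem (tables nested in cells are ignored)."""
    if elem.get("/S") == "/Table":
        return [collect_rows(elem)]
    tables = []
    for kid in iter_kids(elem):
        tables.extend(find_tables(kid))
    return tables


# A tagged table: a header cell spanning two columns, then two data cells.
tagged = {"/K": {"/S": "/Document", "/K": [
    {"/S": "/P", "/K": 0},
    {"/S": "/Table", "/K": [
        {"/S": "/TR", "/K": [
            {"/S": "/TH", "/A": {"/O": "/Table", "/ColSpan": 2}, "/K": 1},
        ]},
        {"/S": "/TR", "/K": [
            {"/S": "/TD", "/K": 2},
            {"/S": "/TD", "/K": 3},
        ]},
    ]},
]}}

# Text that merely looks like a table on the page: paragraphs only.
untagged = {"/K": {"/S": "/Document", "/K": [
    {"/S": "/P", "/K": 0},
    {"/S": "/P", "/K": 1},
]}}

tables = find_tables(tagged)
no_tables = find_tables(untagged)
```

If a walk like this over your file's structure tree comes back empty, there is no tagged table to extract, however table-like the page looks, and a converter can only guess from text positions.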