I am using Acrobat XI Pro to convert a PDF file to XML. The file contains many tables, but some of them are not converted correctly. Most tables come out with TR/TD tags, but some are converted to paragraphs (<P> tags). Vertical tables are also garbled in the XML file. Please help me with this problem and suggest the best solution.
Most PDF files don't contain tables as such, only text and lines placed on the page. Conversion is all guesswork: sometimes it guesses what you want, sometimes not.
Thank you for your reply, but what is the solution to this problem?
Lower expectations.
Is this PDF tagged or not?
Yes, the PDF is tagged.
If it's tagged, Acrobat might manage better. Do the tags define all of the tables?
Yes.
Did you try an up-to-date version of Acrobat?
Please show a screenshot of a table in the Tags panel and the same data extracted to XML.
(Protect private information).
Thank you. Now please show the same information (perhaps starting at Thailand 9,487,661) in the Tags panel, showing that it is tagged as a table.
Thank you. I am not an expert on tags, but I notice that there are a very large number of table tags. Perhaps the file is badly tagged as a series of one-line tables. Anyway, I defer to Thom Parker, who knows a lot more about this than I do.
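One rough way to check the "series of one-line tables" suspicion on the exported XML is to count the rows in each table. This is only a minimal sketch: it assumes the export uses `Table`, `TR`, `TD`, and `P` element names without namespaces, and the `Document` root and the sample rows other than the Thailand one are made up for illustration. The function names are hypothetical too, so adjust everything to match the real file.

```python
import xml.etree.ElementTree as ET

# Assumed element names; check them against the actual Acrobat XML export.
TABLE_TAG, ROW_TAG, PARA_TAG = "Table", "TR", "P"


def table_row_counts(root):
    """Return the number of row elements inside each table, in document order.

    Rows of nested tables are also counted toward their outer table.
    """
    return [len(table.findall(f".//{ROW_TAG}")) for table in root.iter(TABLE_TAG)]


def summarize(xml_text):
    """Count tables, one-row tables, and paragraphs in an exported XML string."""
    root = ET.fromstring(xml_text)
    counts = table_row_counts(root)
    return {
        "tables": len(counts),
        "one_row_tables": sum(1 for c in counts if c == 1),
        "paragraphs": sum(1 for _ in root.iter(PARA_TAG)),
    }


# Illustrative sample only; it is not taken from the poster's file.
sample = """<Document>
  <Table><TR><TD>Thailand</TD><TD>9,487,661</TD></TR></Table>
  <Table><TR><TD>Placeholder B</TD><TD>0</TD></TR></Table>
  <Table><TR><TD>x</TD></TR><TR><TD>y</TD></TR></Table>
  <P>A row that was exported as a paragraph</P>
</Document>"""

summary = summarize(sample)
print(summary)
```

If most tables turn out to have a single row, the tagging is a likely cause of the poor conversion, and the tags would need repairing before any export tool can do much better.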
Table recognition and conversion is extremely difficult. There are several applications that attempt this, including one widely used open-source tool, https://tabula.technology/.
In general, they all work decently on simple tables and then fall to pieces when things start getting complicated.
So you are asking a lot of Acrobat. Even if the table tags are really well formed, the Acrobat conversion might fall apart. Consider using another tool for this. Search Google for 'PDF Table Extraction'.