I am using Acrobat XI Pro to convert a PDF file that contains many tables to XML, but some of the tables do not convert correctly. Most tables come through with TR/TD tags, but some are converted to paragraphs (<P> tags). Vertical tables are also garbled in the XML file. Please help me with this problem and suggest the best solution.
Most PDF files don't actually contain tables, only text and lines positioned on the page. Table detection is all guesswork: sometimes it guesses the way you want, sometimes not.
Thank you for your reply, but what is the solution to this problem?
Is this PDF tagged or not?
Yes, the PDF is tagged.
If it's tagged, Acrobat might manage better. Do the tags define all of the tables?
Did you try an up-to-date version of Acrobat?
Please show a screenshot of a table in the Tags panel and the same data extracted to XML.
(Protect private information).
Thank you. Now please show the same content (perhaps starting at "Thailand 9,487,661") in the Tags panel, so we can see that it is tagged as a table.
Thank you. I am not an expert on tags, but I notice that there is a very large number of table tags; perhaps the file is badly tagged as a series of one-line tables. Anyway, I defer to Thom Parker, who knows a lot more about this than I do.
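One way to test the "series of one-line tables" theory is to count the rows in each table of the exported XML. The Python sketch below is only an illustration: it assumes the export uses the element names Table, TR and P (as described in the question), and the root element Doc and the sample content are made up for the example.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    # Drop any "{namespace}" prefix so elements compare by bare name.
    return tag.rsplit("}", 1)[-1]

def table_row_counts(xml_text, table_tag="Table", row_tag="TR"):
    """Return the number of row elements inside each table element, in document order."""
    root = ET.fromstring(xml_text)
    counts = []
    for elem in root.iter():
        if local_name(elem.tag) == table_tag:
            # iter() walks all descendants, so rows of nested tables are counted too.
            counts.append(sum(1 for e in elem.iter() if local_name(e.tag) == row_tag))
    return counts

def summarize(xml_text, para_tag="P"):
    """Count tables, single-row tables and paragraphs in an XML export (assumed tag names)."""
    counts = table_row_counts(xml_text)
    root = ET.fromstring(xml_text)
    paragraphs = sum(1 for e in root.iter() if local_name(e.tag) == para_tag)
    return {
        "tables": len(counts),
        "single_row_tables": sum(1 for c in counts if c == 1),
        "paragraphs": paragraphs,
    }

# Hypothetical sample: two one-row tables, one two-row table, one paragraph.
SAMPLE = """<Doc>
  <Table><TR><TD>A</TD><TD>1</TD></TR></Table>
  <Table><TR><TD>B</TD><TD>2</TD></TR></Table>
  <Table><TR><TD>C</TD></TR><TR><TD>D</TD></TR></Table>
  <P>E 5</P>
</Doc>"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
    # prints {'tables': 3, 'single_row_tables': 2, 'paragraphs': 1}
```

For a real export, read the file in binary mode and pass its contents to summarize(). If most tables turn out to have a single row, that would point at the tagging in the source PDF rather than at the XML export itself.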
Table recognition and conversion are extremely difficult. Several applications attempt it, including one widely used open-source tool, Tabula: https://tabula.technology/.
In general, they all work decently on simple tables and then fall to pieces when things start getting complicated.
So you are asking a lot of Acrobat. Even if the table tags are really well formed, Acrobat's conversion might fall apart. Consider using another tool for this; search Google for 'PDF table extraction'.