I am using Adobe Acrobat Pro DC.
Tagging a PDF document is much easier in Acrobat than in other software. Once a document is tagged, we can view the tagged objects (Paragraph/Table/Figure). So my question is: can I get those tagged objects (Paragraph/Table/Figure) programmatically from a PDF?
Using the Accessibility tool, we can view every object as a Paragraph or a Table. Can we extract those objects programmatically as JSON/XML? Please guide me.
Thanks and regards,
You can use Save as XML to get an XML representation that uses the structural elements.
Or you can write your own plugin that walks the structure and exports it any way you want.
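If you go the Save as XML route and need JSON in the end, a small script can post-process the exported file. The following is only a rough sketch in Python using the standard library. It assumes the exported XML uses element names that mirror the structure tags (`P`, `Table`, `Figure`); check your own export and adjust the names. The `SAMPLE` string and the `extract_tagged` helper are made up for illustration and are not actual Acrobat output or part of any Adobe API.

```python
import json
import xml.etree.ElementTree as ET

# Element names to collect. Assumption: the export names its elements
# after the structure tags; adjust this set to match your file.
WANTED = {"P", "Table", "Figure"}


def extract_tagged(xml_text, wanted=WANTED):
    """Return a list of {"type", "text"} dicts for the wanted elements."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.iter():  # document order, including nested elements
        if elem.tag in wanted:
            # Join all descendant text and collapse runs of whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            items.append({"type": elem.tag, "text": text})
    return items


# Hypothetical sample, hand-written for this example.
SAMPLE = """<Document>
  <Part>
    <H1>Report</H1>
    <P>First paragraph.</P>
    <Table><TR><TD>a</TD><TD>b</TD></TR></Table>
    <Figure Alt="Chart"/>
  </Part>
</Document>"""

if __name__ == "__main__":
    print(json.dumps(extract_tagged(SAMPLE), indent=2))
```

For a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `extract_tagged`. Note that a paragraph nested inside a table cell would be reported twice here, once on its own and once as part of the table's text, so filter as needed.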