
Is extraction of table structure restricted by Adobe?

New Here, Jun 07, 2023

We are trying to extract a tagged table from a PDF, preserving its TR/TD structure along with the TD properties (column/row span, background color, border, etc.). I have tried many PDF extraction tools and libraries, and all of them extract only the positions of text objects, not the structure of tagged tables. The only tool that does extract table structure is Adobe's proprietary HTML converter, but its conversion is not 100% accurate (sometimes a table is rendered as plain text). Is Adobe restricting the extraction of TD and TR tags along with their properties? Clarification would be really helpful.

TOPICS
PDF
Community Expert, Jun 08, 2023

No, Adobe is not restricting the extraction of any data structure from a PDF document. In fact, the PDF format is no longer in Adobe's hands: it is now an ISO standard (ISO 32000).

If a structure does not extract correctly, that may be because it was never created as such a structure. PDF documents can be quite complex, but at the end of the day the format was never designed to be converted back; it was designed as an electronic copy of your print. That means you may have data in your PDF file that looks like a table but is not one. And still, it is a valid PDF file.
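
One way to check whether something that looks like a table was actually created as one is to count the structure types in the document's structure tree. The following is only a minimal sketch: it assumes the tree has already been loaded as nested dict/list objects keyed by PDF names (`/S` for the structure type, `/K` for the kids), which is how libraries such as pypdf expose PDF dictionaries. The helper names `resolve` and `count_structure_types` are made up for this illustration.

```python
from collections import Counter


def resolve(obj):
    """Follow an indirect reference if the object offers get_object() (pypdf-style)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def count_structure_types(node, counts=None):
    """Recursively count the /S (structure type) of every element below node."""
    if counts is None:
        counts = Counter()
    node = resolve(node)
    if isinstance(node, list):
        # /K may be an array of kids
        for kid in node:
            count_structure_types(kid, counts)
    elif isinstance(node, dict):
        if "/S" in node:
            counts[str(node["/S"])] += 1
        if "/K" in node:
            # only /K is followed, so /P parent links cannot cause a loop
            count_structure_types(node["/K"], counts)
    # integers (marked-content IDs) and anything else are ignored
    return counts
```

If the result has no `/Table`, `/TR` or `/TD` entries, the visual table was most likely never tagged as a table (or it uses custom structure types that are mapped through the tree's `/RoleMap`, which this sketch does not look at). With pypdf, the root would presumably be reached through something like `PdfReader(path).trailer["/Root"].get("/StructTreeRoot")`; treat that access path as an assumption about your setup.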

ABAMBO | Hard- and Software Engineer | Photographer
New Here, Jun 12, 2023

Hi Abambo, thanks for your response.

The attached PDF document is tagged and has a proper table structure. Yet all we get is the x and y positions of text objects, without any table TR/TD structure.
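
For a tagged file like the one described, the TR/TD hierarchy lives in the structure tree rather than in the page content, so a text-position extractor will not see it. Below is a rough sketch of walking that tree directly. It assumes the tree is available as nested dict/list objects keyed by PDF names (as pypdf-style libraries expose them) and that spans and colors are stored in the standard attribute entry `/A` (`/RowSpan`, `/ColSpan`, `/BackgroundColor`). All function names are hypothetical, and attributes assigned through class maps (`/C`) are not handled.

```python
def resolve(obj):
    """Follow an indirect reference if the object offers get_object() (pypdf-style)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def kids(node):
    """Return the /K entry of a dictionary as a list of resolved kids."""
    k = resolve(node.get("/K", []))
    return [resolve(i) for i in (k if isinstance(k, list) else [k])]


def attributes(elem):
    """Merge the attribute dictionaries in /A (a dict or an array) into one dict."""
    a = resolve(elem.get("/A"))
    merged = {}
    for d in (a if isinstance(a, list) else [a]):
        d = resolve(d)
        if isinstance(d, dict):  # skips a missing /A and revision numbers
            merged.update({str(k): resolve(v) for k, v in d.items()})
    return merged


def walk(node):
    """Yield every structure element (a dict with /S) below node, depth first."""
    for kid in kids(node):
        if isinstance(kid, dict):
            if "/S" in kid:
                yield kid
            yield from walk(kid)


def table_rows(table):
    """Yield the TR elements of a table, looking inside THead/TBody/TFoot."""
    for kid in kids(table):
        if not isinstance(kid, dict):
            continue
        stype = str(kid.get("/S", ""))
        if stype == "/TR":
            yield kid
        elif stype in ("/THead", "/TBody", "/TFoot"):
            yield from table_rows(kid)


def mcids(elem):
    """Collect the marked-content IDs below an element (its link to page content)."""
    for kid in kids(elem):
        if isinstance(kid, int):
            yield int(kid)
        elif isinstance(kid, dict):
            if "/MCID" in kid:  # marked-content reference dictionary
                yield int(kid["/MCID"])
            else:
                yield from mcids(kid)


def extract_tables(struct_root):
    """Return each table as a list of rows, each row a list of cell dicts."""
    tables = []
    for elem in walk(struct_root):
        if str(elem.get("/S")) != "/Table":
            continue
        rows = []
        for tr in table_rows(elem):
            cells = []
            for cell in kids(tr):
                if isinstance(cell, dict) and str(cell.get("/S")) in ("/TH", "/TD"):
                    attrs = attributes(cell)
                    cells.append({
                        "type": str(cell["/S"])[1:],
                        "rowspan": int(attrs.get("/RowSpan", 1)),
                        "colspan": int(attrs.get("/ColSpan", 1)),
                        "background": attrs.get("/BackgroundColor"),
                        "mcids": list(mcids(cell)),
                    })
            rows.append(cells)
        tables.append(rows)
    return tables
```

The sketch stops at marked-content IDs: turning those into cell text means matching them against the marked-content sequences in each page's content stream, which is a separate step and depends on the library used.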
