
Deserializing PDF Extract JSON File

New Here, Dec 02, 2023


Hi,

Is there any built-in tool or documentation that can be used to deserialize the structured JSON to get the proper DOM hierarchy of the document?

TOPICS
.NET SDK, PDF Extract API

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert, Dec 04, 2023


There is no concept of a DOM hierarchy in PDF. There can be "Marked Content", generally known as tags, but the output from Extract is often a more accurate representation of the document structure than the tags, especially if the PDF was created by a low-quality tool.


If you are asking whether the flat JSON can be transformed into something hierarchical like XML/HTML, then yes. The "Path" property of each element can be used to construct such a hierarchy. I'm actually working on a sample of this that should be published soon.
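A minimal Python sketch of the Path-based idea follows; it is an illustration only, not the sample mentioned above. It assumes the structuredData.json layout has a top-level "elements" array, that each element may carry "Path" and "Text" keys, and that Path values look like "//Document/Sect[2]/P" (a tag name with an optional 1-based sibling index). The wrapper tag "Root" and all helper names are hypothetical.

```python
import json
import re
import xml.etree.ElementTree as ET

# Assumption: each Path segment is a tag name with an optional 1-based
# sibling index, e.g. "Document", "Sect[2]", "P[3]".
SEGMENT_RE = re.compile(r"^(?P<name>[^\[\]/]+)(?:\[(?P<index>\d+)\])?$")


def parse_path(path):
    """Turn "//Document/Sect[2]/P" into [("Document", 1), ("Sect", 2), ("P", 1)]."""
    segments = []
    for raw in path.strip("/").split("/"):
        match = SEGMENT_RE.match(raw)
        if match is None:
            raise ValueError(f"unrecognized path segment: {raw!r}")
        segments.append((match.group("name"), int(match.group("index") or 1)))
    return segments


def build_tree(elements):
    """Nest a flat list of Extract elements into an XML tree using only Path.

    "Root" is a hypothetical wrapper so several top-level paths can coexist.
    """
    root = ET.Element("Root")
    nodes = {(): root}  # path prefix (tuple of segments) -> XML node
    for element in elements:
        path = element.get("Path")
        if not path:
            continue  # nothing to place in the hierarchy
        segments = tuple(parse_path(path))
        # Create any missing ancestors, then the node itself, in document order.
        for depth in range(1, len(segments) + 1):
            prefix = segments[:depth]
            if prefix not in nodes:
                parent = nodes[segments[:depth - 1]]
                nodes[prefix] = ET.SubElement(parent, segments[depth - 1][0])
        text = element.get("Text")
        if text:
            leaf = nodes[segments]
            leaf.text = (leaf.text or "") + text
    return root


def load_elements(json_path):
    """Read the (assumed) "elements" array from an Extract structuredData.json file."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle).get("elements", [])
```

For example, `tree = build_tree(load_elements("structuredData.json"))`, then `ET.indent(tree)` (Python 3.9+) and `ET.tostring(tree, encoding="unicode")` print the nested result. Keying nodes by the full path prefix keeps "P" and "P[2]" as separate siblings while reusing shared ancestors such as "Sect". Children appear in the order elements are first encountered, which the sketch assumes is reading order.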

New Here, Dec 04, 2023


Hi @Joel_Geraci 
Thank you for your reply,

Yes, I was talking about the "Path" property. Is there any documentation on what can appear in the Path property? I've seen that it takes many different values, and to construct a proper hierarchy all of those values need to be mapped correctly.
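No documentation is cited in this thread, so one practical sketch is to survey the Path values in your own Extract output before deciding how to map them. This Python snippet assumes the output has a top-level "elements" array whose items may carry a "Path" like "//Document/L[2]/LI", with segments of the form "Name" or "Name[index]". It counts each tag name and records which tag names it was seen nested under. The tag names in the sample data (Sect, P, L, LI, LBody) are examples only, and the helper names are hypothetical.

```python
import json
import re
from collections import Counter, defaultdict

# Assumption: segments look like "Name" or "Name[3]"; the index is dropped
# because only the set of tag names and their nesting is of interest here.
INDEX_RE = re.compile(r"\[\d+\]$")


def tag_names(path):
    """"//Document/L[2]/LI" -> ["Document", "L", "LI"]."""
    return [INDEX_RE.sub("", seg) for seg in path.strip("/").split("/") if seg]


def survey_tags(elements):
    """Count each tag name and record which tag names it appears nested under."""
    counts = Counter()
    parents = defaultdict(set)
    for element in elements:
        path = element.get("Path")
        if not path:
            continue  # elements without a Path are ignored
        names = tag_names(path)
        counts.update(names)
        for parent, child in zip(names, names[1:]):
            parents[child].add(parent)
    return counts, parents


def survey_file(json_path):
    """Survey a structuredData.json file (assumed to hold an "elements" array)."""
    with open(json_path, encoding="utf-8") as handle:
        return survey_tags(json.load(handle).get("elements", []))
```

Printing `counts.most_common()` and the `parents` sets for a few representative files gives a concrete list of the values that a hierarchy builder would need to handle.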

Please keep the community posted on your solution, as it would help a lot of us as well.

Thanks

New Here, Dec 09, 2023


I'm new here, but that would be amazing. I'm using Extract right now for one giant file, but I will soon have to start covering different ones (all with a similar structure), so I've started working on a custom output parser. If you finish early, or want input from me as a potential user who is starting on a similar idea, let me know.
