
Seeking Solutions: Preserving Table Structure in JSON Output with Adobe PDF Extract API for RAG App

Community Beginner, Jun 10, 2024

Hello all! I used the Adobe PDF Extract API to parse a PDF. The PDF and the output JSON are attached to this message. I believe the JSON output doesn't preserve the table structure. If I pass this data to an LLM, it cannot answer relevant questions about the data because the table structure is lost. How should I go about this? I want to use the Adobe API to build a RAG application. Is there a way to preserve the table structure within the JSON file? For example, I need output like this:

{
  "Input (DC)": {
    "MVPS 4000-S2": null,
    "MVPS 4200-S2": null
  },
  "Available inverters": {
    "MVPS 4000-S2": "1 x SCS 3450 UP or 1 x SCS 3450 UP-XT",
    "MVPS 4200-S2": "1 x SCS 3600 UP or 1 x SCS 3600 UP-XT"
  },
  "Max. input voltage": {
    "MVPS 4000-S2": "1500 V",
    "MVPS 4200-S2": "1500 V"
  },
  "Number of DC inputs": {
    "MVPS 4000-S2": "dependent on the selected inverters",
    "MVPS 4200-S2": null
  },
  "Integrated zone monitoring": {
    "MVPS 4000-S2": "○",
    "MVPS 4200-S2": null
  },
  "Available DC fuse sizes (per input)": {
    "MVPS 4000-S2": "200 A, 250 A, 315 A, 350 A, 400 A, 450 A, 500 A",
    "MVPS 4200-S2": null
  }
}

I know it can also generate CSV, but the CSV doesn't contain any of the other information that might be present in the PDF.
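
One possible way to get the nested shape requested in this post is to build it from the CSV table output. The following is a minimal Python sketch, not an Extract API feature. It assumes the CSV has the model names in the first row and the row labels in the first column; the function name `table_csv_to_nested` and the sample CSV text are hypothetical, and the CSV that Extract actually produces for this PDF may be laid out differently.

```python
import csv
import io
import json


def table_csv_to_nested(csv_text):
    """Turn a CSV table into {row label: {column header: value or None}}.

    Assumption: the first row holds the column headers and the first
    column holds the row labels. Empty cells become None (JSON null).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    columns = [name.strip() for name in rows[0][1:]]
    nested = {}
    for row in rows[1:]:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a label
        cells = row[1:]
        # Pad short rows so every column gets a value.
        cells = cells + [""] * (len(columns) - len(cells))
        nested[row[0].strip()] = {
            column: (cell.strip() or None)
            for column, cell in zip(columns, cells)
        }
    return nested


# Hypothetical CSV text modelled on the table in this post.
SAMPLE_CSV = (
    ",MVPS 4000-S2,MVPS 4200-S2\n"
    "Input (DC),,\n"
    "Max. input voltage,1500 V,1500 V\n"
    "Number of DC inputs,dependent on the selected inverters,\n"
)

print(json.dumps(table_csv_to_nested(SAMPLE_CSV), indent=2, ensure_ascii=False))
```

Cells that span several columns in the PDF would show up as a value in one column and null in the others, which matches the nulls in the example above.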

 

TOPICS
PDF Extract API


Correct answer

Community Expert, Jun 10, 2024

I generally post-process the JSON from Extract to create a Markdown file. When I hit a table, I read past it, read in the corresponding .csv file as a Markdown table, then continue with the JSON. It works great. I have some Node.js code I can share if you like.
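
The approach described in this answer could look roughly like the sketch below. It is written in Python for illustration and is not the Node.js code mentioned in the thread. It assumes the Extract JSON has a list of elements with `Path` and `Text` keys, and that a table element lists its exported file under `filePaths`; treat those key names as assumptions to check against your own output. The names `elements_to_markdown`, `csv_to_markdown`, and `load_csv` are hypothetical.

```python
import csv
import io
import re

# Matches a path that ends in a table element, e.g. ".../Table" or ".../Table[2]".
TABLE_PATH = re.compile(r"/Table(\[\d+\])?$")


def csv_to_markdown(csv_text):
    """Render CSV text as a Markdown table, using the first row as the header."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)


def elements_to_markdown(elements, load_csv):
    """Walk the elements in order, swapping each table for its CSV export.

    load_csv is a caller-supplied function that maps a relative file path
    to the CSV text (for example, by reading from the unzipped output).
    """
    parts = []
    table_prefix = None
    for element in elements:
        path = element.get("Path", "")
        if table_prefix and path.startswith(table_prefix + "/"):
            continue  # cell text is already covered by the CSV
        table_prefix = None
        if TABLE_PATH.search(path):
            csv_files = [
                name for name in element.get("filePaths", [])
                if name.lower().endswith(".csv")
            ]
            if csv_files:
                table_prefix = path
                parts.append(csv_to_markdown(load_csv(csv_files[0])))
                continue
        text = element.get("Text", "").strip()
        if text:
            parts.append(text)
    return "\n\n".join(parts)


# Made-up sample data standing in for the real JSON and CSV files.
sample_elements = [
    {"Path": "//Document/H1", "Text": "Technical Data"},
    {"Path": "//Document/Table", "filePaths": ["tables/fileoutpart0.csv"]},
    {"Path": "//Document/Table/TR/TD/P", "Text": "Max. input voltage"},
    {"Path": "//Document/P", "Text": "Closing paragraph"},
]
sample_files = {"tables/fileoutpart0.csv": ",MVPS 4000-S2\nMax. input voltage,1500 V\n"}

markdown = elements_to_markdown(sample_elements, sample_files.__getitem__)
print(markdown)
```

If a table has no CSV file listed, the sketch falls back to emitting the cell text element by element, so no content is dropped. The resulting Markdown keeps the surrounding prose and the table together, which is the part a CSV alone would miss.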

Community Beginner, Jun 10, 2024

Hi Joel,

 

Thanks a lot for the reply! Yes, it would really help if you could share your Node.js code.

Community Expert, Jun 10, 2024

It's in a private Git repo. If you are comfortable doing so, send me a private message with your GitHub ID and I'll add you as a collaborator. I plan to make it open source eventually, once it's past the work-in-progress stage.

Community Beginner, Jun 10, 2024

Thanks, Joel! I just sent you a private message with my GitHub ID.
