Participant
June 10, 2024
Answered

Seeking Solutions: Preserving Table Structure in JSON Output with Adobe PDF Extract API for RAG App

  • June 10, 2024
  • 2 replies
  • 1567 views

Hello all! I used the Adobe PDF Extract API to parse a PDF. The PDF and the output JSON are attached to this message. I believe the JSON output doesn't preserve the table structure. If I pass this data to an LLM, it cannot answer relevant questions about the data because the table structure is lost. How should I go about this? I want to use the Adobe API to build a RAG application. Is there a way to preserve the table structure within the JSON file? For example, I need output like this:

{
  "Input (DC)": {
    "MVPS 4000-S2": null,
    "MVPS 4200-S2": null
  },
  "Available inverters": {
    "MVPS 4000-S2": "1 x SCS 3450 UP or 1 x SCS 3450 UP-XT",
    "MVPS 4200-S2": "1 x SCS 3600 UP or 1 x SCS 3600 UP-XT"
  },
  "Max. input voltage": {
    "MVPS 4000-S2": "1500 V",
    "MVPS 4200-S2": "1500 V"
  },
  "Number of DC inputs": {
    "MVPS 4000-S2": "dependent on the selected inverters",
    "MVPS 4200-S2": null
  },
  "Integrated zone monitoring": {
    "MVPS 4000-S2": "○",
    "MVPS 4200-S2": null
  },
  "Available DC fuse sizes (per input)": {
    "MVPS 4000-S2": "200 A, 250 A, 315 A, 350 A, 400 A, 450 A, 500 A",
    "MVPS 4200-S2": null
  },

 

I know it can also generate CSV, but the CSV doesn't include any of the other information that might be present in the PDF.
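For readers after the nested shape shown in the question, here is a minimal Python sketch that builds it from a table CSV. It assumes the first CSV row holds the column headers and the first column holds the row labels; `table_csv_to_nested` is a hypothetical helper name, and the sample CSV is a hand-written excerpt of the values quoted in the question, not real Extract output.

```python
import csv
import io


def table_csv_to_nested(csv_text):
    """Turn a table CSV into {row label: {column header: cell or None}}.

    Assumption: row 0 holds the column headers and column 0 holds the
    row labels. Empty cells become None.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    columns = [name.strip() for name in rows[0][1:]]
    nested = {}
    for row in rows[1:]:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a label
        cells = [cell.strip() for cell in row[1:]]
        cells += [""] * (len(columns) - len(cells))  # pad short rows
        nested[row[0].strip()] = {
            column: (cell or None) for column, cell in zip(columns, cells)
        }
    return nested


# Hand-written sample based on the values in the question.
sample_csv = (
    ",MVPS 4000-S2,MVPS 4200-S2\n"
    "Input (DC),,\n"
    "Max. input voltage,1500 V,1500 V\n"
    "Number of DC inputs,dependent on the selected inverters,\n"
)
nested = table_csv_to_nested(sample_csv)
```

Merged cells would need extra handling here: a cell that spans both columns shows up in only one of them, which is why some values in the desired output are null.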

 

2 replies

Participant
September 2, 2025

I tried converting a PDF directly to JSON via the Adobe API. I just wanted to test it because of this mad pricing policy. However, the result wasn't accurate at all: I had complex tables with merged cells. The solution that worked for me was:

1. Convert the PDF to DOCX via, e.g., CloudConvert (they offer much better pricing with credits).

2. Then convert the DOCX to JSON.

That worked perfectly!
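The reply doesn't say how the DOCX-to-JSON step was done, so the following is only one possible sketch, using the Python standard library. It relies on a .docx being a ZIP archive whose `word/document.xml` stores tables as `w:tbl` / `w:tr` / `w:tc` elements; `docx_tables` is a hypothetical helper, and the tiny in-memory document exists only to make the example runnable.

```python
import io
import json
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(docx_file):
    """Return every table in a .docx as a list of rows of cell strings."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            # Join all text runs inside each cell into one string.
            rows.append([
                "".join(t.text or "" for t in tc.iter(W + "t")).strip()
                for tc in tr.findall(W + "tc")
            ])
        tables.append(rows)
    return tables


# Build a one-table document in memory so the sketch runs on its own.
document_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body><w:tbl><w:tr>'
    "<w:tc><w:p><w:r><w:t>Max. input voltage</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>1500 V</w:t></w:r></w:p></w:tc>"
    "</w:tr></w:tbl></w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)
buffer.seek(0)

tables = docx_tables(buffer)
tables_json = json.dumps(tables, ensure_ascii=False)
```

This sketch ignores merged cells: horizontally merged cells carry a `w:gridSpan` and vertically merged ones a `w:vMerge` in the cell properties, and both would need to be expanded to get a rectangular grid.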

Joel Geraci
Community Expert
Correct answer
June 10, 2024

I generally post-process the JSON from Extract to create a Markdown file. When I hit a table, I read past it, read in the CSV file as a Markdown table, then continue with the JSON. It works great. I have some Node.js code I can share if you like.
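The Node.js code mentioned here isn't shown in the thread, so what follows is only a rough Python sketch of the approach described, not Joel's implementation. It assumes each entry in the Extract JSON's `elements` list has a `Path` such as `//Document/Table/TR/TD/P`, an optional `Text`, and, on the table element itself, a `filePaths` list that includes the CSV rendition. `elements_to_markdown`, `csv_to_markdown_table` and the `read_file` callback are hypothetical names, and the sample elements are made up.

```python
import csv
import io

HEADINGS = {"H1", "H2", "H3", "H4", "H5", "H6"}


def csv_to_markdown_table(csv_text):
    """Render CSV text as a Markdown pipe table (first row is the header)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)


def elements_to_markdown(elements, read_file):
    """Walk the elements in order, swapping each table for its CSV."""
    blocks = []
    for element in elements:
        # "//Document/Table[2]/TR/TD" -> ["Document", "Table", "TR", "TD"]
        tags = [seg.split("[")[0] for seg in element.get("Path", "").strip("/").split("/")]
        if "Table" in tags:
            if tags[-1] == "Table":  # the table element itself
                for file_path in element.get("filePaths", []):
                    if file_path.lower().endswith(".csv"):
                        blocks.append(csv_to_markdown_table(read_file(file_path)))
            continue  # read past the table's rows and cells
        text = element.get("Text", "").strip()
        if not text:
            continue
        if tags[-1] in HEADINGS:
            text = "#" * int(tags[-1][1]) + " " + text
        blocks.append(text)
    return "\n\n".join(blocks)


# Made-up sample data in the assumed shape.
elements = [
    {"Path": "//Document/H1", "Text": "Technical Data "},
    {"Path": "//Document/Table", "filePaths": ["tables/fileoutpart0.csv"]},
    {"Path": "//Document/Table/TR/TD/P", "Text": "Max. input voltage "},
    {"Path": "//Document/P[2]", "Text": "Subject to change. "},
]
files = {"tables/fileoutpart0.csv": ",MVPS 4000-S2,MVPS 4200-S2\nMax. input voltage,1500 V,1500 V\n"}
markdown = elements_to_markdown(elements, files.__getitem__)
```

In real use, `read_file` would read from the folder or ZIP that Extract returns. Keeping the surrounding headings and paragraphs next to each table is what gives a RAG pipeline the context that the CSV alone lacks.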

Participant
June 10, 2024

Hi Joel,

 

Thanks a lot for the reply! Yes, it would really help if you could share your Node.js code.

Joel Geraci
Community Expert
June 10, 2024

It's in a private Git repo. If you are comfortable doing so, send me a private message with your GitHub ID and I'll add you as a collaborator. I eventually plan on making it open source once it's past the work-in-progress stage.