Hello all! I used the Adobe PDF Extract API to parse a PDF. The PDF and the output JSON are attached to this message. I believe the JSON output doesn't preserve the table structure, so when I pass this data to an LLM, it can't answer relevant questions about the tables. How should I go about this? I want to use the Adobe API to build a RAG application. Is there a way to preserve the table structure within the JSON file? For example, I need output like this:
{
  "Input (DC)": {
    "MVPS 4000-S2": null,
    "MVPS 4200-S2": null
  },
  "Available inverters": {
    "MVPS 4000-S2": "1 x SCS 3450 UP or 1 x SCS 3450 UP-XT",
    "MVPS 4200-S2": "1 x SCS 3600 UP or 1 x SCS 3600 UP-XT"
  },
  "Max. input voltage": {
    "MVPS 4000-S2": "1500 V",
    "MVPS 4200-S2": "1500 V"
  },
  "Number of DC inputs": {
    "MVPS 4000-S2": "dependent on the selected inverters",
    "MVPS 4200-S2": null
  },
  "Integrated zone monitoring": {
    "MVPS 4000-S2": "○",
    "MVPS 4200-S2": null
  },
  "Available DC fuse sizes (per input)": {
    "MVPS 4000-S2": "200 A, 250 A, 315 A, 350 A, 400 A, 450 A, 500 A",
    "MVPS 4200-S2": null
  }
}
I know Extract can also generate CSV files for the tables, but the CSV doesn't contain any of the other information that might be present in the PDF.
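For illustration, here is a rough sketch of the reshaping described above: turning one table's rows into a nested object keyed by row label and then by column header. It assumes the rows have already been read from the table's CSV as arrays of strings, with the column headers in the first row and the row label in the first column. `tableToNested` is a hypothetical helper name, not part of any Adobe SDK, and how merged cells appear in the CSV is an assumption (they are treated here as empty cells and mapped to `null`).

```javascript
// Hypothetical helper (not an Adobe API): reshape table rows into
// { rowLabel: { columnHeader: value } }.
// Assumes rows[0] is the header row and column 0 holds the row labels.
function tableToNested(rows) {
  const [header, ...body] = rows;
  const columns = header.slice(1); // skip the corner cell above the row labels
  const result = {};
  for (const row of body) {
    const label = (row[0] || "").trim();
    if (!label) continue; // ignore rows that have no label
    const entry = {};
    columns.forEach((col, i) => {
      const cell = (row[i + 1] || "").trim();
      entry[col.trim()] = cell === "" ? null : cell; // empty cell -> null
    });
    result[label] = entry;
  }
  return result;
}

// Sample rows taken from the example in this post.
const sampleRows = [
  ["", "MVPS 4000-S2", "MVPS 4200-S2"],
  ["Input (DC)", "", ""],
  ["Max. input voltage", "1500 V", "1500 V"],
  ["Number of DC inputs", "dependent on the selected inverters", ""],
];
console.log(JSON.stringify(tableToNested(sampleRows), null, 2));
```

Section rows such as "Input (DC)" come out with every value set to `null`, which matches the shape shown above. If two rows share a label, the later one overwrites the earlier one in this sketch.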
I generally post-process the JSON from Extract to create a Markdown file. When I hit a table, I skip past its elements, read in the corresponding CSV file as a Markdown table, then continue with the JSON. It works great. I have some Node.js code I can share if you like.
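A minimal sketch of that approach, for readers who want a starting point. This is not the code from the private repo. It assumes each element in the Extract JSON has a `Path` string, an optional `Text`, and, for table elements, a `filePaths` array pointing at the rendered CSV; check these field names against your own output. `parseCsv`, `csvToMarkdownTable`, and `elementsToMarkdown` are hypothetical helper names, and the sample data is made up.

```javascript
// Minimal CSV parser: handles quoted fields, doubled quotes, and newlines inside quotes.
function parseCsv(text) {
  const rows = [];
  let row = [], field = "", inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { row.push(field); field = ""; }
    else if (ch === "\n") { row.push(field); rows.push(row); row = []; field = ""; }
    else if (ch !== "\r") field += ch;
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Render CSV text as a Markdown table, treating the first row as the header.
function csvToMarkdownTable(csvText) {
  const rows = parseCsv(csvText);
  if (rows.length === 0) return "";
  const esc = (c) => c.trim().replace(/\|/g, "\\|").replace(/\n/g, " ");
  const line = (cells) => "| " + cells.map(esc).join(" | ") + " |";
  const [header, ...body] = rows;
  const sep = "| " + header.map(() => "---").join(" | ") + " |";
  return [line(header), sep, ...body.map(line)].join("\n");
}

// Walk the elements in order. readCsv is a caller-supplied function that
// returns the CSV text for a path listed in filePaths.
function elementsToMarkdown(elements, readCsv) {
  const out = [];
  for (const el of elements) {
    const path = el.Path || "";
    if (/\/Table(\[\d+\])?$/.test(path)) {
      // Table root: splice in the CSV instead of the per-cell text.
      const csvPath = (el.filePaths || []).find((p) => p.toLowerCase().endsWith(".csv"));
      if (csvPath) out.push(csvToMarkdownTable(readCsv(csvPath)));
      continue;
    }
    if (/\/Table(\[\d+\])?\//.test(path)) continue; // cell text, skipped in favor of the CSV
    if (!el.Text) continue;
    const heading = path.match(/\/H(\d)(\[\d+\])?$/);
    const text = el.Text.trim();
    out.push(heading ? "#".repeat(Number(heading[1])) + " " + text : text);
  }
  return out.join("\n\n");
}

// Made-up sample data standing in for the Extract output.
const sampleElements = [
  { Path: "//Document/H1", Text: "Technical Data" },
  { Path: "//Document/Table", filePaths: ["tables/fileoutpart0.csv"] },
  { Path: "//Document/Table/TR/TD/P", Text: "Max. input voltage" },
  { Path: "//Document/P", Text: "Text after the table." },
];
const sampleCsv = {
  "tables/fileoutpart0.csv": ",MVPS 4000-S2,MVPS 4200-S2\nMax. input voltage,1500 V,1500 V\n",
};
console.log(elementsToMarkdown(sampleElements, (p) => sampleCsv[p]));
```

With real output, `readCsv` could be something like `(p) => fs.readFileSync(path.join(outputDir, p), "utf8")`, where `outputDir` is wherever the Extract result was unzipped. Note that a table without a CSV in `filePaths` is dropped entirely in this sketch, so CSV table output needs to be requested when calling Extract.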
Hi Joel,
Thanks a lot for the reply! Yes, it would really help if you could share your Node.js code.
It's in a private Git repo. If you are comfortable doing so, send me a private message with your GitHub ID and I'll add you as a collaborator. I eventually plan on making it open source once it's past the work-in-progress stage.
Thanks, Joel! I just sent you a private message with my GitHub ID.