Seeking Solutions: Preserving Table Structure in JSON Output with Adobe PDF Extract API for RAG App
- June 10, 2024
Hello all! I used the Adobe PDF Extract API to parse a PDF. The PDF and the output JSON are attached to this message. I believe the JSON output doesn't preserve the table structure. If I pass this data to an LLM, it can't answer relevant questions about the data because the table structure is lost. How should I go about this? I want to use the Adobe API to build a RAG application. Is there a way to preserve the table structure within the JSON file? For example, I need output like this:
{
"Input (DC)": {
"MVPS 4000-S2": null,
"MVPS 4200-S2": null
},
"Available inverters": {
"MVPS 4000-S2": "1 x SCS 3450 UP or 1 x SCS 3450 UP-XT",
"MVPS 4200-S2": "1 x SCS 3600 UP or 1 x SCS 3600 UP-XT"
},
"Max. input voltage": {
"MVPS 4000-S2": "1500 V",
"MVPS 4200-S2": "1500 V"
},
"Number of DC inputs": {
"MVPS 4000-S2": "dependent on the selected inverters",
"MVPS 4200-S2": null
},
"Integrated zone monitoring": {
"MVPS 4000-S2": "○",
"MVPS 4200-S2": null
},
"Available DC fuse sizes (per input)": {
"MVPS 4000-S2": "200 A, 250 A, 315 A, 350 A, 400 A, 450 A, 500 A",
"MVPS 4200-S2": null
},
I know it can also generate CSV, but the CSV doesn't have any of the other information that might be present in the PDF.
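One possible workaround, sketched here in Python, is to take the CSV generated for a table and reshape it into the nested JSON shown above, then pass that alongside the rest of the extracted text. This is only a sketch under assumptions: it assumes the first CSV row holds the column headers (the model names), the first column holds the row labels, and empty cells should become null. The function name `table_csv_to_nested` and the `sample_csv` data are made up for illustration and are not part of the Adobe API or its actual output.

```python
import csv
import io
import json


def table_csv_to_nested(csv_text):
    """Reshape a table CSV into {row label: {column header: cell or None}}.

    Assumes row 0 is the header row and column 0 holds the row labels.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}

    # Skip the header cell above the row-label column.
    columns = [h.strip() for h in rows[0][1:]]

    nested = {}
    for row in rows[1:]:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a label
        label = row[0].strip()
        cells = [c.strip() for c in row[1:]]
        # Pad short rows so every column gets a value.
        cells += [""] * (len(columns) - len(cells))
        # Empty cells become None, which json.dumps writes as null.
        nested[label] = {col: (cell or None) for col, cell in zip(columns, cells)}
    return nested


# Hypothetical CSV text standing in for the table file produced by extraction.
sample_csv = """Parameter,MVPS 4000-S2,MVPS 4200-S2
Input (DC),,
Available inverters,1 x SCS 3450 UP or 1 x SCS 3450 UP-XT,1 x SCS 3600 UP or 1 x SCS 3600 UP-XT
Max. input voltage,1500 V,1500 V
"""

result = table_csv_to_nested(sample_csv)
print(json.dumps(result, indent=2, ensure_ascii=False))
```

The non-table information from the PDF would still have to come from the JSON output; this only covers getting each table into a shape an LLM can read row by row.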
