Hello everyone,
I'm currently working on a project where I need to process a JSON file that represents the structure of a PDF document. The file contains various elements, some of which are tables. However, besides the element for each table itself, the JSON also contains a separate element for every piece of text inside that table, each with a Path pointing into the table. This makes it hard to tell the main table structures apart from their contents.
Here's an example snippet from the JSON data:
{
"elements": [
{
"Bounds": [56.69189453125, 40.66029357910156, 551.7269134521484, 673.7546997070312],
"ObjectID": 109,
"Page": 1,
"Path": "//Document/Sect[4]/Table",
"attributes": {
"BBox": [42.47829999999885, 40.922599999999875, 553.0569999999716, 679.8969999999972],
"NumCol": 3,
"NumRow": 58,
"Placement": "Block",
"SpaceAfter": 18
},
"filePaths": ["tables/fileoutpart0.csv", "tables/fileoutpart1.png"]
},
{
"Bounds": [56.692901611328125, 661.3946990966797, 104.927001953125, 673.7546997070312],
"Font": {
"alt_family_name": "SMA Futura Global",
"embedded": true,
"encoding": "WinAnsiEncoding",
"family_name": "SMA Futura Global",
"font_type": "TrueType",
"italic": false,
"monospaced": false,
"name": "GSXDMC+SMAFuturaGlobal-DemiBold",
"subset": true,
"weight": 600
},
"Lang": "en",
"ObjectID": 1572,
"Page": 1,
"Path": "//Document/Sect[4]/Table/TR/TH/P",
"Text": "Technical Data",
"TextSize": 7.5,
"attributes": {"LineHeight": 9}
},
// More elements...
]
}
As you can see, the JSON includes both main table elements (e.g., //Document/Sect[4]/Table) and individual text elements within the table (e.g., //Document/Sect[4]/Table/TR/TH/P).
Objective: I need to identify and process only the main table elements to replace them with corresponding data from Excel files. The goal is to skip the individual text elements within the tables and focus on the main table structures.
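One possible way to separate the two, sketched in Python. It assumes that a whole-table element's Path always ends in /Table or /Table[n] (as in the snippet above) while the table's contents sit deeper under that node. The names main_tables, is_inside_table and load_elements are hypothetical helpers, not part of any SDK.

```python
import json
import re

# Assumption: a whole-table element's Path ends in "/Table" or "/Table[n]",
# while its cell text lives deeper, e.g. ".../Table/TR/TH/P".
TABLE_END = re.compile(r"/Table(\[\d+\])?$")
INSIDE_TABLE = re.compile(r"/Table(\[\d+\])?/")


def is_inside_table(path):
    """True if the Path points somewhere below a Table node."""
    return INSIDE_TABLE.search(path) is not None


def main_tables(elements):
    """Keep only top-level table elements; skip their nested cell text
    (and any table nested inside another table's cell)."""
    return [
        el for el in elements
        if TABLE_END.search(el.get("Path", ""))
        and not is_inside_table(el.get("Path", ""))
    ]


def load_elements(json_path):
    """Read the "elements" list from the exported JSON file."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)["elements"]
```

Each element returned by main_tables still carries its filePaths and ObjectID, which could be used to decide which Excel file should replace that table.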
Request: I would appreciate any advice on how to pick out the main table elements and skip the text elements nested inside them.
Thank you in advance for your help!
Best regards,
Amith
I don't use the JSON Path to understand tables. I read the JSON until I reach a table element, then switch over to reading the table's .xlsx file. I export tables as .xlsx because, unlike .csv, it retains merged cells. After processing the .xlsx, I skip over the remaining table elements until I'm back to regular paragraphs.
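The sequential approach described above could look roughly like this in Python. The reply doesn't say how it recognizes a table element or the return to regular paragraphs, so this sketch assumes the Path is used for just that: a table element's Path ends in /Table or /Table[n], and its cell elements follow it with Paths under that prefix. It also assumes the tables were exported as .xlsx, so filePaths lists an .xlsx entry (the snippet in the question shows .csv instead). The walk function is hypothetical; actually reading the workbook, for example with openpyxl, is left to the caller.

```python
import re

# Assumption: a table element's Path ends in "/Table" or "/Table[n]",
# and its cell elements follow it with Paths under that prefix.
TABLE_END = re.compile(r"/Table(\[\d+\])?$")


def walk(elements):
    """Yield ("text", str) for ordinary elements and ("table", xlsx_path)
    once per table, skipping the per-cell elements that follow a table."""
    table_prefix = None  # Path of the table whose cells are being skipped
    for el in elements:
        path = el.get("Path", "")
        if table_prefix and path.startswith(table_prefix + "/"):
            continue  # cell text is already covered by the .xlsx export
        table_prefix = None  # back to regular content
        if TABLE_END.search(path):
            table_prefix = path
            xlsx = next(
                (p for p in el.get("filePaths", []) if p.endswith(".xlsx")),
                None,
            )
            yield ("table", xlsx)  # None if no .xlsx was exported
        elif "Text" in el:
            yield ("text", el["Text"])
```

A caller would loop over walk(elements), open the workbook whenever it gets a ("table", path) pair, and handle paragraph text otherwise.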