
Help Needed: Identifying and Processing Main Table Elements in JSON

Community Beginner, Jun 13, 2024

Hello everyone,

 

I'm currently working on a project where I need to process a JSON file that represents the structure of a PDF document. This JSON file includes various elements, some of which are tables. However, the JSON also contains a separate element for every piece of text inside each table, which makes it hard to tell the main table structures apart from their contents.

Here's an example snippet from the JSON data:

 

{
  "elements": [
    {
      "Bounds": [56.69189453125, 40.66029357910156, 551.7269134521484, 673.7546997070312],
      "ObjectID": 109,
      "Page": 1,
      "Path": "//Document/Sect[4]/Table",
      "attributes": {
        "BBox": [42.47829999999885, 40.922599999999875, 553.0569999999716, 679.8969999999972],
        "NumCol": 3,
        "NumRow": 58,
        "Placement": "Block",
        "SpaceAfter": 18
      },
      "filePaths": ["tables/fileoutpart0.csv", "tables/fileoutpart1.png"]
    },
    {
      "Bounds": [56.692901611328125, 661.3946990966797, 104.927001953125, 673.7546997070312],
      "Font": {
        "alt_family_name": "SMA Futura Global",
        "embedded": true,
        "encoding": "WinAnsiEncoding",
        "family_name": "SMA Futura Global",
        "font_type": "TrueType",
        "italic": false,
        "monospaced": false,
        "name": "GSXDMC+SMAFuturaGlobal-DemiBold",
        "subset": true,
        "weight": 600
      },
      "Lang": "en",
      "ObjectID": 1572,
      "Page": 1,
      "Path": "//Document/Sect[4]/Table/TR/TH/P",
      "Text": "Technical Data",
      "TextSize": 7.5,
      "attributes": {"LineHeight": 9}
    }
    // More elements...
  ]
}

 

 


As you can see, the JSON includes both main table elements (e.g., //Document/Sect[4]/Table) and individual text elements within the table (e.g., //Document/Sect[4]/Table/TR/TH/P).
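One way to separate the two kinds of elements is to look only at the last segment of each Path. The sketch below is a rough illustration, not a confirmed method: it assumes a main table's Path always ends in `/Table` or an indexed form such as `/Table[2]` (the indexed form is an assumption, it does not appear in the snippet above), and the names `TABLE_ROOT`, `is_main_table` and `main_tables` are made up for this example.

```python
import re

# Assumption: a main table's Path ends in "/Table" or "/Table[n]".
TABLE_ROOT = re.compile(r"/Table(\[\d+\])?$")

def is_main_table(element):
    """True when the element's Path ends at a Table node rather than inside one."""
    return bool(TABLE_ROOT.search(element.get("Path", "")))

def main_tables(document):
    """Return only the table-root elements from the parsed JSON document."""
    return [el for el in document.get("elements", []) if is_main_table(el)]

# Trimmed-down version of the two elements from the snippet above.
sample = {
    "elements": [
        {
            "ObjectID": 109,
            "Path": "//Document/Sect[4]/Table",
            "filePaths": ["tables/fileoutpart0.csv", "tables/fileoutpart1.png"],
        },
        {
            "ObjectID": 1572,
            "Path": "//Document/Sect[4]/Table/TR/TH/P",
            "Text": "Technical Data",
        },
    ]
}

for table in main_tables(sample):
    print(table["ObjectID"], table["filePaths"])
```

Note that a table nested inside another table's cell would also end in `/Table`, so it would be picked up as well; if that matters, an extra check on the parent Path would be needed.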

 

Objective: I need to identify and process only the main table elements to replace them with corresponding data from Excel files. The goal is to skip the individual text elements within the tables and focus on the main table structures.

 

Request: I would appreciate any advice or help on:

  1. Refining the approach to accurately identify main table elements.
  2. Best practices for processing these main table elements.

 

Thank you in advance for your help!

 

Best regards,

Amith


1 Correct answer

Community Expert, Jun 13, 2024

I don't use the JSON Path to understand tables. I read the JSON until I reach a table element, then switch over to reading the .xlsx file. I export tables as .xlsx because, unlike .csv, it retains merged cells. I process the .xlsx, then skip over the table's elements until I'm back to regular paragraphs.
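The read-then-skip loop described here could look roughly like the sketch below. It is only an illustration under some assumptions: elements are in reading order, everything belonging to a table has a Path that starts with the table's own Path plus `/`, and the table element's `filePaths` lists the exported .xlsx file. The function name `walk` and the sample file names are hypothetical, and actually opening the .xlsx is left to whatever spreadsheet library you use.

```python
import re

# Assumption: a table root's Path ends in "/Table" or "/Table[n]".
TABLE_ROOT = re.compile(r"/Table(\[\d+\])?$")

def walk(elements, table_ext=".xlsx"):
    """Yield ("table", element, file_path) once per table and
    ("text", element, None) for every element outside a table."""
    current_table = None  # Path of the table whose contents are being skipped
    for el in elements:
        path = el.get("Path", "")
        if current_table and path.startswith(current_table + "/"):
            continue  # row, cell or cell text of the table already handled
        current_table = None  # back to regular content
        if TABLE_ROOT.search(path):
            current_table = path
            files = [p for p in el.get("filePaths", []) if p.endswith(table_ext)]
            yield "table", el, (files[0] if files else None)
        else:
            yield "text", el, None

# Hypothetical element list: paragraph, table, table contents, paragraph.
elements = [
    {"Path": "//Document/Sect[4]/P", "Text": "Intro"},
    {"Path": "//Document/Sect[4]/Table",
     "filePaths": ["tables/fileoutpart0.xlsx", "tables/fileoutpart1.png"]},
    {"Path": "//Document/Sect[4]/Table/TR/TH/P", "Text": "Technical Data"},
    {"Path": "//Document/Sect[4]/P[2]", "Text": "After the table"},
]

for kind, el, table_file in walk(elements):
    if kind == "table":
        print("table ->", table_file)  # load and process the .xlsx here
    else:
        print("text  ->", el.get("Text"))
```

Because the skip is based on the Path prefix, the individual cell texts never reach the processing step, and a table nested inside a cell is skipped along with its parent.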
