Why am I getting extra attributes that don't exist in the original PDF?

New Here, Sep 09, 2025

Hi Community members,

I am exploring the PDF OCR and PDF Extract APIs.

I have a scanned PDF, so to make it editable I first applied OCR. Then, to get the PDF's style and content information, I am using the Extract API ("Extract Text and Tables and Character Bounding Boxes (w/ Renditions)").

I call this API from Node.js as follows:

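    const { ExtractPDFParams, ExtractElementType } = require("@adobe/pdfservices-node-sdk");
    // Extract text and tables, plus per-character bounding boxes
    const params = new ExtractPDFParams({
        elementsToExtract: [ExtractElementType.TEXT, ExtractElementType.TABLES],
        addCharInfo: true
    });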
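
With `addCharInfo: true`, my understanding is that each text element in the extracted `structuredData.json` carries a `CharBounds` array with one `[x1, y1, x2, y2]` box per character, next to the element-level `Bounds`. The sketch below pairs each character of `Text` with its box. The field names are assumptions based on the Extract API JSON schema, `charBoxes` is a hypothetical helper, and the sample element is invented for illustration.

```javascript
// Minimal sketch: pair each character of an extracted text element with its
// per-character box. Assumes (not confirmed in this thread) that elements
// extracted with addCharInfo: true have "Text" and "CharBounds" fields, where
// CharBounds[i] = [x1, y1, x2, y2] in PDF points.
function charBoxes(element) {
  // Iterate by code point; whether CharBounds is indexed the same way is an assumption.
  const chars = Array.from(element.Text ?? "");
  const bounds = element.CharBounds ?? [];
  // Guard against a length mismatch instead of assuming a strict 1:1 mapping.
  const n = Math.min(chars.length, bounds.length);
  const result = [];
  for (let i = 0; i < n; i++) {
    const [x1, y1, x2, y2] = bounds[i];
    result.push({ char: chars[i], x1, y1, x2, y2 });
  }
  return result;
}

// Hypothetical sample element, shaped like an Extract API text element.
const sample = {
  Path: "//Document/P",
  Text: "Hi",
  Bounds: [72, 700, 86, 712],
  CharBounds: [
    [72, 700, 79, 712],
    [79, 700, 86, 712],
  ],
};

console.log(charBoxes(sample));
```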

However, the extracted JSON contains extra information: some elements have added attributes (boxes), yet the original PDF has no boxes in those places. Why were they added?

[Screenshot: Untitled design (4).png]
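
One way to narrow down where the unexpected boxes come from is to count, per element type, how many elements carry a box. This sketch assumes the `structuredData.json` layout as I understand it: an `elements` array whose entries have a `Path` (for example `//Document/Table/TR/TD`) and, for some element types, an `attributes.BBox`. `bboxSummary` and `normalizePath` are hypothetical helpers, and the sample data is invented; in practice you would `JSON.parse` the file from the extraction output.

```javascript
// Minimal diagnostic sketch: count, per normalized element path, how many
// elements carry attributes.BBox. Field names are assumed from the Extract
// API JSON schema; the sample below is invented for illustration.

// Strip sibling indices like "[2]" so "//Document/P[3]" and "//Document/P"
// fall into the same bucket.
function normalizePath(path) {
  return path.replace(/\[\d+\]/g, "");
}

function bboxSummary(structuredData) {
  const summary = new Map();
  for (const el of structuredData.elements ?? []) {
    const key = normalizePath(el.Path ?? "(no path)");
    const entry = summary.get(key) ?? { total: 0, withBBox: 0 };
    entry.total += 1;
    if (Array.isArray(el.attributes?.BBox)) entry.withBBox += 1;
    summary.set(key, entry);
  }
  return summary;
}

const sampleData = {
  elements: [
    { Path: "//Document/P", Bounds: [72, 700, 300, 712] },
    { Path: "//Document/Table", attributes: { BBox: [70, 400, 540, 600] } },
    { Path: "//Document/Table/TR/TD", attributes: { BBox: [70, 560, 300, 600] } },
    { Path: "//Document/Table/TR/TD[2]", attributes: { BBox: [300, 560, 540, 600] } },
  ],
};

for (const [path, { total, withBBox }] of bboxSummary(sampleData)) {
  console.log(`${path}: ${withBBox}/${total} elements have attributes.BBox`);
}
```

Comparing this summary against the red-boxed areas in the screenshot should show which element types the extra boxes belong to.
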
Topics: PDF Extract API, PDF Services API
Community Expert, Sep 09, 2025

I don't understand your question. Your code indicates that you are using the Extract API, but your images show a PDF. The Extract API does not return a PDF, just JSON, tables, and images.

New Here, Sep 09, 2025

You are right: the Extract API returns JSON, tables, and images, not a PDF. I am getting the JSON, tables, and images, and that is fine for me. As mentioned in the question, I am using the text with styles, the BBox, and other properties from the JSON to recreate the PDF in a Flutter application.
In the app I render the text, with borders (BBox) wherever the JSON provides them. In the screenshot, the left side is the original PDF and the right side is the output.
I have also shared the JSON through a link in the question, so you can take a look.
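
When redrawing these boxes, keep in mind that PDF coordinates, which as far as I know the Extract API uses for `Bounds` and `BBox`, are in points with the origin at the bottom-left of the page, while most UI frameworks (Flutter included) put the origin at the top-left. A minimal conversion sketch, assuming a `[x1, y1, x2, y2]` box and the page width and height that the JSON's `pages` array appears to provide; `toScreenRect` is a hypothetical helper, shown in JavaScript to match the snippet above rather than Dart.

```javascript
// Minimal sketch (assumptions: box = [x1, y1, x2, y2] in PDF points with a
// bottom-left origin; pageWidth/pageHeight in points, e.g. from "pages").
// Returns a top-left-origin rectangle scaled to the given screen width.
function toScreenRect(box, pageWidth, pageHeight, screenWidth) {
  const [x1, y1, x2, y2] = box;
  const scale = screenWidth / pageWidth; // points -> logical pixels
  const left = Math.min(x1, x2);
  const right = Math.max(x1, x2);
  const bottom = Math.min(y1, y2);
  const top = Math.max(y1, y2);
  return {
    left: left * scale,
    // Flip the y-axis: distance from the top edge of the page.
    top: (pageHeight - top) * scale,
    width: (right - left) * scale,
    height: (top - bottom) * scale,
  };
}

// US Letter page (612 x 792 pt) drawn 306 logical pixels wide (scale 0.5).
console.log(toScreenRect([72, 700, 300, 712], 612, 792, 306));
// → { left: 36, top: 40, width: 114, height: 6 }
```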

My question is this: the JSON contains some extra BBoxes (highlighted with red boxes in the screenshot) that are not actually present in the original PDF. Why are they in the extracted JSON?
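
If the renderer should not outline every element that has a box, one option is to draw outlines only for element types you choose explicitly. In the PDF standard structure attributes, which the Extract API's `attributes` appear to mirror, `BBox` describes an element's extent rather than a visible border, so this is a rendering choice on the app side, not documented Extract API behavior. A minimal sketch; `DRAW_OUTLINE_FOR` is a hypothetical allow-list, and the sample elements are invented.

```javascript
// Minimal sketch: decide which elements get an outline. DRAW_OUTLINE_FOR is a
// hypothetical allow-list of normalized element paths; adjust it to the
// element types whose borders you actually want to reproduce.
const DRAW_OUTLINE_FOR = new Set(["//Document/Table/TR/TD"]);

// Strip sibling indices like "[2]" before matching against the allow-list.
function normalizePath(path) {
  return (path ?? "").replace(/\[\d+\]/g, "");
}

function boxesToOutline(elements) {
  return elements
    .filter((el) => Array.isArray(el.attributes?.BBox))
    .filter((el) => DRAW_OUTLINE_FOR.has(normalizePath(el.Path)))
    .map((el) => ({ path: el.Path, bbox: el.attributes.BBox }));
}

const elements = [
  { Path: "//Document/P[2]", attributes: { BBox: [72, 650, 540, 690] } },
  { Path: "//Document/Table/TR[1]/TD[2]", attributes: { BBox: [300, 560, 540, 600] } },
  { Path: "//Document/Table/TR[1]/TD[3]" }, // no BBox: nothing to draw
];

console.log(boxesToOutline(elements));
```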
