Participating Frequently
September 9, 2025
Question

Why am I getting extra attributes that aren't in the original PDF?


Hi Community members,

I am exploring the PDF OCR and Extract APIs.

I have a scanned PDF, so to make it editable I first applied OCR. Then, to get the PDF's style and content information, I am using the Extract API ("Extract Text and Tables and Character Bounding Boxes (w/ Renditions)").

I am calling the API from Node.js like this:

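    const { ExtractPDFParams, ExtractElementType } = require("@adobe/pdfservices-node-sdk");

    // Extract text and table elements, with character-level info enabled
    const params = new ExtractPDFParams({
        elementsToExtract: [ExtractElementType.TEXT, ExtractElementType.TABLES],
        addCharInfo: true
    });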

However, the extracted JSON contains some extra information: it adds attributes (boxes) to some elements, but the original PDF has no boxes in those places. Why are they added?
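To see exactly which elements carry these extra boxes, one option is to walk the `elements` array of the extracted JSON and list the ones that have an `attributes.BBox` entry. This is a minimal sketch, assuming each element looks roughly like `{ Path, Bounds, attributes?: { BBox } }`; the helper `listElementsWithBBox` and the sample data are hypothetical, not part of the SDK and not taken from the real output.

```javascript
// Minimal sketch: list elements of an Extract API JSON result that carry
// an attributes.BBox entry.
// Assumption: elements look like { Path, Bounds, attributes?: { BBox } }.
function listElementsWithBBox(structuredData) {
  const elements = structuredData.elements || [];
  return elements
    .filter((el) => el.attributes && Array.isArray(el.attributes.BBox))
    .map((el) => ({ path: el.Path, bbox: el.attributes.BBox }));
}

// Hypothetical sample shaped like the extracted JSON (not real data).
const sample = {
  elements: [
    { Path: "//Document/P", Bounds: [72, 700, 300, 712], Text: "Hello " },
    {
      Path: "//Document/Table",
      Bounds: [70, 500, 400, 650],
      attributes: { BBox: [70, 500, 400, 650] },
    },
    {
      Path: "//Document/Table/TR/TD",
      Bounds: [70, 600, 235, 650],
      attributes: { BBox: [70, 600, 235, 650] },
    },
  ],
};

for (const { path, bbox } of listElementsWithBBox(sample)) {
  console.log(path, bbox.join(", "));
}
```

Comparing the printed paths with the boxes highlighted in the screenshot would show which kind of element each extra box belongs to.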


Joel Geraci
Community Expert
September 9, 2025

I don't understand your question. Your code indicates that you are using the Extract API, but your images show a PDF. The Extract API does not return a PDF, just JSON, tables, and images.

Participating Frequently
September 10, 2025

You are right: the Extract API returns JSON, tables, and images, not a PDF. I am getting the JSON, tables, and images, and that is fine for me. As mentioned in the question, I am using the text with styles, the BBox, and other properties from the JSON to recreate the PDF in a Flutter application.
I render the text with borders (from the BBox) wherever the JSON provides one. In the screenshot, the left side is the original PDF and the right side is the output.
I have also shared the JSON via a link in the question.
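When drawing these boxes in the app, each box array has to be mapped into the renderer's coordinate space. This is a minimal sketch, assuming the boxes are `[x0, y0, x1, y1]` in PDF points with the origin at the bottom-left of the page (as in PDF user space), while a Flutter canvas has its origin at the top-left; `toTopLeftRect` and its `scale` parameter are hypothetical names used only for illustration.

```javascript
// Minimal sketch: convert a PDF-style box [x0, y0, x1, y1]
// (origin bottom-left, units in points) into a top-left-origin rectangle.
// Assumptions: pageHeight uses the same units as the box, and scale
// converts points into the target's logical pixels.
function toTopLeftRect(box, pageHeight, scale = 1) {
  const [x0, y0, x1, y1] = box;
  const left = Math.min(x0, x1);
  const right = Math.max(x0, x1);
  const bottom = Math.min(y0, y1);
  const top = Math.max(y0, y1);
  return {
    left: left * scale,
    top: (pageHeight - top) * scale, // flip the y axis
    width: (right - left) * scale,
    height: (top - bottom) * scale,
  };
}

// Example: a box on a US Letter page (612 x 792 points).
console.log(toTopLeftRect([72, 700, 272, 720], 792));
// { left: 72, top: 72, width: 200, height: 20 }
```

If the coordinate assumption does not hold for a given file, boxes drawn this way would appear vertically mirrored, which is an easy check against the original page.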

My question is: there are some extra BBoxes in the JSON (highlighted with red boxes) that are not actually present in the original PDF. Why are they in the extracted JSON?
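One way to narrow down where the extra boxes come from is to group the elements that carry `attributes.BBox` by the last segment of their `Path` (for example `Table`, `TD`, `P`). If most of the unexpected boxes share one role, that points to the part of the extraction producing them. This is a minimal sketch: `roleOf` and `countBoxesByRole` are hypothetical helpers, and the path format (segments separated by `/`, with optional `[n]` indices) is an assumption to check against the actual JSON.

```javascript
// Minimal sketch: count elements that carry attributes.BBox, grouped by
// the last segment of their Path with any "[n]" index removed.
// Assumption: paths look like "//Document/Table[2]/TR/TD[3]/P".
function roleOf(path) {
  const segments = path.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1] || "";
  return last.replace(/\[\d+\]$/, "");
}

function countBoxesByRole(elements) {
  const counts = new Map();
  for (const el of elements) {
    if (!el.attributes || !Array.isArray(el.attributes.BBox)) continue;
    const role = roleOf(el.Path || "");
    counts.set(role, (counts.get(role) || 0) + 1);
  }
  return counts;
}

// Hypothetical elements (not taken from the real JSON).
const elements = [
  { Path: "//Document/Table", attributes: { BBox: [70, 500, 400, 650] } },
  { Path: "//Document/Table/TR/TD", attributes: { BBox: [70, 600, 235, 650] } },
  { Path: "//Document/Table/TR/TD[2]", attributes: { BBox: [235, 600, 400, 650] } },
  { Path: "//Document/P[3]", Bounds: [72, 700, 300, 712] },
];

console.log(countBoxesByRole(elements));
// Map(2) { 'Table' => 1, 'TD' => 2 }
```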