Why am I getting extra attributes that are not in the original PDF?

Report · Sep 09, 2025

Hi Community members,

I am exploring the PDF OCR and Extract APIs.

I have a scanned PDF, so to make it editable I applied OCR. Then, to get the PDF's style and content information, I am using the Extract API (Extract Text and Tables and Character Bounding Boxes (w/ Renditions)).

I have used this API in Node like below:

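    const { ExtractPDFParams, ExtractElementType } = require("@adobe/pdfservices-node-sdk");

    // Extract text and tables, and include per-character bounding boxes.
    const params = new ExtractPDFParams({
        elementsToExtract: [ExtractElementType.TEXT, ExtractElementType.TABLES],
        addCharInfo: true
    });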

But the extracted JSON contains some extra information: attributes (boxes) have been added to some elements. If you look at the original PDF, there are no boxes there, so why were they added? (Screenshot attached: Untitled design (4).png)

Report · Sep 09, 2025

I don't understand your question. Your code indicates that you are using the Extract API but your images show PDF. Extract API does not return PDF, just JSON, tables, and images.

Report · Sep 09, 2025

You are right, the Extract API returns JSON, tables, and images, not a PDF. I am getting the JSON, tables, and images, and that is fine for me. As mentioned in the question, I am using the text with its styles, BBox, and other properties from the JSON to recreate the PDF in a Flutter application.
In the app, I render the text with a border wherever a BBox is present in the JSON. In the screenshot, the left side is the original PDF and the right side is the output.
I have also shared the JSON through a link in the question, so you can take a look.

My question is: there are some extra BBox entries in the JSON (highlighted with red boxes) that are not actually present in the original PDF. Why are they in the extracted JSON?
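
To make the extra entries easier to inspect, here is a minimal sketch of how the elements could be split by whether they carry a BBox. It assumes the extracted JSON has a top-level `elements` array and that each box sits under `attributes.BBox` as four numbers. The helper name `splitByBBox` and the sample data are made up for illustration and are not real API output.

```javascript
// Hypothetical helper: separate elements that carry a BBox attribute from
// those that do not. The field names (elements, attributes.BBox, Path, Page,
// Text) are assumptions about the JSON layout; adjust them to your output.
function splitByBBox(extracted) {
  const elements = Array.isArray(extracted.elements) ? extracted.elements : [];
  const withBBox = [];
  const withoutBBox = [];
  for (const el of elements) {
    const bbox = el.attributes && el.attributes.BBox;
    if (Array.isArray(bbox) && bbox.length === 4) {
      withBBox.push({ path: el.Path, page: el.Page, bbox, text: el.Text });
    } else {
      withoutBBox.push({ path: el.Path, page: el.Page, text: el.Text });
    }
  }
  return { withBBox, withoutBBox };
}

// Made-up sample for illustration only.
const sample = {
  elements: [
    { Path: "//Document/H1", Page: 0, Text: "Title" },
    {
      Path: "//Document/Table",
      Page: 0,
      attributes: { BBox: [72, 400, 500, 600], NumRow: 2, NumCol: 2 },
    },
    {
      Path: "//Document/P",
      Page: 0,
      Text: "Body",
      attributes: { BBox: [72, 300, 500, 320], TextAlign: "Start" },
    },
  ],
};

const result = splitByBBox(sample);
// List the paths of every element that would get a border in my renderer.
console.log(result.withBBox.map((e) => e.path));
```

Running something like this on the real JSON would show which element paths the red boxes in the screenshot come from.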