Why does the Extract API output extra bounding boxes and treat lines as rectangles?
Hello Adobe Team,
I’m working with scanned invoices and using two APIs together:
OCR API → to make the scanned PDF editable/searchable.
Extract API → with parameters:
const params = new ExtractPDFParams({
  elementsToExtract: [ExtractElementType.TEXT, ExtractElementType.TABLES],
  addCharInfo: true
});
This works, but I’ve noticed unexpected results when reviewing the JSON and trying to re-render the PDF:
The JSON output includes BBox attributes that describe rectangular boxes around text and table elements.
When rendering from this JSON in Flutter, extra borders appear that do not exist in the original scanned PDF (e.g. double borders around tables, boxes around text).
It seems the API is treating every detected line or text area as a bounding rectangle, not just the actual drawn table/line borders from the original file.
Example: a single drawn line in the PDF becomes a rectangle in the JSON.
This makes it impossible to distinguish real visual borders from bounding boxes that are only used for OCR positioning.
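To make the distinction concrete, here is a minimal TypeScript sketch of how a renderer could separate the two cases. It is not Adobe's documented method. It assumes each entry in the JSON's elements array has a `Path`, an optional `Text`, and a `Bounds` array ordered [left, bottom, right, top] in PDF points. The helper `classifyBounds`, the `maxRuleThickness` threshold, and the sample values are all hypothetical and would need checking against real Extract output.

```typescript
// Assumed shape of one entry in the Extract JSON "elements" array.
// Field names and the [left, bottom, right, top] order are assumptions.
interface ExtractElement {
  Path: string;
  Text?: string;
  Bounds?: [number, number, number, number];
}

type Classification = "layout-box" | "possible-rule" | "unknown";

// Hypothetical helper: decide whether a box should be stroked when re-rendering.
function classifyBounds(el: ExtractElement, maxRuleThickness = 1.5): Classification {
  if (!el.Bounds) return "unknown";
  const [left, bottom, right, top] = el.Bounds;
  const width = Math.abs(right - left);
  const height = Math.abs(top - bottom);
  // Elements that carry text: the box only locates the text, so never draw it.
  if (el.Text !== undefined) return "layout-box";
  // A very thin box is more likely a drawn line than a region.
  if (Math.min(width, height) <= maxRuleThickness) return "possible-rule";
  return "layout-box";
}

// Made-up sample values for illustration only, not real API output.
const sample: ExtractElement[] = [
  { Path: "//Document/P", Text: "Invoice 001", Bounds: [72, 700, 200, 712] },
  { Path: "//Document/Table", Bounds: [72, 400, 540, 650] },
  { Path: "//Document/Figure", Bounds: [72, 399.5, 540, 400.5] },
];

// Only boxes classified as possible rules are candidates for stroking.
const toStroke = sample.filter((el) => classifyBounds(el) === "possible-rule");
console.log(toStroke.map((el) => el.Path));
```

With this approach the Flutter renderer would use text and table boxes only for placement and draw nothing for them, which should avoid the doubled borders. Whether a drawn line in a scanned page is reported as its own element at all is exactly the open question here, so the thin-box heuristic is a guess rather than a confirmed behavior.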
