Hi Community members,
I am exploring the PDF OCR and Extract APIs.
I have a scanned PDF, so I applied OCR to make it editable, and I am then using the Extract API (Extract Text and Tables and Character Bounding Boxes (w/ Renditions)) to get the PDF's style and content information.
I call this API from Node.js as shown below:
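const { ExtractPDFParams, ExtractElementType } = require("@adobe/pdfservices-node-sdk");

// Extract text and tables, and include per-character bounding boxes.
const params = new ExtractPDFParams({
  elementsToExtract: [ExtractElementType.TEXT, ExtractElementType.TABLES],
  addCharInfo: true
});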
But the extracted JSON contains some extra info like added some
I don't understand your question. Your code indicates that you are using the Extract API, but your images show a PDF. The Extract API does not return a PDF, just JSON, tables, and images.
You are right, the Extract API returns JSON, tables, and images, not a PDF. I am getting the JSON, tables, and images, and that is fine for me. As mentioned in the question, I am using the text with its styles, BBox, and other properties from the JSON to recreate the PDF in a Flutter application.
In the app, I render the text with borders (BBox) wherever they are present in the JSON. In the screenshot, the left side is the original PDF and the right side is the output.
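To make the rendering step concrete, here is a rough sketch of how the text boxes could be pulled out of the JSON before they are drawn. It assumes the JSON has an `elements` array whose entries carry `Text`, `Page`, and `Bounds` as [left, bottom, right, top] in PDF points with a bottom-left origin, and a `pages` array with a `height` per page. The function name `toDrawableBoxes` and the output field names are made up for illustration.

```javascript
// Sketch only: convert Extract-style elements into top-left-origin rectangles,
// which is what most UI toolkits expect.
// Assumed element shape: { Path, Text, Page, Bounds: [left, bottom, right, top] }
function toDrawableBoxes(extractJson) {
  const pages = extractJson.pages || [];
  return (extractJson.elements || [])
    // Keep only elements that carry both text and a bounding box.
    .filter((el) => typeof el.Text === "string" && Array.isArray(el.Bounds))
    // Skip elements whose page entry (and therefore page height) is missing.
    .filter((el) => pages[el.Page] && typeof pages[el.Page].height === "number")
    .map((el) => {
      const [left, bottom, right, top] = el.Bounds;
      const pageHeight = pages[el.Page].height;
      return {
        path: el.Path,
        text: el.Text,
        page: el.Page,
        x: left,
        y: pageHeight - top, // flip the y axis: PDF origin is bottom-left
        width: right - left,
        height: top - bottom
      };
    });
}

// Hypothetical sample data, not taken from the real document.
const sample = {
  pages: [{ page_number: 0, width: 612, height: 792 }],
  elements: [
    { Path: "//Document/P", Text: "Hello", Page: 0, Bounds: [72, 700, 300, 720] },
    { Path: "//Document/Figure", Page: 0, Bounds: [72, 400, 300, 600] }
  ]
};
const boxes = toDrawableBoxes(sample);
console.log(boxes);
```

Only the first sample element is returned here, because the second one has a box but no `Text`.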
I have also shared the JSON through the link in the question.
My question is: there are some extra BBox entries in the JSON (highlighted with red boxes) that are not actually present in the original PDF. Why are they in the extracted JSON?
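One way to narrow this down might be to list every element that has a box but no text of its own, grouped by its `Path`, and compare those against the red boxes. This is only a diagnostic sketch under the same assumption about the JSON shape (`elements` entries with `Path`, optional `Text`, and `Bounds`); the helper name `boxesWithoutText` is hypothetical, and it does not by itself explain where the extra boxes come from.

```javascript
// Sketch only: group elements that have Bounds but no non-empty Text by Path,
// so they can be compared against the boxes drawn in the app.
function boxesWithoutText(extractJson) {
  const byPath = {};
  for (const el of extractJson.elements || []) {
    const hasBounds = Array.isArray(el.Bounds);
    const hasText = typeof el.Text === "string" && el.Text.trim() !== "";
    if (hasBounds && !hasText) {
      if (!byPath[el.Path]) byPath[el.Path] = [];
      byPath[el.Path].push({ page: el.Page, bounds: el.Bounds });
    }
  }
  return byPath;
}

// Hypothetical sample data, not taken from the shared JSON.
const sampleJson = {
  elements: [
    { Path: "//Document/P", Text: "Invoice", Page: 0, Bounds: [72, 700, 300, 720] },
    { Path: "//Document/Table", Page: 0, Bounds: [72, 400, 540, 650] },
    { Path: "//Document/P[2]", Text: "   ", Page: 0, Bounds: [72, 380, 90, 395] }
  ]
};
const suspects = boxesWithoutText(sampleJson);
console.log(JSON.stringify(suspects, null, 2));
```

If the red boxes line up with entries in this list, skipping elements without text when drawing borders would be one thing to try.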