What do you get? What do you expect?
There is no text on these pages, only images.
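To check which pages are image-only before deciding what to OCR, something like the sketch below might help. It assumes the third-party `pypdf` package is installed; the helper names `image_only_pages` and `extract_page_texts` and the `min_chars` threshold are made up for illustration, and a page with only a few stray characters may still be a scan.

```python
def image_only_pages(page_texts, min_chars=1):
    """Return 1-based numbers of pages whose extracted text is shorter than min_chars."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the text layer of every page (assumes `pip install pypdf`)."""
    from pypdf import PdfReader  # third-party; imported here so the helper above has no dependency

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


# Example usage (the file name is a placeholder):
# print(image_only_pages(extract_page_texts("document.pdf")))
```

Pages reported here have no usable text layer, so they are the ones that need OCR.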
There must be some workaround to get the text out of this kind of PDF. I would highly appreciate it if you could suggest how I can get the text.
You can try OCR in Adobe Acrobat.
https://aws.amazon.com/marketplace/pp/prodview-g2ikxe6zxsi64
Adobe PDF Services API also works the same way, if I am not wrong. The JSON output I shared with you earlier was from the Adobe PDF Services API.
OCR in Adobe Acrobat would turn out to be a manual process. How do I integrate it with my Python script? I am really sorry for bothering you, but I really need a solution for this.
You can perform OCR on the document:
https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#text-recognition-ocr
Try splitting the PDF into one containing only the scanned pages (e.g. pages 1-5 of that document) and a second PDF that has the non-scanned pages (e.g. pages 6-24).
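A rough sketch of that split in Python, assuming the third-party `pypdf` package is available. The function names, file names, and the 24-page / 1-5 range are only illustrative, taken from the example above rather than from any Adobe API.

```python
def split_page_indices(total_pages, scanned_ranges):
    """Return (scanned, non_scanned) lists of zero-based page indices.

    scanned_ranges is a list of inclusive, 1-based (first, last) tuples.
    """
    scanned = set()
    for first, last in scanned_ranges:
        if first < 1 or last > total_pages or first > last:
            raise ValueError(f"invalid page range: {first}-{last}")
        scanned.update(range(first - 1, last))
    non_scanned = [i for i in range(total_pages) if i not in scanned]
    return sorted(scanned), non_scanned


def split_pdf(src_path, scanned_ranges, scanned_out, non_scanned_out):
    """Write the scanned and non-scanned pages to two separate PDFs."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(src_path)
    scanned, non_scanned = split_page_indices(len(reader.pages), scanned_ranges)
    for indices, out_path in ((scanned, scanned_out), (non_scanned, non_scanned_out)):
        writer = PdfWriter()
        for index in indices:
            writer.add_page(reader.pages[index])
        with open(out_path, "wb") as handle:
            writer.write(handle)


# Example usage (file names are placeholders):
# split_pdf("document.pdf", [(1, 5)], "scanned.pdf", "non_scanned.pdf")
```

The scanned PDF can then be sent through OCR on its own, while the other file goes straight to text extraction.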