OCR skips some text

Report · Mar 12, 2018

I have a large PDF of an old book that I'm trying to convert to text in order, ultimately, to create an ebook. The print-ready PDF has been supplied by the printer, but we don't have access to the original InDesign (or whatever software was used) layout files from some 20-odd years ago. The PDF file is essentially just page images, so the text needs to be freshly OCRed, and I'm trialling the latest Adobe Acrobat DC for this purpose.

OCR seems to work quite well on the text that Acrobat recognises, but it is passing over large slabs of text. The image below shows what I mean:

Is there a way I can force Acrobat to OCR regions not automatically identified as text?

Report · Mar 19, 2018

Please try a different OCR option. Go to Tools > Enhance Scans > Recognize Text > In This File > Recognize Text.

It should recognize the text properly. It won't allow you to do any editing, but you can correct any text it recognized incorrectly using the "Correct Recognized Text" option in the drop-down.

Thanks.

Report · Apr 09, 2018

I have a question. Can I automate OCR so that I can search within folders of scanned documents and images, without converting the images or scanned documents to editable text?

Report · May 14, 2018

Not exactly, but you can do it using the Advanced Search functionality.

Instead of selecting "Editable Text & Images", select "Searchable Image (Exact)". It will not make the PDFs editable; it only adds a text layer to the images or scanned documents. You can also save these documents as copies of the originals instead of changing the originals.
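If Acrobat itself is not a requirement for the batch step, one possible alternative is the open-source OCRmyPDF command-line tool, which likewise adds a hidden text layer to scanned PDFs and writes the result to a new file. The sketch below is not an Acrobat feature. It assumes OCRmyPDF is installed and on your PATH, and the function names and the `scans` and `scans_searchable` folders are hypothetical, chosen only for illustration. It mirrors a folder tree of PDFs into a separate output folder so the originals stay untouched.

```python
import shutil
import subprocess
from pathlib import Path


def plan_ocr_jobs(src_root: Path, dst_root: Path) -> list:
    """Pair each PDF under src_root with a mirrored output path under dst_root.

    dst_root should be a separate folder, not one inside src_root.
    """
    jobs = []
    # Note: "*.pdf" is matched case-sensitively on most non-Windows systems.
    for src in sorted(src_root.rglob("*.pdf")):
        dst = dst_root / src.relative_to(src_root)
        if not dst.exists():  # skip files that already have an output copy
            jobs.append((src, dst))
    return jobs


def ocr_command(src: Path, dst: Path) -> list:
    """Build the OCRmyPDF command line for one file."""
    # --skip-text leaves pages that already contain text as they are.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def run_jobs(jobs) -> None:
    """Run OCRmyPDF once per planned job, writing searchable copies."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    for src, dst in jobs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(ocr_command(src, dst), check=True)
```

To try it, you would call something like `run_jobs(plan_ocr_jobs(Path("scans"), Path("scans_searchable")))`. Because the output goes to a separate folder, the source scans are never modified, and running it again only processes files that do not yet have a copy.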

I hope it will resolve your issue.

Thanks.