Hello,
I know that Acrobat can't OCR pages that already contain renderable text. I also know the workarounds, such as printing to PDF as an image first and then running OCR again. However, I work with very large PDFs (thousands of pages each), and with several of them. Printing each one to PDF and then running OCR would take a very long time, so I'm hoping to skip the print-to-PDF-as-image step.
I tried sanitizing the documents, but even after sanitization Acrobat still can't OCR some of these pages. I think some fonts are missing (these are not our documents): Acrobat displays the text in these fonts correctly, but it is read as gibberish (confirmed by copying and pasting).
I found a program called PDF Pro (pdfpro.com) that seems to OCR the pages containing this gibberish renderable text and correct it to what it should be, which removes the need to print to PDF first.
This made me wonder whether there are other applications that can OCR pages with renderable text. If I have to spend money, I might as well get one that comes highly recommended, but I'm not versed in this field. Can anyone suggest other software I can demo that may be faster, better, or have more options?
Could someone also explain why Acrobat doesn't replace renderable text during OCR while other apps can? I'm not sure how the other apps accomplish this, and I'm curious about the tech behind it. Is there a reason Acrobat doesn't want to implement such a feature?
Thanks.