Improve bad OCR quality
For years now I have downloaded U.S. patents as PDFs from Google Patents and other sources. Adobe's OCR engine routinely fails to recognize "fi" (recognizing it as "?"), misreads a lowercase w as an uppercase W, and sometimes misreads a lowercase z as an uppercase Z. I have seen this behavior consistently across hundreds of patent documents.
The only remedy I have found in this forum is to change each occurrence manually. That's not an acceptable solution. At a minimum, there should be a global search & replace. Is there one?
Thanks,
- bill.
Hi bill,
Sorry for the delay in response.
As you mentioned, you want to improve OCR quality in Acrobat. Please refer to the following forum threads, which discuss similar issues:
Are there any tricks to improving OCR accuracy on a previously scanned pdf?
Better PDF OCR. ClearScan is smaller, looks better
Let us know if you need any help.
Shivam
Hi, thanks, but none of these are helpful. The first one refers to "find next suspect", but it says there are no "suspects" in the files.
The second one says to re-scan the document. This is not possible, as I don't have the original documents.
The third one discusses a new OCR technology that Adobe adopted with Acrobat 9. I am using the latest version of Acrobat Pro DC.
Seriously, this has been a problem for many years. It is very disappointing that there is no solution, especially for the ligature (e.g. "fi") problem.
Thanks,
- bill.
Hi bill,
Sorry for the delay in response.
Would it be possible to share the PDF file you are working with so that we can replicate the issue at our end? To share the file here in the forums, refer to the steps given in this link: How to share a file using Adobe Document Cloud
You may also share the link to the PDF via private message: How Do I Send Private Message
Also, please let us know the dot version of Acrobat and the operating system installed on the machine. You may refer to the steps given in this link on how to check the version in Acrobat: Identify the product and its version for Acrobat and Reader DC
Thanks,
Shivam
I'm also running into this problem, and have read through the solutions above. Any tips on improving OCR accuracy would be helpful.
You may also find some good ideas on improving OCR accuracy here: https://www.bisok.com/how-to-get-better-ocr-accuracy/ Hope this helps.
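On the global search & replace asked about at the top of the thread: one possible workaround, outside Acrobat, is to export the recognized text and clean it up with a script. The sketch below is only an illustration under assumptions: the function name fix_ocr_text is made up, it works on plain text (it does not change the hidden text layer inside the PDF), and the two rules are heuristics based on the errors described in the first post ("fi" recognized as "?", lowercase w and z recognized as uppercase in the middle of a word).

```python
import re


def fix_ocr_text(text: str) -> str:
    """Heuristic cleanup for the OCR errors described in this thread.

    Hypothetical helper, not part of Acrobat or any library.
    """
    # Assumption: a "?" directly followed by a lowercase letter is a
    # lost "fi" ligature (e.g. "speci?c"), not real punctuation.
    text = re.sub(r"\?(?=[a-z])", "fi", text)

    # Assumption: an uppercase W or Z directly after a lowercase letter
    # is a misread lowercase letter (e.g. "hoWever", "siZed").
    # This will also lowercase legitimate mixed-case names.
    text = re.sub(r"(?<=[a-z])[WZ]", lambda m: m.group(0).lower(), text)

    return text


if __name__ == "__main__":
    sample = "The ?rst speci?c embodiment, hoWever, is siZed. Why?"
    print(fix_ocr_text(sample))
```

A real question mark at the end of a sentence is left alone because it is not followed by a lowercase letter, and a capital W or Z at the start of a word is left alone because it is not preceded by one. Any trademark or camel-case name containing W or Z would need to be excluded by hand.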

