
Improve bad OCR quality

Explorer, Nov 07, 2018

For years now I have downloaded U.S. patents as PDFs from Google Patents and other sources. Adobe's OCR engine routinely fails to recognize "fi" (recognizing it as "?"), misreads a lowercase w as an uppercase W, and sometimes misreads a lowercase z as an uppercase Z. I have seen this behavior consistently across hundreds of patent documents.

The only remedy I have found in this forum is to change each occurrence manually. That's not an acceptable solution. At a minimum, there should be a global search & replace. Is there one?

Thanks,

- bill.

TOPICS
Scan documents and OCR
Adobe Employee, Nov 29, 2018

Hi bill,

Sorry for the delay in response. 

Since you want to improve OCR quality in Acrobat, please refer to the following forum threads, which discuss similar issues:

improve OCR results

Are there any tricks to improving OCR accuracy on a previously scanned pdf?

Better PDF OCR. ClearScan is smaller, looks better

Let us know if you need any help.

Shivam

Explorer, Nov 29, 2018

Hi, thanks, but none of these are helpful. The first one refers to "find next suspect", but it says there are no "suspects" in the files.

The second one says to re-scan the document. This is not possible because I don't have the original documents.

The third one discusses a new OCR technology that Adobe adopted with Acrobat 9. I am using the latest version of Acrobat Pro DC.

Seriously, this has been a problem for many years. Very disappointing that there is no solution, especially to the ligatures (e.g. "fi") problem.

Thanks,

- bill.

Adobe Employee, Jan 03, 2019

Hi bill,

Sorry for the delay in response. 

Would it be possible to share the PDF file you are working with so that we can replicate the issue at our end? To share the file here in the forums, refer to the steps given in this link: How to share a file using Adobe Document Cloud

You may also share the link to the PDF via private message - How Do I Send Private Message

Also, please let us know the dot version of Acrobat and the operating system installed on the machine. You may refer to the steps given in this link on how to check the version in Acrobat: Identify the product and its version for Acrobat and Reader DC

Thanks,

Shivam

New Here, Jun 19, 2019

I'm also running into this problem, and have read through the solutions above.  Any tips on improving OCR accuracy would be helpful.

New Here, Mar 04, 2020

You may also find some good ideas on improving OCR accuracy here: https://www.bisok.com/how-to-get-better-ocr-accuracy/ Hope this helps.
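
For the global search & replace that bill asked about at the top of the thread, here is a rough sketch of how the three misreads he describes ("fi" read as "?", w read as W, z read as Z) might be cleaned up in bulk. It is not an Acrobat feature: it assumes the text has already been exported from the PDF to plain text, and it does not touch the PDF's own text layer. The names KNOWN_WORDS, fix_word and fix_text are made up for illustration, and the tiny word list stands in for a real dictionary file. A correction is only accepted when the result is a known word, so ordinary question marks and mixed-case units such as "kWh" are left alone.

```python
import re

# Hypothetical stand-in for a real dictionary; load a full word list in practice.
KNOWN_WORDS = {
    "first", "filter", "figure", "configured", "defined",
    "specific", "between", "wherein", "size", "organized",
}

def fix_word(word, known_words=KNOWN_WORDS):
    """Undo the three OCR misreads in one token, if that yields a known word."""
    # "fi" ligature misread as "?"
    trial = word.replace("?", "fi")
    # lowercase w/z misread as uppercase W/Z after a lowercase letter
    trial = re.sub(r"(?<=[a-z])[WZ]", lambda m: m.group().lower(), trial)
    # Accept the change only when the dictionary confirms it.
    if trial != word and trial.lower() in known_words:
        return trial
    return word

def fix_text(text):
    """Apply fix_word to every token in the text.

    A token is a run of letters and "?" that ends in a letter, so a
    question mark at the end of a sentence is never part of a token.
    """
    return re.sub(r"[A-Za-z?]*[A-Za-z]", lambda m: fix_word(m.group()), text)

if __name__ == "__main__":
    sample = "The ?rst ?lter is con?gured betWeen the siZe limits. Why? 5 kWh"
    print(fix_text(sample))
```

A W or Z at the start of a word is left unchanged here, because it cannot be told apart from a genuine capital without more context. Words that are missing from the word list are also left unchanged, so the quality of the result depends on how complete the dictionary is.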
