Why so many OCR errors

Oct 31, 2016

Even in a very clear image of English text in a sans-serif font such as Helvetica, OCR produces numerous artifacts and recognition errors. Many of them don't show up as 'suspects' and may not be visible in Edit mode; they appear only when the text is exported. The results are far worse than what I could get from standalone OCR software 10 years ago.

How can I get more usable text recognition?

Nov 02, 2016

We apologize for the issue you are facing. Can you please share the following information to help us identify and resolve the issue ASAP:

- Acrobat version you are using

- Operating system

- OCR method

- 1 sample PDF file where you are facing this issue (you can use https://cloud.acrobat.com/send for sharing)

Thanks.

Nov 02, 2016

Uploaded a couple of samples to https://files.acrobat.com/a/preview/e18a6014-1494-48b5-8322-366d0571c5b5

Nov 02, 2016

Acrobat Pro DC

Architecture: x86_64

Build: 15.20.20039.203716

AGM: 4.30.66

CoolType: 5.14.5

JP2K: 1.2.2.37137

Currently running Mac OS Sierra 10.12.2 Beta (16C32e); problem first noticed on Sierra 10.12.1

My original method was just Enhance Scan/Recognize Text, but it was difficult to capture the recognized text. Then I tried simply opening the PDF and doing File/Export to RTF. The results are better this way.

I discarded the earlier problem files but will post a couple of less-extreme examples.

Nov 04, 2016

Thanks for sharing the files. If I understand correctly, the issue you are describing is two words overlapping after text recognition.

Enhance Scan > Recognize Text > Correct Recognized Text > Review Recognized Text.

Please try the 'Editable Text & Images' output option once, and let us know which settings you are using.

Thanks.

Nov 04, 2016

Sometimes. It also involves overlapping characters within a word, as well as misrecognized characters.

When I open 'overlapped text.png' and try to review the text, it says there are no suspects. Which settings do you want to know?

Nov 08, 2016

If you run OCR using 'Editable Text & Images', it won't show any suspects. You can use the Edit PDF tool to change any word.

Otherwise, after running OCR, select the 'Review Recognized Text' checkbox (Enhance Scan > Recognize Text > Correct Recognized Text > Review Recognized Text). You can then mark any word as a suspect by double-clicking it.

Thanks.

Nov 08, 2016

Thank you — this is very helpful!

Jul 02, 2024

Hello, Community! Recently, I started having difficulties with the Read Out Loud function after processing my PDF file with OCR. I am still using Acrobat Pro v. 9. What I am experiencing is that, for a given block of recognized text, I hear a line read correctly, and then the same words repeated individually at a much slower speed, with occasional gibberish thrown in for good measure!

What in the world has happened to the OCR process in Acrobat Pro v. 9? Curious minds would like to know!

Jun 30, 2024

Image *of* English text? What is that supposed to mean? Idk about you, but I'm just trying to convert an image of this weird duck-looking thing to text. It doesn't have any text, because why would it have text?