Even in a very clear image of English text in a sans-serif font such as Helvetica, OCR produces numerous artifacts and recognition errors. And many of those don't show up as 'suspects' and may not be visible in Edit mode — only when exported as text. Results are way below what I could get from separate OCR 10 years ago.
How can I get more usable text recognition?
We apologize for the issue you are facing. Could you please share the following information to help us identify and resolve the issue as soon as possible:
- Acrobat version you are using
- Operating system
- OCR method
- One sample PDF file where you are facing this issue (you can use https://cloud.acrobat.com/send for sharing)
Thanks.
Uploaded a couple of samples to https://files.acrobat.com/a/preview/e18a6014-1494-48b5-8322-366d0571c5b5
Acrobat Pro DC
Architecture: x86_64
Build: 15.20.20039.203716
AGM: 4.30.66
CoolType: 5.14.5
JP2K: 1.2.2.37137
Currently running Mac OS Sierra 10.12.2 Beta (16C32e); problem first noticed on Sierra 10.12.1
Original method was just to Enhance Scan/Recognize Text, but it was difficult to capture the recognized text. Then tried just opening the PDF and doing File/Export to RTF. The results are better this way.
I discarded the earlier problem files but will post a couple of less-extreme examples.
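For anyone who wants to put a number on "the results are better this way", here is a rough sketch that scores exported OCR text against a short passage you have typed out by hand. It only uses Python's standard `difflib`; the function name `char_error_rate` and the sample strings are made up for illustration, and the score is an approximation based on matching blocks, not a true edit distance.

```python
import difflib


def char_error_rate(reference: str, ocr_text: str) -> float:
    """Approximate fraction of characters the OCR got wrong.

    Assumption: `reference` is a hand-typed, known-good transcription of
    the same passage. This is a difflib-based estimate, not Levenshtein.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    # Total number of characters that line up between the two strings.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # Everything in the longer string that did not line up counts as an error.
    errors = max(len(reference), len(ocr_text)) - matched
    return errors / len(reference)


if __name__ == "__main__":
    reference = "Helvetica is a sans-serif font."
    # Hypothetical outputs from two different OCR/export routes.
    from_recognize_text = "Helvet1ca is a sans-serif f0nt."
    from_export_rtf = "Helvetica is a sans-serif font."
    print("Recognize Text:", char_error_rate(reference, from_recognize_text))
    print("Export to RTF: ", char_error_rate(reference, from_export_rtf))
```

Running the same page through each method and comparing the two scores gives a slightly more objective basis than eyeballing the output.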
Thanks for sharing the files. If I understand correctly, the issue you are describing is two words overlapping after text recognition.
Enhance Scan/Recognize Text > Correct Recognized Text > Review recognized text.
Please try the 'Editable text & Images' option once. Also, please specify the settings you are using.
Thanks.
Sometimes. It also involves characters within a word being overlapped and mis-recognized characters.
When I try the 'overlapped text.png' sample and try to review the text, it says there are no suspects. Which settings do you want to know?
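Since Acrobat reports no suspects here, one possible workaround is to scan the exported text yourself for tokens that look wrong. The sketch below is my own heuristic, not anything Acrobat provides: it flags words that mix letters with digits or stray symbols. The name `suspect_tokens` and the patterns are assumptions, and it will not catch overlapped words that happen to form plausible-looking strings.

```python
import re

# Whitespace-delimited tokens in the exported text.
TOKEN = re.compile(r"\S+")
# A "clean" word: letters only, optionally joined by an apostrophe or hyphen.
CLEAN_WORD = re.compile(r"^[A-Za-z]+(?:['’-][A-Za-z]+)*$")
# A plain number such as 10 or 10.12.
NUMBER = re.compile(r"^\d+(?:[.,]\d+)*$")
# Punctuation that may legitimately surround a word.
EDGE_PUNCTUATION = ".,;:!?()[]\"'“”‘’"


def suspect_tokens(text: str) -> list[str]:
    """Return tokens that are neither a clean word nor a plain number."""
    suspects = []
    for raw in TOKEN.findall(text):
        token = raw.strip(EDGE_PUNCTUATION)
        if not token:
            continue
        if CLEAN_WORD.match(token) or NUMBER.match(token):
            continue
        suspects.append(token)
    return suspects


if __name__ == "__main__":
    sample = "The qu1ck brown f0x, version 10.12 works."
    print(suspect_tokens(sample))
```

The flagged tokens can then be searched for in the PDF and corrected by hand in the Edit PDF tool.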
If you run OCR using 'Editable text & Images', it won't show any suspects. You can go to the Edit PDF tool and change any word.
Otherwise, after running OCR, select the 'Review recognized text' checkbox. You can then mark any word as a suspect by double-clicking it. The path is Enhance Scan/Recognize Text > Correct Recognized Text > Review recognized text.
Thanks.
Thank you — this is very helpful!
Hello, Community! Recently, I started to experience difficulties with the Read Out Loud function after processing my PDF file with OCR. I am still using Acrobat Pro, v. 9. What I am experiencing is that for a given block of recognized text, I will hear a line being read correctly, and then the same words repeated individually at a much slower speed, with occasional gibberish thrown in for good measure!
What in the world has happened to the OCR process with Acrobat Pro v. 9? Curious minds would like to know!
image *of* English text? what is that supposed to mean? idk about you, but I'm just trying to convert an image of this weird duck lookin thing to text, it doesn't have any text because why would it have text?