January 10, 2012
Question

Proofread and correct OCR'd text in Acrobat 10 Pro

How do you proofread and correct text produced by OCR from a scanned document, in Acrobat 10 Pro?

I scan (many, large) paper documents, then use Recognise Text. After the OCR phase, if I save PDFs as text, I can see many scan errors.

I would like to be able to correct those errors in the scanned text, so that names etc can be successfully searched. However I cannot find any way to view and correct the scanned text.

I experimented with Tools / Content / Edit Document Text, but I cannot see how to display the scanned text to allow correction. It appears to operate on the PDF image. But if I try to change the document image to correct known errors (e.g. in spacing), and then save the PDF as text again, the string where I changed the image becomes gibberish.

How is Edit Document Text supposed to work? Is there any way to achieve what I am looking for (fixing many errors in large OCR'd documents)?

Regards,

Sue.

3 replies

Participant
January 19, 2012

7. Oct 13, 2010 12:13 PM (in response to (Dave_Rado))

Re: Is it possible to correct Acrobat's OCR errors?

I came across this thread looking for the same information.  After playing around with some settings in Acrobat 8, I discovered the following steps to make the invisible OCR text visible for editing:

To make the OCR hidden text visible, use the Text TouchUp Tool and change the color from No Color to a visible color.  Then edit the text.  Then change the text color back to No Color.

1) With the Text TouchUp tool, select all text on the page (ctrl-A).

2) Right-click the page and click "Properties".

3) Go to the "Text" tab and select a Fill color for the font.  Now the overlaid text is visible!

4) Make any text corrections needed.

5) Select all text again and choose "No Color" for the fill text to make it invisible again.

6) Save.

If seeing the scanned image is getting in the way, you can also go to Edit > Preferences > Page Display and uncheck the option to "Show large images".

Hope this helps!

Participant
March 2, 2012

milray2, I too found this thread while looking for the same information. Thanks for posting your findings - very helpful! This worked for me, exactly as you described, in Acrobat 9.

Adobe Employee
January 11, 2012

To edit the OCR'd text, go to Tools pane > Recognize Text > Find First Suspect.

Now click on any text and edit it. The edit is applied to the hidden text layer, so you will not see the change on the page. As a workaround, you can verify your modifications by selecting the text, copying it, and pasting it into Notepad.

Please note that this workflow does not work for ClearScan documents.
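
For long documents, checking corrections by copying text out by hand gets tedious. One alternative is to save the PDF as text before and after editing and compare the two exports outside Acrobat. The following is only a minimal sketch under that assumption: the function name `changed_lines` and the file labels `before.txt` / `after.txt` are hypothetical, and it uses Python's standard `difflib` module, not anything built into Acrobat.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return the lines removed ('-') or added ('+') between two text exports."""
    diff = list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before.txt",  # hypothetical label for the export made before editing
            tofile="after.txt",     # hypothetical label for the export made after editing
            lineterm="",
            n=0,                    # no context lines, only the changes themselves
        )
    )
    # The first two entries are the file headers; hunk markers start with "@@".
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    before = "Mr Srnith arrived\nin London"
    after = "Mr Smith arrived\nin London"
    for line in changed_lines(before, after):
        print(line)
```

An empty result means the two exports are identical, which suggests the edit did not reach the hidden text layer.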

Bernd Alheit
Community Expert
January 11, 2012

The function will not find all suspects.
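
Since the built-in suspect search can miss errors, a rough complement is to scan the exported text for tokens that look unlikely. The sketch below is a hypothetical heuristic, not Acrobat's own suspect detection: it assumes the document has been saved as plain text and simply flags tokens that mix letters and digits, a common OCR confusion. The name `suspicious_tokens` is made up for illustration.

```python
import re

# Runs of ASCII letters and digits; punctuation and whitespace separate tokens.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens containing both letters and digits, in order of first appearance."""
    found: list[str] = []
    for token in TOKEN.findall(text):
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        if has_letter and has_digit and token not in found:
            found.append(token)
    return found


if __name__ == "__main__":
    sample = "Mr Sm1th arrived on 10 January with Mr Sm1th"
    print(suspicious_tokens(sample))
```

This will also flag legitimate tokens such as "10th", and it cannot catch errors that produce ordinary-looking words, so the output is only a starting list for manual review.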

January 12, 2012

Thanks, Bernd and apangasa.

I tried the method you describe. I OCR'd a scanned file using Readable Image (Exact). I saved it as PDF and as txt; the txt revealed many scannoes.

I found that Find Suspects found nothing, so I highlighted text where I knew there was an error. Right mouse click then gave me a list of options, of which only "Replace text" sounded useful, so I used that. It presented as if I was typing an annotation, and it put a blue line through the text I had "replaced".

I then saved the file as PDF and as txt. The txt had no changes whatsoever: no corrections vs. the original.

Am I doing something wrong?

Regards,

Sue.

Bernd Alheit
Community Expert
January 11, 2012

Select the text and change the color of the text. Then you can see and change the text.