January 10, 2012
Question

Proofread and correct OCR'd text in Acrobat 10 Pro

How do you proofread and correct text produced by OCR from a scanned document, in Acrobat 10 Pro?

I scan (many, large) paper documents, then use Recognize Text. After the OCR phase, if I save the PDFs as text, I can see many recognition errors.

I would like to be able to correct those errors in the recognized text, so that names etc. can be successfully searched. However, I cannot find any way to view and correct the recognized text.

I experimented with Tools / Content / Edit Document Text, but I cannot see how to display the scanned text to allow correction. It appears to operate on the PDF image. But if I try to change the document image to correct known errors (e.g. in spacing), and then save the PDF as text again, the string where I changed the image becomes gibberish.

How is Edit Document Text supposed to work? Is there any way to achieve what I am looking for (fixing many errors in large OCR'd documents)?
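
Until the hidden text itself is corrected, one partial workaround for the search problem is to look for near-misses of a known name in the Save-as-text output. The sketch below is only an illustration and runs outside Acrobat: it uses Python's standard difflib module, and the function name find_name_variants, the sample sentence, and the 0.75 similarity cutoff are all assumptions rather than anything Acrobat provides.

```python
import difflib
import re

def find_name_variants(text, name, cutoff=0.75):
    """Return distinct tokens in `text` that closely resemble `name`."""
    # Treat runs of letters, digits, and apostrophes as tokens.
    tokens = set(re.findall(r"[A-Za-z0-9']+", text))
    # Compare case-insensitively, but report each token as it appears in the text.
    by_lower = {}
    for tok in tokens:
        by_lower.setdefault(tok.lower(), tok)
    matches = difflib.get_close_matches(
        name.lower(), list(by_lower), n=20, cutoff=cutoff
    )
    return [by_lower[m] for m in matches]

# Hypothetical sample standing in for text exported from an OCR'd PDF.
sample = "Mr. Harnpton met Mrs Hampton and Mr Hamptcn in London."
print(find_name_variants(sample, "Hampton"))
```

Each variant it reports can then be searched for in the PDF itself. A lower cutoff catches more damaged spellings but also more unrelated words.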

Regards,

Sue.

3 replies

New Participant
January 19, 2012

7. Oct 13, 2010 12:13 PM (in response to (Dave_Rado))

Re: Is it possible to correct Acrobat's OCR errors?

I came across this thread looking for the same information. After playing around with some settings in Acrobat 8, I discovered the following steps to make the invisible OCR text visible for editing:

To make the hidden OCR text visible, use the TouchUp Text tool and change the color from No Color to a visible color. Then edit the text. Then change the text color back to No Color.

1) With the TouchUp Text tool, select all text on the page (Ctrl+A).

2) Right-click the page and click "Properties".

3) Go to the "Text" tab and select a Fill color for the font. Now the overlaid text is visible!

4) Make any text corrections needed.

5) Select all text again and choose "No Color" for the text fill to make it invisible again.

6) Save.

If seeing the scanned image is getting in the way, you can also go to Edit > Preferences > Page Display and uncheck the option to "Show large images".

Hope this helps!

New Participant
March 2, 2012

milray2, I too found this thread while looking for the same information. Thanks for posting your findings - very helpful! This worked for me, exactly as you described, in Acrobat 9.

Adobe Employee
January 11, 2012

To edit the OCR'd text, go to the Tools pane > Recognize Text > Find First Suspect.

Now click on any text and edit it. The edits are made to the hidden text layer, so you will not see them on the page. As a workaround, you can verify your changes by selecting the text and copying it into Notepad.

Please note that this workflow does not work for ClearScan documents.
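
If the text is copied to Notepad once before editing and once after, the two copies can be compared automatically instead of by eye. The following is a minimal sketch of that check, assuming both copies are available as plain strings; the function name changed_words and the sample strings are illustrative, and the comparison uses Python's standard difflib module, not any Acrobat feature.

```python
import difflib

def changed_words(before, after):
    """List (old, new) word-level differences between two text snapshots."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Skip unchanged runs; keep replacements, insertions, and deletions.
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

# Hypothetical text copied from the hidden layer before and after editing.
before = "The c1ient met Mr Srnith on Tuesday"
after = "The client met Mr Smith on Tuesday"
for old, new in changed_words(before, after):
    print(f"{old!r} -> {new!r}")
```

An empty result would suggest that the edits did not reach the hidden text layer.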

Bernd Alheit
Braniac
January 11, 2012

The function will not find all suspects.
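
Because the built-in suspect finder can miss errors, a rough second pass over the Save-as-text output may help decide which pages need attention. The sketch below is a heuristic only: the three patterns, the name flag_suspects, and the sample text are assumptions chosen for illustration, and they will produce false positives (for example "1st" or "McDonald") as well as miss errors that happen to look like ordinary words.

```python
import re

# Heuristic patterns that often indicate OCR damage; tune them for your documents.
SUSPECT_PATTERNS = [
    re.compile(r"[A-Za-z]\d|\d[A-Za-z]"),  # letters and digits mixed, e.g. "c1ient"
    re.compile(r"[a-z][A-Z]"),             # capital inside a word, e.g. "paRagraph"
    re.compile(r"(.)\1\1"),                # same character three times in a row
]

def flag_suspects(text):
    """Return (line_number, token) pairs that match any heuristic pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Drop surrounding punctuation before testing the token.
            word = token.strip(".,;:!?()[]\"'")
            if word and any(p.search(word) for p in SUSPECT_PATTERNS):
                hits.append((line_no, word))
    return hits

# Hypothetical sample standing in for text exported from an OCR'd PDF.
sample = "The c1ient signed.\nMr Smith agreed.\nSee page 12 of the agreeement, paRagraph 3."
for line_no, word in flag_suspects(sample):
    print(line_no, word)
```

The line numbers refer to the exported text file, so the flagged words still have to be located and corrected in the PDF by hand.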

January 13, 2012

My conclusion is that there is no way to proofread and correct the scanned text using Acrobat X. Am I being too pessimistic?

Are there any tools other than Acrobat X that I could use to proofread and correct the hidden text? Adobe, or from other vendors?


Surely it must be possible to get access to the text layer, correct it and save it still attached to the PDF - without having to use these inefficient and ineffective tools?

Bernd Alheit
Braniac
January 11, 2012

Select the text and change the color of the text. Then you can see and change the text.