How to edit/fix OCR errors made by Acrobat Pro DC?
- February 14, 2022
Two problems, really, and both relate to the text as recognized (or not) in scanned images of text documents (or in text and image documents) by Acrobat Pro DC's OCR capability:
(1) How can I access the full (hidden) OCR text so that I can correct the inevitable OCR errors in words that the program does not flag as "suspects" and therefore does not offer for editing via the "Correct Recognized Text" feature? (That is, the program "thinks" it has correctly identified and spelled a word, so it does not present that word as a "suspect" for possible correction. But once the text of the document is copied and pasted into a separate document, the human eye can easily see that the OCR software misspelled or misinterpreted the image of that text, and the user wants to correct that error in the OCR text to ensure accurate searching later.)
(2) The other part of the problem occurs when the OCR fails to identify (at all) some portions of the text image. In those cases, I would have thought I could use the Edit PDF | Edit feature to insert the missing text, but I could not. I found the "missing text" located outside the border of one of the "text boxes" (where text was recognized and can be edited), but I couldn't find a way to move the boundaries of that text box to "bring in" the unrecognized text and, worse, trying to do so distorted the characters that were originally in that "text box". So this problem is really how to add to the "searchable text" the items that the OCR failed to identify as text. (I've attached a PDF with 2 cropped screenshots. The first shows a portion of the PDF before running OCR and trying to edit the OCR'd text. The second shows the same portion of the PDF in Edit mode, with the text boxes surrounding the text the software recognized [which I can manually edit] as well as the text in the original that the software DID NOT recognize and therefore did not include in a text box, none of which I can edit.)
Adobe must know that both of these issues exist, so I presume there is some way to address them so that I can end up with a correct and complete text file that can be searched. However, I just cannot find tools that seem able to make these types of corrections to the text.
I've previously used other standalone OCR software that easily permits making these sorts of corrections (including adding missing text) to the underlying "searchable text" of an OCR'd image, but I just can't figure out where Adobe has hidden these capabilities within Acrobat Pro DC. Or, if these capabilities aren't present in Acrobat Pro DC, why in the world would Adobe not have included them?
I will deeply appreciate any help on how best to deal with these two problems.
