Participant
August 16, 2022
Question

OCR/Recognize Text w/ Searchable Image (Exact) exporting without corrections for some filetypes/apps


After running OCR ("Recognize Text") with "Searchable Image (Exact)" selected as the Output in Settings, and then fixing OCR errors via "Correct Recognized Text," I see the corrections reflected when I copy text directly from the file in Acrobat, as well as when I export as Plain Text or Plain Text (Accessible).

However, when I save the file as a PDF and open it on my Mac in the native Preview application, or when I export it as an HTML web page, the OCR corrections are not reflected: the HTML export shows the uncorrected terms, and copying text from the PDF in Preview picks up the uncorrected OCR output.

Is there a way to fix this?

Thank you so much.


1 reply

gary_sc
Community Expert
August 16, 2022

Hi Leo,

I'm trying to follow your chain of activities. I'm with you the entire way until the last paragraph, where you state, "However, when I save as a PDF and try…."

Am I to understand that everything you've done up to that point happened before the save? If that's your concern, it shouldn't make a difference: once Acrobat has access to the document, it has already converted the data into a PDF, and it does so before the OCR step.

Can you verify that you're performing the scan from within Acrobat? If so, you're using Apple's Image Capture for the scanning.

Also, it sounds like the text is correct when viewed in Acrobat but incorrect when viewed in Preview. Is that correct?

Lastly, here is the breakdown of the three modes of OCR creation. Can you please try the other two and see if the same thing occurs?

OCR types and their options:

1. Searchable Image: Ensures that text is searchable and selectable. This option keeps the original image, deskews it as needed, and places an invisible text layer over it. The selection for Downsample Images in this dialog box determines whether the image is downsampled and to what extent. Consequently, #1 is typically not acceptable to a FedGov agency (or any entity with an interest in a document of record having the proper "provenance").

2. Searchable Image (Exact): Ensures that text is searchable and selectable. This option keeps the original image and places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image. Typically this is what a FedGov agency requires if submitting a scanned image of text.

3. Editable Text & Images: Synthesizes a new custom font that closely approximates the original and preserves the page background using a low-resolution copy.
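
If it helps when comparing the three modes, one way to see exactly which corrections went missing is to diff the text from a known-good source (for example, the Plain Text export) against the text copied out of Preview or the HTML export. The following is only a rough sketch in Python using the standard library; the function name `missing_corrections` and the sample strings are hypothetical placeholders, not anything Acrobat or Preview provides, and you would paste in your own text.

```python
import difflib

def missing_corrections(corrected: str, other: str) -> list[tuple[str, str]]:
    """Return (corrected_words, other_words) pairs where the two texts differ.

    Both texts are compared word by word, so whitespace differences
    between exports are ignored.
    """
    a = corrected.split()
    b = other.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs

# Hypothetical sample text: replace with the Plain Text export from Acrobat
# and the text copied from Preview (or taken from the HTML export).
corrected = "The quick brown fox jumps over the lazy dog"
from_preview = "The qu1ck brown fox jumps over the 1azy dog"

for want, got in missing_corrections(corrected, from_preview):
    print(f"expected {want!r}, found {got!r}")
```

Running the same comparison on a file produced with each of the three OCR modes would show whether the uncorrected terms appear in all of them or only with Searchable Image (Exact).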