How to output OCR files in clear text?

Question

I am a novice user with Adobe Acrobat Pro 2020, and I would like some help.

I am a very old retired home user, and so nothing I produce needs to conform with any publishing, etc, requirements. Everything is for my own use.

Let me explain the issue I can't master. I am often dealing with 100 - 200 year old text. I am very impressed with the quality of Adobe's OCR. From what I see, OCR produces text letters that looks very similar to that which was OCR'd. ie it is readable but imperfect text, no crispness, but still correctly OCR'd. Where a bit of the original is too distorted for the OCR to recognise, the OCR outputs that bit as it looked after scanning, ie not OCR'd. Seems very clever to me.

I want to output my pdf with the recognised characters present as clear, crisp letters (should be possible as the OCR has correctly recognised it all), with the OCR still using the occasional substitution of a sort of facsimile of bits it can't recognise.

I have tried exporting to Word, but that introduces multiple errors, which are not apparent in Adobe's output. Exporting to text, text (aceesible), rtf, all introduce extra errors.

Now, what I am asking, can I get Adobe OCR to output with a crisp text (not fussy about the font), while still substituting (what I call a facsimile) of the bits the OCR can't recognise.

I have attached an image to show what I see when I OCR an old text, and below that is the same file exported to Word. I am hoping to achieve is for Adobe's OCR to output crisp letters like in the bottom image, etc.

Take care in these dangerous times,

Doug

JR Boulay · Answer

Otherwise you can use this Preflifht fixup, but as suggested it will show you a layer containing "invisible" text only, so it just can be copy-pasted. 1. 2.

Sign up

To post, reply, or follow discussions, please sign in with your Adobe ID.

Sign in to Adobe Community

To post, reply, or follow discussions, please sign in with your Adobe ID.

Scanning file for viruses.

This file cannot be downloaded