How to output OCR files in clear text?
- August 24, 2022
I am a novice user with Adobe Acrobat Pro 2020, and I would like some help.
I am a very old retired home user, so nothing I produce needs to conform to any publishing or similar requirements. Everything is for my own use.
Let me explain the issue I can't master. I often deal with text that is 100 to 200 years old. I am very impressed with the quality of Adobe's OCR. From what I see, OCR produces letters that look very similar to the originals that were OCR'd, i.e. the text is readable but imperfect, with no crispness, yet still correctly recognised. Where a bit of the original is too distorted for the OCR to recognise, the OCR outputs that bit as it looked after scanning, i.e. not OCR'd. Seems very clever to me.
I want to output my PDF with the recognised characters present as clear, crisp letters (this should be possible, as the OCR has correctly recognised them all), with the OCR still substituting a sort of facsimile for the occasional bits it can't recognise.
I have tried exporting to Word, but that introduces multiple errors which are not apparent in Adobe's output. Exporting to text, text (accessible), or RTF also introduces extra errors.
So, what I am asking is: can I get Adobe's OCR to output crisp text (I am not fussy about the font), while still substituting what I call a facsimile for the bits the OCR can't recognise?
I have attached an image to show what I see when I OCR an old text, and below that is the same file exported to Word. What I am hoping to achieve is for Adobe's OCR to output crisp letters like those in the bottom image.
Take care in these dangerous times,
Doug
