I have an old, manually typed document that I scanned to PDF. The PDF reads well. I then converted the PDF to a Word file from Acrobat DC. In some instances the Word file had text compression (letters and spaces mashed together). I tried changing fonts, paragraph spacing, and line spacing and could not correct it. When I pasted the compressed line into this text box the compression went away, so I made a JPEG with a screen capture.
First off, I can't help you. However, I can join in your grief, in that I also had a typed document that gave dreadful results after scanning and OCR-ing. Parts were so bad it was easier to retype them than to repair the text.
Other than to raise my hand and say "Me Too," I was curious as to what typewriter did the original? The original that I was dealing with used the Courier font. Was yours a Selectric where you could change the font ball?
Anyhow, good luck to all. I find it interesting that when processing text in a document, manually typed text seems to work less well than other kinds of text on paper. Curious.
Thank you for reporting the issue. Have you started facing this issue recently?
Can you please tell us if it is specific to some font or file? If possible, can you please share the input and output files with us so that we can look into the issue and provide you a better experience with our services?
Thanks and Regards,
Software Engineer II
Adobe Acrobat Team
We had a similar example about a year ago.
Found that the text (or portions of the text) had kerning/tracking applied, and it probably was introduced by Acrobat's OCR utility, attempting to mimic/represent the visual appearance of the original scanned text.
Adobe, please check your utility: it should not be adding tracking/kerning/letterspacing to any text during the OCR process.
We corrected the resulting Word file by removing all manual overrides at the character level. Two ways to do that in Word/Windows (Mac-ers, only the second method is available in Word/Mac):
Hope this helps.
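For anyone who would rather script the cleanup than click through Word, here is a minimal Python sketch of the same idea. It rests on assumptions, not on anything Adobe or Microsoft has confirmed for this case: that the overrides are stored as `<w:spacing>`, `<w:w>`, or `<w:kern>` elements inside run properties (`<w:rPr>`) in `word/document.xml`, and that the file uses the usual `w:` prefix. The function names `strip_char_spacing` and `clean_docx` are made up for illustration. Work on a copy of the file.

```python
import re
import zipfile

# Assumed character-level overrides inside run properties (<w:rPr>):
#   <w:spacing .../>  character spacing (tracking)
#   <w:w .../>        horizontal character scale
#   <w:kern .../>     kerning threshold
_OVERRIDE = re.compile(r"<w:(?:spacing|w|kern)\b[^>]*/>")
_RUN_PROPS = re.compile(r"<w:rPr>.*?</w:rPr>", re.DOTALL)


def strip_char_spacing(xml: str) -> str:
    """Remove spacing/scale/kerning overrides, but only inside <w:rPr> blocks.

    Paragraph-level <w:spacing> (inside <w:pPr>) is left untouched.
    """
    return _RUN_PROPS.sub(lambda m: _OVERRIDE.sub("", m.group(0)), xml)


def clean_docx(src, dst) -> None:
    """Copy a .docx from src to dst, cleaning only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = strip_char_spacing(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    clean_docx("scanned.docx", "scanned_clean.docx")
```

This only touches direct formatting in the main document body. If the compression comes from a style definition (`word/styles.xml`) or from tracked-change markup, this sketch will not catch it.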
Interesting approach, good to know for the future.
What I did was save the documents as straight, unformatted text (.txt), which removed ALL formatting, including the page breaks. I then imported that back into Word for subsequent formatting and for correcting all of the OCR errors.
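The same "strip everything to plain text" step can be roughed out in Python without opening Word. This is only a sketch under assumptions: it reads `word/document.xml` directly, treats each `<w:p>` element as one line, and keeps only the text in `<w:t>` elements, so tabs, manual line breaks, headers, footers, and footnotes are dropped. The names `paragraphs_from_xml` and `docx_to_text` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_from_xml(xml_bytes: bytes) -> list[str]:
    """Return the bare text of each paragraph, ignoring all formatting."""
    root = ET.fromstring(xml_bytes)
    return [
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    ]


def docx_to_text(path) -> str:
    """Read a .docx and return its body text, one paragraph per line."""
    with zipfile.ZipFile(path) as z:
        return "\n".join(paragraphs_from_xml(z.read("word/document.xml")))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("scanned.txt", "w", encoding="utf-8") as out:
        out.write(docx_to_text("scanned.docx"))
```

Because nothing but the characters themselves is read, any spacing or kerning overrides are simply left behind. The OCR errors in the text still have to be fixed by hand afterwards.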
Saving to .txt format (ASCII text) often does the trick on this type of legacy formatting.
But not always.
We've found that ASCII retains and passes through some deep formatting, like mail merge codes and section breaks.
So give .txt a try, and if that doesn't completely strip the file down to its pure content, you'll have to do more extensive stripping. It's one reason why we keep a copy of Corel WordPerfect on one of our computers: its Reveal Codes utility is a lifesaver! (FYI, Corel is now Alludo.)
Interesting. Unfortunately, I am not sure one can find WordPerfect any more for the Mac. I wonder if BBEdit (by Bare Bones Software) might also be a viable option. I can't check right now as I'm on holiday and am away from my computer with all of the software that I normally have access to. (I'm using my wife's laptop right now.)
When I get back in early October I'll see what BBEdit can do with those issues.
Thanks for the heads-up on this issue. I didn't know that.
It's not helpful.