Can Adobe Acrobat OCR exploit known properties, e.g. the font, of the scanned document?

Question

It appears to me, from limited use of Adobe Acrobat, that the OCR engine cannot be "configured" to exploit any "known properties" of the document being OCR'ed. The simplest possible "known property" is the specific font in which the document was printed before it was scanned. Fonts differ, and some are more "recognizable" than others, in that they discriminate better between character shapes. Characters that an OCR engine often mis-recognizes because of their similar shapes include 0 and O, 8 and B, and 5 and S, and there are clearly others. Some fonts, however, keep such characters much more distinct than others do. For example, the OCR-A family of fonts was specifically designed so that documents printed in that font could be optimally processed with OCR. Other fonts may offer similar inter-character discrimination.

It appears it is not possible to specify such "known properties" of the document, or to "train" the Adobe Acrobat OCR engine to work optimally with the properties of a given document. The font is just one possible known property. Another possibility is knowledge that all characters are in a specific subset of characters, e.g., only upper case letters. Am I missing something?
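To illustrate the "known subset" idea, here is a minimal Python sketch of that kind of constraint applied as post-processing to text already extracted from an OCR'ed PDF. It is not an Acrobat setting or API. The function names are hypothetical, and the mapping covers only the look-alike pairs named above plus 1/I, which is an assumed extra example.

```python
import string

# Hypothetical post-processing, NOT an Acrobat feature: it runs on text
# already extracted from the OCR'ed document.
# Look-alike pairs from the question (0/O, 8/B, 5/S); 1/I is an assumed extra.
DIGIT_TO_LETTER = {"0": "O", "8": "B", "5": "S", "1": "I"}
LETTER_TO_DIGIT = {letter: digit for digit, letter in DIGIT_TO_LETTER.items()}


def constrain_to_uppercase(text: str) -> str:
    """Replace look-alike digits with letters when a field is known to be A-Z only."""
    return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in text)


def constrain_to_digits(text: str) -> str:
    """Replace look-alike letters with digits when a field is known to be 0-9 only."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in text)


def find_violations(text: str, allowed: set) -> list:
    """Return (position, character) pairs that fall outside the known character set."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in allowed]


if __name__ == "__main__":
    print(constrain_to_uppercase("B0X 5IZE"))
    print(constrain_to_digits("8O5"))
    print(find_violations("AB3", set(string.ascii_uppercase)))
```

Characters with no look-alike mapping are left unchanged, which is why a separate check for characters outside the allowed set can still be useful for flagging them for manual review.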

Lovekesh Garg · Answer

Currently, we don't have much of this information. You can check the document properties (Ctrl+D) to see which fonts the OCR recognized.

There are so many types of fonts and other properties that it is quite difficult for any OCR engine to identify all of that font information correctly. Depending on the properties of the scanned document (such as its resolution), guessing these font properties becomes even harder. Collecting all of this information would also slow down the OCR engine.

Thanks.
