Can Adobe Acrobat OCR exploit known properties, e.g. the font, of the scanned document?
From limited use of Adobe Acrobat, it appears to me that the OCR engine cannot be configured to exploit any known properties of the document being OCR'ed. The simplest such property is the font in which the document was printed before it was scanned. Fonts differ in how "recognizable" they are: some discriminate better between character shapes than others. Characters that OCR engines often confuse because of their similar shapes include 0 and O, 8 and B, and 5 and S, and there clearly are others. For example, the OCR-A family of fonts was specifically designed so that documents printed in that font could be optimally processed with OCR. Other fonts may offer similarly good inter-character discrimination.
It appears that it is not possible to specify such known properties of the document, or to "train" the Adobe Acrobat OCR engine to work optimally with the properties of a given document. The font is just one possible known property. Another is the knowledge that all characters belong to a specific subset, e.g., only upper-case letters. Am I missing something?
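
To make concrete what I mean by exploiting a known character subset, here is a minimal Python sketch of the idea applied as post-processing to OCR output. It is not an Acrobat feature or API; the function name `constrain_to_uppercase` and the confusion table are hypothetical, and the table covers only the three confusable pairs mentioned above. It assumes the document is known to contain only upper-case letters.

```python
# Hypothetical post-processing sketch, NOT an Adobe Acrobat feature.
# Assumption: the scanned document contains only upper-case letters
# (plus whitespace), so any digit in the OCR output must be an error.

# Digits that are commonly confused with letters of similar shape.
CONFUSABLE_TO_LETTER = {"0": "O", "8": "B", "5": "S"}


def constrain_to_uppercase(ocr_text: str) -> str:
    """Replace confusable digits with the letters they resemble."""
    return "".join(CONFUSABLE_TO_LETTER.get(ch, ch) for ch in ocr_text)


if __name__ == "__main__":
    print(constrain_to_uppercase("B08 5AW THE 8O55"))  # BOB SAW THE BOSS
```

Ideally the OCR engine itself would use such a constraint during recognition, rather than my having to correct its output afterwards.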
