Product: Adobe Acrobat DC
Function: Recognize Text
In the settings, the drop-down menu offers 3 options.
Output:
(1). Searchable Image
(2). Searchable Image (Exact)
(3). Editable Text and Images
Could someone please explain the detailed differences between these options?
I am unable to find clear and detailed information about this in Adobe Help, hence this post.
Thanks in advance.
#1 - Provides an OCR output whose glyphs have no stroke or fill -- so, "invisible" or "hidden".
This method also dresses up the image a wee bit. Thus, you get an altered image rather than the exact image provided by the scanner.
Consequently, #1 is typically not acceptable to a FedGov agency (or any entity with an interest in a document of record having the proper "provenance").
#2 - An OCR output developed as in #1, but the exact image remains untouched.
Typically this is what a FedGov agency requires when you submit a scanned image of text.
So, the original image out of the scanner maintains its integrity and the OCR output supports find / search.
#3 - ClearScan, introduced a few versions back. When the bitmap of a character's image is recognized, it is replaced with a font glyph (the glyph is visible because it has fill and stroke applied). What is not recognized is left as is. And more magic...
Bottom line - The image out of the scanner that *was* the exact replica of the hardcopy, and thus a valid/legal document of record, is blown away, gone. Typically not acceptable for something submitted to a FedGov agency.
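For anyone curious what "no stroke or fill" means inside the file: in PDF, text drawn with text rendering mode 3 (the `3 Tr` operator) is neither filled nor stroked, so it is invisible on the page but can still be searched and selected. Here is a minimal Python sketch that builds such a page content stream by hand. It only illustrates the idea and is not Acrobat's actual output; the function name, the resource names `Im0` and `F1`, the page size and the coordinates are all made up for the example.

```python
def searchable_image_page(image_name, words, font_name="F1"):
    """Build a PDF page content stream: scanned image plus invisible text.

    words is a list of (text, x, y, font_size) tuples in PDF points.
    image_name and font_name are hypothetical resource names that a real
    PDF would have to define in the page's resource dictionary.
    """
    ops = [
        "q",                   # save graphics state
        "612 0 0 792 0 0 cm",  # scale the image to a US Letter page (assumed size)
        f"/{image_name} Do",   # paint the scanned page image
        "Q",                   # restore graphics state
        "BT",                  # begin text object
        "3 Tr",                # text rendering mode 3: neither fill nor stroke
    ]
    for text, x, y, size in words:
        # Escape the characters that are special inside a PDF literal string.
        escaped = (
            text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        ops.append(f"/{font_name} {size} Tf")  # select font and size
        ops.append(f"1 0 0 1 {x} {y} Tm")      # position the word over the image
        ops.append(f"({escaped}) Tj")          # show the (invisible) text
    ops.append("ET")                           # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    print(searchable_image_page("Im0", [("Invoice", 72, 700, 12)]))
```

In terms of this sketch, options #1 and #2 both keep the `Do` image and add the hidden text on top; they differ in whether the image itself is altered. Option #3 is the one that swaps recognized character bitmaps for visible font glyphs.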
Dear Gary,
Thank you so much for your detailed, crisp and clear point-by-point response.
Thank you for taking the time and making the effort to respond so wonderfully.
It is very much appreciated.
Regards,
Hormuz
My pleasure
I work in gov. Thank you for the explanation, it helped a lot.
Thanks. However, for translating and then correcting the text in diagrams (machine operating and maintenance manual), is there a setting? Translated Text from default settings are
My best shot is to convert the translated PDF to Word and then turn off the images; is there a simpler way?