I scanned a 25-page document with Acrobat Pro DC (details below) from an HP MFP. I applied OCR to the scan and can highlight text, but I cannot search it. Copying a word and pasting it into a text editor produces unprintable characters. I found that the document's text uses Identity-H encoding.
I tried the steps outlined in https://community.adobe.com/t5/acrobat/copy-text-in-pdf-gives-me-gibberish-is-there-a-way-to-ocr-to-... to no avail.
I still have the original document that I can re-scan. How can I control the encoding such that Acrobat produces a searchable document (the whole point of my scanning the document)? Thanks.
What I don't understand is how Acrobat could OCR something that it cannot search itself.
Architecture: x86_64
Build: 20.9.20067.384717
AGM: 4.30.101
CoolType: 5.14.5
JP2K: 1.2.2.46033.
Hi DaveToo,
It's quite possible that you're not getting any search results because of the scan quality: the OCR may have misread the words you are searching for. For example, if you search for "apple" but the OCR converted the word apple in the text into (say) "aple", the search would not find it, because the text no longer contains the word you typed.
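To illustrate why a single misread letter defeats search, here is a minimal Python sketch. It is not how Acrobat searches internally; the OCR string is made up and the 0.8 similarity cutoff is an arbitrary assumption. An exact substring search for "apple" fails, while a fuzzy match using the standard library's `difflib` still turns up the misread "aple".

```python
import difflib

# Hypothetical OCR output in which "apple" was misread as "aple".
ocr_text = "Add one aple and two pears to the basket."


def exact_search(text: str, term: str) -> bool:
    """Plain case-insensitive substring search, roughly what a Find box does."""
    return term.lower() in text.lower()


def fuzzy_search(text: str, term: str, cutoff: float = 0.8) -> list[str]:
    """Return words in the text whose similarity to the term is at least `cutoff`."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return difflib.get_close_matches(term.lower(), words, n=5, cutoff=cutoff)


print(exact_search(ocr_text, "apple"))  # False
print(fuzzy_search(ocr_text, "apple"))  # ['aple']
```

The point is only that exact search needs the recognized text to match character for character, so a cleaner scan (better OCR) helps more than any search trick.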
Alternatively, you mention "Identity-H encoding." I have to admit I know very little about it, but I did find an article that explains a number of the dynamics very well.
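In a PDF, Identity-H means each 2-byte code in the text is a glyph ID (CID) rather than a character, so copying and searching depend on the font's /ToUnicode map to turn those IDs back into text. The simplified Python sketch below uses made-up glyph IDs and a hypothetical mapping (not data from your file) to show why text can be highlighted yet still copy out as unprintable characters when that map is missing or wrong.

```python
# Simplified illustration only: the codes and mapping below are made up.
# In an Identity-H font each 2-byte code is a glyph ID (CID), not a character.
raw = bytes.fromhex("0036004600440051")

# What a /ToUnicode CMap supplies: glyph ID -> Unicode text (hypothetical values).
to_unicode = {0x0036: "S", 0x0046: "c", 0x0044: "a", 0x0051: "n"}


def glyph_ids(data: bytes) -> list[int]:
    """Split the string into big-endian 2-byte codes, as Identity-H does."""
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def extract_text(data: bytes, cmap: dict[int, str]) -> str:
    """Map each glyph ID to text; IDs missing from the map become U+FFFD."""
    return "".join(cmap.get(gid, "\ufffd") for gid in glyph_ids(data))


# With the map, the glyph IDs turn back into searchable text.
print(extract_text(raw, to_unicode))  # Scan
# Treating the raw bytes as ordinary single-byte text instead yields
# NUL control characters mixed with unrelated letters.
print(repr(raw.decode("latin-1")))    # '\x006\x00F\x00D\x00Q'
# With no usable map, nothing can be recovered (four replacement characters).
print(extract_text(raw, {}))
```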
While you do say you are scanning the documents, you don't say how you are scanning them. If the problem is caused by a poor-quality scan, then it's hard to get past that to good-quality OCR. The information in this blog post I wrote may be of assistance:
http://photosbycoyne.com/Gary's_Help/Scanning/clean-scanning.html
Good luck, let us know.