Cannot search scanned/OCR'd document: identity-H encoding

New Here ,
Jul 28, 2020

I scanned a 25-page document from an HP MFP with Acrobat Pro DC (details below). I applied OCR to the scan and can highlight text, but I cannot search it. Copying a word and pasting it into a text editor produces unprintable characters. I find that the document uses Identity-H encoding.

 

I tried the steps outlined in  https://community.adobe.com/t5/acrobat/copy-text-in-pdf-gives-me-gibberish-is-there-a-way-to-ocr-to-...   to no avail.

 

I still have the original document that I can re-scan. How can I control the encoding such that Acrobat produces a searchable document (the whole point of my scanning the document)?   Thanks.

 

What I don't understand is how Acrobat could OCR something that it cannot search itself.   

 

 

Architecture: x86_64
Build: 20.9.20067.384717
AGM: 4.30.101
CoolType: 5.14.5
JP2K: 1.2.2.46033. 

TOPICS
Scan documents and OCR

gary_sc
Adobe Community Professional ,
Jul 28, 2020

Hi DaveToo,

 

It's quite possible that you're not getting any search results because the scan quality is too poor for OCR to recognize the words you are searching for. For example, if you're searching for "apple" but the word apple in the text was converted into (say) "aple", you would not find it, because the text no longer contains the word you searched for.
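
To make the point above concrete, here is a minimal Python sketch. It assumes the OCR text has already been extracted from the PDF into a plain string; the helper name `fuzzy_find` and the sample sentence are made up for illustration, and this is not how Acrobat's own search works. It only shows why an exact search misses a misrecognized word while an approximate match (via the standard library's `difflib`) can still find it.

```python
import difflib

def fuzzy_find(query, ocr_text, cutoff=0.8):
    """Return words in ocr_text that closely resemble query (hypothetical helper)."""
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = [w.strip(".,;:!?\"'()") for w in ocr_text.lower().split()]
    # get_close_matches ranks candidates by similarity ratio and drops those below cutoff.
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)

# Hypothetical OCR output in which "apple" was misread as "aple".
ocr_text = "An aple a day keeps the doctor away."

exact = "apple" in ocr_text.lower()   # exact search, like a plain Find
close = fuzzy_find("apple", ocr_text)  # approximate search

print("exact match found:", exact)
print("close matches:", close)
```

If the pasted text consists of unprintable characters rather than near-misses, a comparison like this will not help, since the extracted characters themselves are wrong.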

 

Alternatively, you mention "Identity-H encoding." I have to admit I know very little about this, but I did find this thread, which explains a number of the dynamics very well.

 

https://community.adobe.com/t5/acrobat/font-encoding-settings-removing-identity-h-encoding/td-p/1060...

 

While you do say you are scanning the documents, you do not say how you are scanning them. If the problem is caused by a poor-quality scan, then it's hard to get good-quality OCR out of it. Perhaps the information in this blog post I wrote may be of assistance.

 

http://photosbycoyne.com/Gary's_Help/Scanning/clean-scanning.html

 

Good luck, let us know.
