Cannot search scanned/OCR'd document: identity-H encoding

New Here, Jul 28, 2020

I scanned a 25-page document from an HP MFP with Acrobat Pro DC (details below). I applied OCR to the scan and can highlight the text, but I cannot search it. When I copy a word and paste it into a text editor, I get unprintable characters. I found that the document uses Identity-H encoding.

I tried the steps outlined in https://community.adobe.com/t5/acrobat/copy-text-in-pdf-gives-me-gibberish-is-there-a-way-to-ocr-to-... to no avail.

I still have the original document and can re-scan it. How can I control the encoding so that Acrobat produces a searchable document? That was the whole point of scanning it. Thanks.

What I don't understand is how Acrobat can OCR something that it cannot then search itself.

Architecture: x86_64
Build: 20.9.20067.384717
AGM: 4.30.101
CoolType: 5.14.5
JP2K: 1.2.2.46033

TOPICS
Scan documents and OCR

Community Expert, Jul 28, 2020

Hi DaveToo,

It's quite possible that you're not getting any search results because the scan quality isn't good enough for OCR to recognize the words you are searching for. For example, if you search for "apple" but OCR converted the word into (say) "aple", the search would not find it, because the text layer doesn't contain the word you're searching for.

Alternatively, you mention "Identity-H encoding." I have to admit I know very little about this, but I did find this thread, which explains a number of the dynamics very well:

https://community.adobe.com/t5/acrobat/font-encoding-settings-removing-identity-h-encoding/td-p/1060...
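As a rough illustration of why Identity-H text can copy out as unprintable characters: with Identity-H, each character in a PDF text layer is stored as a 2-byte code that identifies a glyph (a CID) in the font, not a Unicode letter. Copying and searching depend on a separate ToUnicode table in the font to map those codes back to letters; if that table is missing or wrong, the codes come out as junk. The Python sketch below is a simplified, hypothetical model of that lookup. The function name, glyph codes, and mapping are all made up for illustration, and it is not how Acrobat is actually implemented.

```python
from typing import Dict, Optional


def decode_identity_h(data: bytes, to_unicode: Optional[Dict[int, str]] = None) -> str:
    """Turn an Identity-H string into text (simplified, hypothetical model).

    Identity-H uses 2-byte, big-endian character codes that map directly to
    glyph identifiers (CIDs). Only a ToUnicode table says which letter each
    glyph represents.
    """
    if len(data) % 2:
        raise ValueError("Identity-H strings are made of 2-byte codes")
    codes = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    if to_unicode is None:
        # No ToUnicode table: a naive extractor might reuse the raw code as a
        # character, which for small glyph numbers yields control characters.
        return "".join(chr(code) for code in codes)
    # With a table, look up each glyph code; U+FFFD marks codes it does not cover.
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Hypothetical glyph codes for the word "apple" in a subset font.
raw = bytes.fromhex("00030004000400050006")
hypothetical_to_unicode = {3: "a", 4: "p", 5: "l", 6: "e"}

without_map = decode_identity_h(raw)
with_map = decode_identity_h(raw, hypothetical_to_unicode)

print(repr(without_map))               # '\x03\x04\x04\x05\x06'
print("apple" in without_map)          # False: a search finds nothing
print(with_map, "apple" in with_map)   # apple True
```

The point is only that the same glyph codes become copyable and searchable once a correct ToUnicode mapping is present; how Acrobat builds that mapping during OCR is not something this sketch models.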

 

While you do say you are scanning the documents, you don't say how you are scanning them. If the problem is caused by a poor-quality scan, then it's hard to get past that to good-quality OCR. Perhaps the information in this blog post I wrote will help:

http://photosbycoyne.com/Gary's_Help/Scanning/clean-scanning.html

Good luck, and let us know how it goes.
