OCR does not recognize language correctly

Question

We downloaded a trial version of Acrobat DC to see if we could use it to convert docs to PDF/A for records management and archiving purposes.

Some of the documents are just scans, and need OCR first. As we want to add metadata based on content, we have to open and treat each document individually. Acrobat DC immediately starts OCR-conversion without asking, on the assumption that:

- we want the document to be OCR'd (correct)

- it can identify the language of the document itself.

Well, that second assumption is wrong: all documents we have tested are identified as being written in Dutch, whereas some are actually in French and even in English (incredible but true). So for every document we have to wait for the first OCR to complete and then do a rerun in which we correct the language settings, which is extremely time-consuming.

Is there a way to prevent OCR conversion from starting automatically, so that it runs only after we have specified the language of the document ourselves?

Lovekesh Garg · Answer

Thanks for reporting your concern.

Yes, we can prevent it from running OCR automatically:

- Go to Acrobat preferences (Ctrl+K or Edit > Preferences)

- Go to Convert to PDF > BMP/TIFF

- Edit Settings > Scan Optimization Settings > Uncheck the “Recognize Text” checkbox.

This will disable OCR for all BMP/TIFF files when they are opened in Acrobat. You can then run text recognition whenever you want, with your own settings every time.

Hope it will resolve your issue. Please feel free to ask anything you want.

Thanks.
