Acrobat 2017 optimize OCR on watermarked documents

Question

I am using Acrobat 2017 and have a watermarked document whose text I need to make searchable. The problem is that when I OCR the document, the text around and under the watermark is not searchable.

It seems like OCR in 2015 did a much better job.
Is there something I can do to optimize the OCR?

gary_sc · Accepted Answer

Hi ATP2018,

Let me add to Akanchha's statement. Without having seen the previous or current watermarked documents, it's impossible to know exactly, but consider: how could or should any application, at this point in time, know what's underneath the watermark?

I say "at this point in time" because it's always possible that at some point machine knowledge (or AI) will recognize full sentences and be able to conjecture what text might be missing. We kinda/sorta see this when auto-correct on our phones assumes it knows what text we meant to write (and you know how accurate that has been to date).

There is a long-standing observation with Photoshop: when someone is teaching and shows how to remove unwanted objects, someone in the audience will ask how Photoshop knows what's behind the sign that was just removed. [Well, PS doesn't know any more than the PS user who cloned something near the sign to replace the sign.]

My point is that any OCR package can only work with what it has. As Akanchha pointed out, a poor-quality scan has a direct effect on the quality of the resultant OCR. For example, the text combination "ir" may be seen as "n" or "ii" unless the scan is of very good quality and high resolution. If there is ANYTHING that obscures or alters the text, be it a watermark or a pen (or pencil) line, all bets are off.

Hopefully AI will get better at this, but by then, if Skynet gets any more powerful, all bets are off.

AkanchhaS8194121 · Answer

Hi ATP2018,

We apologize for the delay in response to your query.

The result of OCR depends on the quality of the scanned document you are working with. OCR works best on a high-quality scan, which helps text recognition.

In your case, the text under the watermark may be blurred or of poor resolution, so it is not recognizable when OCR runs.

You may check the related thread: Improve Quality of a PDF document

Regards,

Akanchha
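
To illustrate the point both answers make about input quality, here is a minimal sketch of one generic clean-up idea applied to a scan before OCR: turning light watermark pixels white so that only the darker text remains. This is not an Acrobat feature. It assumes the page is available as 8-bit grayscale values and that the watermark is lighter than the text. The function name, the threshold of 160, and the sample pixel values are all hypothetical.

```python
def suppress_light_watermark(pixels, threshold=160):
    """Return a copy of a grayscale image (rows of 0-255 values) in which
    every pixel brighter than `threshold` is set to white (255).

    Assumption: the watermark is lighter than the text. The default
    threshold is illustrative, not a recommended value.
    """
    return [[255 if value > threshold else value for value in row]
            for row in pixels]


# Hypothetical 2x4 page: 0 = black text, 200 = light gray watermark, 255 = paper
page = [
    [255, 200, 0, 200],
    [0, 200, 200, 255],
]

cleaned = suppress_light_watermark(page)
print(cleaned)
```

This only helps when the watermark is clearly lighter than the text. Where a dark watermark covers the characters themselves, there is nothing left to recover, which is the limitation described in the accepted answer.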
