ATP2018
Participant
April 18, 2018
Answered

Acrobat 2017 optimize OCR on watermarked documents

I am using Acrobat 2017 and have a watermarked document whose text I need to make searchable. The problem is that when I OCR the document, the text around and under the watermark is not searchable.

It seems like OCR in 2015 did a much better job.
Is there something I can do to optimize the OCR?

2 replies

gary_sc
Community Expert
Correct answer
May 6, 2018

Hi ATP2018,

Let me add to Akanchha's statement. Having not seen the previous or current documents with watermarks, it's impossible to know exactly, but consider: how could or should any application at this point in time know what's underneath the watermark?

I say "at this point in time" because it's always possible that at some point, machine knowledge (or AI) will recognize full sentences and be able to conjecture what text might be missing. We kinda/sorta see this when auto-correct on our phones assumes it knows what text we meant to write (and you know how accurate that has been to date).

There is a long-standing observation with Photoshop: when someone is teaching and shows how to remove unwanted objects, someone in the audience will ask how Photoshop knows what's behind the sign that was just removed. [Well, PS doesn't know any more than the PS user who cloned something near the sign to replace the sign.]

My point is that any OCR package can only work with what it has. As Akanchha pointed out, a poor-quality scan has a direct effect on the quality of the resultant OCR. For example, the text combination "ir" may be seen as "n" or "ii" unless the scan is of very good quality and high resolution. If there is ANYTHING that obscures or alters the text, be it a watermark or a pen (or pencil) line, all bets are off.

Hopefully AI will get better on this but by then, and if Skynet gets any more powerful, all bets are off.

AkanchhaS8194121
Legend
May 2, 2018

Hi ATP2018,

We apologize for the delay in response to your query.

The result of OCR depends on the quality of the scanned document you are working with. OCR works best on a high-quality scan, which helps text recognition.

In your case, it is possible that the text under the watermark is blurred or has poor resolution, so it is not recognized when OCR runs.
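
To illustrate why a watermark can get in the way, here is a minimal sketch of the kind of cleanup sometimes applied to a scanned image before OCR. It assumes the watermark is a lighter gray than the text and that the page is available as grayscale pixel values (0 = black, 255 = white). The function name, the threshold and the sample values are hypothetical, and this is not a setting Acrobat exposes.

```python
def suppress_light_watermark(pixels, threshold=128):
    """Return a copy of a grayscale image with light pixels forced to white.

    pixels: list of rows, each a list of ints in 0..255 (0 = black).
    threshold: hypothetical cutoff; values at or above it become 255.
    """
    return [[255 if value >= threshold else value for value in row]
            for row in pixels]


# Made-up sample: 30 = dark text, 180 = light gray watermark, 255 = paper.
page = [
    [255, 180, 180, 255],
    [30, 30, 180, 255],
    [255, 180, 30, 30],
]
cleaned = suppress_light_watermark(page)
for row in cleaned:
    print(row)
```

If the watermark is as dark as the text, or the text underneath it is blurred, no threshold will separate the two, and the OCR engine is left with the same unreadable pixels.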

You may want to check the related thread: Improve Quality of a PDF document

Regards,

Akanchha