Problem with OCR deleting some information from a document

Nov 06, 2018

We have problems with the OCR on some documents. I contacted Adobe technical support for help, but the help I received did not address my problem at all.

This situation is critical for my company because the OCR removes certain information from the original document, and these are legal documents that cannot be altered!

The problem file works if I convert it to an image, or if I print it to PostScript and convert it back to PDF, but not if I run OCR on it directly.

The proposed solution is to go through the Adobe PDF driver, but that is impossible for us for several reasons.

Is there any way a programmer could evaluate my file?

Nov 06, 2018

Hi Stephanie,

My wife is a retired attorney, so yes, I do understand how critical this is.

Some questions to start:

Are these already scanned documents or are you doing the scanning in house?

If you're scanning in house, how are you doing them? (E.g., what kind of scanner, what resolution, are you scanning from within Acrobat or from the scanner's software that came with the scanner, what is your OS, what version of Acrobat, etc., etc.)

What is the nature of the part of the page that is being removed? That is, is it handwritten notes, stamps on the page, or something else?

Depending on all of these things, there should be some answers.

Lastly, would it be possible for you to redact the sections of the page that are confidential and share the rest with us so we can see what's taking place?

Nov 06, 2018

The courthouse provides me with the documents, so I have no control over how they are generated. The current document was produced by a Xerox WorkCentre 5855, PDF version 1.5 (Acrobat 6.x).

The page contains handwriting and text. I need to run OCR on the complete page. It is one of the handwritten lines that disappears.

I cannot redact the file since all of its contents are confidential. Can I send a link to a specific person?

Nov 06, 2018

OK, I think I have enough information. (Hopefully)

When you are setting up Acrobat to do the OCR, go into the settings and set the Output to Searchable Image (Exact).

If that doesn't work (or you already have that set), try the other two options and see if that makes a difference.

The only reason I'm being a bit nebulous here is that different scanners can do different things to an image that may help/hurt this process.

Let us know...

Nov 06, 2018

When I set the option to Searchable Image (Exact), it works.

Ideally, I need to use the "Editable Text and Images" option to correct the orientation of the page and to be able to isolate certain parts of the page.

Nov 06, 2018

Hi Stephanie,

Can you straighten the images in Photoshop before bringing them into Acrobat? Yes, it would add to the time, but it would also straighten the pages.

But in the grand scheme of things, how important is that?

But for your initial problem, did Searchable Image (Exact) solve it?

Nov 06, 2018

It's impossible. We have 5,000 pages to deal with each month!

The Court of Appeal requires OCR on every page. We also have to provide the best image, which is supposed to be straight. Also, sometimes we need to edit individual elements on the page.

If I cannot solve this issue, I will use the Searchable Image option and work manually on the documents that need it.

Feb 21, 2025

I've had multiple problems with entire pages of text disappearing when running the OCR Recognize Text function on a PDF document. I recently had the problem again, and in order to restore the file to its original state, I closed the PDF. Then I right-clicked on the PDF file name, chose "Restore to Previous Version", and restored it to the date I originally created/downloaded/saved the PDF. This restored all of the text that had been erased (thank goodness!).

I then reopened the PDF, went to Print, and printed to "Adobe PDF", which creates another PDF copy, and saved that copy. The new copy could then be run through OCR Text Recognition with no problems and no data being erased.
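If you process a lot of files, a minimal sketch of the same safeguard in script form might look like the following. It assumes Python 3 and that the PDFs are ordinary files on disk: it copies the untouched original aside and checks the copy byte-for-byte with SHA-256 before you run Recognize Text, so getting the original back does not depend on Windows' "Restore to Previous Version". The function names and the backup folder are hypothetical; the script does not call Acrobat and does not fix the OCR problem itself.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_before_ocr(pdf: Path, backup_dir: Path) -> Path:
    """Copy `pdf` into `backup_dir` with a timestamp suffix and verify the copy.

    Hypothetical helper: run it before OCR so the untouched original
    can be restored if recognition drops content from a page.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{pdf.stem}.{stamp}{pdf.suffix}"
    shutil.copy2(pdf, target)  # copy2 also preserves the file's timestamps
    # Refuse to continue if the backup is not identical to the original.
    if sha256_of(pdf) != sha256_of(target):
        raise OSError(f"Backup of {pdf} does not match the original")
    return target
```

For example, `backup_before_ocr(Path("brief.pdf"), Path("pre-ocr-backups"))` returns the path of the verified copy; you would run it once per file before starting OCR.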

Mar 12, 2025

Hi @Cristal319841111ozk,

Hope you are doing well. Sorry for the trouble and the delayed response.

This looks very strange, and shouldn't be happening. Would you mind sharing a few pieces of information for further investigation:

1. The OS and version of your system;

2. A screen recording of the entire event for a better understanding;

3. A sample file where you experience the issue.

Also, please make sure you are on the latest version (2025.001.20432) for the best experience.

To do so, go to Menu > Help > Check for Updates.

Regards,
Souvik.