OCR issue

February 23, 2021

A user is receiving an error when attempting to run OCR on a PDF file that was created from a Microsoft Word document: "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text."

The reason for running OCR is to make the document's contents searchable from the Windows File Explorer search bar.

From my understanding, OCR is not running because the page already contains renderable text, yet the document does not appear to be searchable.

I did find an article that suggested running OCR on a different source of the file, but it's not clear how the user would do that when they are just saving as PDF from Word 2016.

The other suggested solution is to convert the PDF to TIFF and then run OCR. This could work, but it seems like a lot of additional work for something that should be a single step.

Are there specific options that need to be selected when saving as PDF in Microsoft Word so that Adobe Acrobat Pro can run the OCR process as expected?

RBOstrum

New Participant

Hi towcts,

This thread is a little old, but I thought I would comment anyway. Most replies are correct: OCR only works on an image. But your file appears to be a mix of image and text, and the text portions cannot be OCR'd, which may produce an error if you try. In my experience, MS Word does this sort of thing very well.

The situation does seem a little strange, because an OCR engine generally handles existing text fine: it is simply copied into the final output as-is.

Although it is not a very elegant solution, I would recommend converting to an image format and then performing OCR on the entire page.

I would love to hear about whatever solution you came up with.

S_S

Community Manager

Hi @9622631,

Hope you are doing well. Thanks for writing in!

Have you tried printing the file using the Adobe PDF printer? This should flatten the PDF and allow Acrobat to run OCR on the entire document instead of on separate sections, or at least significantly reduce the differences in OCR quality between sections.

Hope this helps.

Regards,
Souvik.

fde_5279

New Participant

Hi Souvik,

Thank you for replying. Unfortunately, I am not the one who has the original problem. I was simply making a comment on someone else's issue.

I do find your comments interesting, though. I interpret them to mean that an MS Word file printed using the Adobe PDF printer must be flattened and turned into a number of images; otherwise, no OCR would be performed at all. As I indicated, if text exists in an MS Word file when a PDF is produced, it is normally transferred to the resulting PDF as-is, with no OCR performed at all. Mind you, I might be misunderstanding what you are saying.

Brent

a_C_student16379412

Inspiring

Hi towcts, OCR won't work on a document that is already text. If it did somehow run, I'm afraid it would do nothing to solve the searching problem. Something is wrong either with the document or (more likely, I think) with the user's workflow. I would try recreating the PDF from Word using the PDFMaker plug-in, or, if the plug-in isn't available, Save As PDF. If the search still doesn't work, what exact steps is the user taking to perform the search?

gary_sc

Community Expert

Hi Towcts,

OCR is mandatory if you want to search a scanned document. Without it, a scanned document is just a "picture" of the text; there is no actual text.

If you have a Word document (or any digitally created document with text, e.g., Excel, InDesign, etc.) and properly convert it into a PDF, it is searchable. If you can select the text in a document with your cursor to copy and paste it, it should be searchable. Acrobat Standard or Pro will not (cannot) run OCR on a document that is already searchable, any more than a doctor will run an EKG on a patient who's obviously dead: there's no point.
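
For anyone who wants to script the "can you select the text" check across many files, here is a rough heuristic sketch in Python using only the standard library. It assumes the page content streams are either uncompressed or FlateDecode-compressed (common for PDFs saved from Word, but not guaranteed); it does not handle encrypted files or other filters, and it is not a real PDF parser. The function name `has_text_operators` is hypothetical. It only looks for text-drawing operators (`Tj`/`TJ` inside a `BT ... ET` block), so it says nothing about whether Windows Search is actually indexing the file.

```python
import re
import zlib

# Heuristic sketch (hypothetical helper, not an Adobe API). Assumes content
# streams are uncompressed or FlateDecode-compressed; encrypted PDFs and other
# filters (LZW, ASCII85, ...) are not handled.

# Body of each stream object: the bytes between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# A text object (BT ... ET) containing a text-showing operator (Tj or TJ).
TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b.*?\bET\b", re.DOTALL)


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Return True if any stream appears to draw real (selectable) text,
    including an invisible OCR text layer; False for image-only pages."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # A decompress object keeps the trailing end-of-line bytes before
            # "endstream" in .unused_data instead of raising on them.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate data: scan the raw bytes instead
        if TEXT_RE.search(data):
            return True
    return False
```

To try it on a real file, read it in binary mode, e.g. `has_text_operators(open("document.pdf", "rb").read())` (the file name is a placeholder). A `True` result for a file that Explorer still cannot find would suggest the PDF already has text and the problem lies on the search side rather than in the PDF itself.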

As a Mac user, I am not familiar with Windows Explorer search, but I have a hunch it is similar (in kind) to Apple's Spotlight, so if I am in error on this specific issue, I hope to be corrected.

If you are sure that Explorer's search is not picking up the text in your document, I wonder whether something went wrong in the way it was converted into a PDF.

Exactly what process was used to create the PDF in question?

towcts (Author)

New Participant

I had the user walk through the steps. The first error message, about renderable text, comes up on some but not all pages in their document. After ignoring the errors, the file is saved as a PDF/A-2b file, at which point an additional error occurs (see attached). After this step they run a Preflight on it, which looks like just a verification step that the file is good to go.

Using Windows File Explorer, we were not able to find the PDF by searching the directory it was saved to for a word located inside the file. I noticed that the "file contents" search option was not checked, so I enabled it in the Windows File Explorer search bar, but the expected document still did not show up in the results.

gary_sc

Community Expert

Hi Towcts,

You still have not said exactly what steps you are performing to create the PDF. However, you do mention some issues that are very PC-specific, which means I'll have to beg off. As stated, I'm a Mac guy, and you're getting into PC terminology that I know very little about.

I'm sure a PC person can slip in here and help you far better than I can.

One thing, though: a number of us "helpers" will not help someone who posts screenshots as attachments. If you click the photo icon above the box where you type (see below), you can insert the image so that it shows in the forum thread. Much easier for all.

Good luck!
