OCR issue
A user is receiving an error when attempting to run OCR on a PDF file that was created from a Microsoft Word document: "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text."
The reason for running OCR is to make the document's contents searchable via the Windows Explorer search bar.
From my understanding, OCR is not running because the page already contains renderable text, yet the document does not appear to be searchable.
I did find an article that suggested running the OCR process on a different source of the file, but it's not clear how they would do that when they are just saving as PDF from Word 2016.
The other suggested solution is to convert the PDF to TIFF and then run OCR. This could work, but it seems like a lot of additional work for something that should be a single step.
Are there specific options that need to be selected when saving as PDF in Microsoft Word to allow Adobe Acrobat Pro to run the OCR process as expected?
Hi Towcts,
OCR is mandatory if you want to search a scanned document. Otherwise, a document that has been scanned is just a "picture" of the text; there is no actual text.
If you have a Word document (or any digitally created document with text, e.g., Excel, InDesign, etc.) and properly convert that into a PDF, it is searchable. If you can select the text in a document with your cursor for copy and paste, it should be searchable. Acrobat Standard or Pro will not, cannot run OCR on a document that is searchable any more than a doctor will run an EKG on a patient who's obviously dead — there's no point.
As a Mac user I am not familiar with Microsoft Explorer Search but I have a hunch it is similar (in kind) to Apple's "Spotlight," so if I am in error on this specific issue, I hope to be corrected.
If you are sure that Explorer's Search is not picking up the data from your document, I wonder if there was something wrong in the way it was converted into a PDF?
Exactly what process was used to create the PDF in question?
I had the user walk through the steps. The first error message, about renderable text, comes up on some, but not all, pages in their document. After ignoring the errors, the file is saved as a PDF/A-2b file, at which point the additional error occurs (see attached). After this step they run a preflight on it, which looks like just a verification step that the file is good to go.
Using Windows File Explorer, we were not able to search the directory where the PDF was saved for a word located inside the PDF file. I noticed that the option to search file contents was not checked, so I enabled it in the File Explorer search bar, but the expected document still did not show up in the search.
Hi Towcts,
You still have not provided exactly what steps you are performing to create the PDF; however, you do mention some issues that are very PC-specific, and that means I'll have to beg off. As stated, I'm a Mac guy, and you're getting into PC terminology about which I know not much at all.
I'm sure a PC person can slip in here and help you far better than I can.
One thing though, a number of us "helpers" will not help someone who places screenshots as attachments. If you could please click on the photo icon above where you type (see below) you can attach the image and it will show in the forum thread. Much easier for all.
Good luck!
Fair enough. Thank you for the tip about inserting a photo rather than attaching it. What details are you expecting about how the PDF is created? For example, which program is performing which operation?
Hi Towcts,
Yes, it can make a big difference. For example, a number of folks go to their system and select "Save as PDF..." as opposed to "Save as Adobe PDF..." While on the surface there doesn't seem to be much difference, "Save as PDF..." is done by the operating system (either Mac or PC) while the latter IS done by Adobe Acrobat. If the user does not see "Save as Adobe PDF...," that means they do not even have Acrobat Standard or Pro on their system. The differences in quality, performance, and functionality are significant.
That's why I ask exactly how the PDF was created.
Hi towcts, OCR won't work on a document that is already text. If it did somehow run, I'm afraid it would do nothing to solve the searching problem. Something is wrong either with the document or (more likely, I think) the user's workflow. I would try recreating the PDF from Word using the PDFMaker plugin, or if the plugin isn't available, then Save As PDF. If the search still doesn't work, what exact steps is the user taking to perform the search?
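For anyone who wants to check outside Acrobat which pages already carry text, here is a minimal Python sketch. It rests on assumptions that are not from this thread: the third-party pypdf package is installed, and the function names are made up for illustration. It only looks at extractable text, so a page that mixes an image with some text is reported as having text.

```python
from typing import Dict, List


def classify_page(extracted_text: str) -> str:
    """Label a page by whether any non-whitespace text can be extracted.

    'has_text' -> a text layer is already present; OCR has nothing to add.
    'no_text'  -> nothing extractable; the page is probably an image
                  and is a candidate for OCR.
    """
    visible = "".join(extracted_text.split())  # drop all whitespace
    return "has_text" if visible else "no_text"


def summarize(page_texts: List[str]) -> Dict[str, List[int]]:
    """Group 1-based page numbers by their classification."""
    summary: Dict[str, List[int]] = {"has_text": [], "no_text": []}
    for number, text in enumerate(page_texts, start=1):
        summary[classify_page(text)].append(number)
    return summary


def pages_containing(page_texts: List[str], word: str) -> List[int]:
    """Return 1-based page numbers whose extracted text contains `word`
    (case-insensitive substring match)."""
    needle = word.casefold()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in text.casefold()
    ]


def extract_page_texts(pdf_path: str) -> List[str]:
    """Read each page's text with pypdf (third-party, assumed installed)."""
    # Imported here so the helpers above work without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Something like `summarize(extract_page_texts("report.pdf"))` (hypothetical file name) would list which pages have text and which do not, and `pages_containing(...)` would show whether the word being searched for is in the extracted text at all. If the word is there, the problem is more likely on the search side than in the PDF.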
Hi towcts,
This thread is a little old, but I thought I would comment anyway. Most replies are correct: OCR only works on an image. But your file appears to mix image and text, and the text cannot be OCR'd and may therefore produce an error if you try. In my experience, MS Word does this sort of thing very well.
The situation does seem a little strange because generally text information is handled fine by an OCR engine. It is simply copied and put into the final product, as-is.
Although it is not a very elegant solution, I would recommend converting to some image format and then performing OCR on the entire page.
I would love to hear about whatever solution you came up with.
Hi @rbostrum,
Hope you are doing well. Thanks for writing in!
Have you tried printing the file using an Adobe PDF printer? This should flatten the PDF file and allow Acrobat to run OCR on the entire document instead of separate sections, or, rather, significantly reduce the difference in the quality of OCR performed.
Hope this helps.
Regards,
Souvik.
Hi Souvik,
Thank you for replying. Unfortunately, I am not the one who has the original problem. I was simply making a comment on someone else's issue.
I do find your comments interesting, though. I would interpret them to mean that a PDF produced from an MS Word file using the Adobe PDF printer must be flattened and turned into a number of image files. Otherwise, there would be no OCR performed at all. As I indicated, normally, if text exists in an MS Word file when a PDF is produced, it is simply transferred through to the resulting PDF as is, with no OCR performed at all. Mind you, I might be misunderstanding what you are saying.
Brent
Hi there,
Thank you for your insights and for engaging in the discussion. You are correct that when a PDF is created using the Adobe PDF Printer from an MS Word file, the text should typically be preserved rather than converted into images. OCR is only needed when a document contains scanned images or non-selectable text.
If a PDF generated via the Adobe PDF Printer requires OCR, it may indicate that the text was rasterized during the printing process, possibly due to specific print settings or embedded objects. If you were referring to someone else's issue, it would be helpful for the original poster to check how the PDF was created and whether any settings might have caused text to be converted into images.
Let us know if you have any further thoughts or need clarification.
Amal

