January 4, 2017
Question

OCR Image in Print-to-PDF Word Document

  • January 4, 2017
  • 1 reply
  • 1928 views

So, I'm a lawyer. I drafted a brief in MS Word. In various parts, I included images of the transcript of a hearing, i.e., images of words. When you convert to PDF, the parts that I typed are OCR'd properly, but the images of the transcript are not. Can somebody tell me how to force Adobe to recognize not just the MS Word-typed words, but also to OCR the images contained in the document?

It's driving me crazy. (Or, I should say, the Ninth Circuit's ridiculous form rules are driving me crazy. But one way or another, I need to fix it.)

This topic has been closed for replies.

1 reply

Karl Heinz  Kremer
Community Expert
January 4, 2017

When you convert from Word to PDF, the document does not have to be OCRed: the text in your document should be accessible right away, so that you can search or highlight it. If a document contains both such "real" text and images of text, Acrobat will complain about "renderable text" when you start the OCR process. This means that you cannot OCR a document that contains both real text and text in images, at least not in Adobe Acrobat.

If you can split the document so that the scans are always on a separate page, you may be able to OCR these pages if you delete any other text that might be on them (e.g. page numbers or headers/footers).
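To make the page-splitting idea above concrete, here is a minimal Python sketch of the sorting step only. It does not call Acrobat or read a PDF: the `Page` class, its two flags, and `plan_ocr` are hypothetical names made up for illustration, and you would have to fill in the flags yourself by looking at each page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Page:
    """Hypothetical description of one page of the converted brief."""
    number: int          # 1-based page number
    has_real_text: bool  # text typed in Word (already searchable)
    has_image: bool      # e.g. an image of a transcript excerpt


def plan_ocr(pages: List[Page]) -> Dict[str, List[int]]:
    """Sort pages into three groups, following the workaround above.

    "ocr":     image-only pages, which can be sent through OCR
    "blocked": pages mixing real text and images; these trigger the
               "renderable text" complaint until the real text
               (page numbers, headers/footers) is removed
    "skip":    pages without images, already searchable
    """
    plan: Dict[str, List[int]] = {"ocr": [], "blocked": [], "skip": []}
    for page in pages:
        if page.has_image and not page.has_real_text:
            plan["ocr"].append(page.number)
        elif page.has_image and page.has_real_text:
            plan["blocked"].append(page.number)
        else:
            plan["skip"].append(page.number)
    return plan


# Example: page 1 is typed text, page 2 mixes text and a transcript
# image, page 3 holds only a transcript image.
pages = [Page(1, True, False), Page(2, True, True), Page(3, False, True)]
plan = plan_ocr(pages)
print(plan)
```

Any page that lands in the "blocked" group is one you would need to rearrange, so that the transcript image sits on a page of its own, before OCR can run on it.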

For more challenging OCR tasks like this, I keep a copy of ABBYY FineReader around. This is a dedicated OCR application that can actually OCR such a mixed-content document.

January 5, 2017

Karl,

Thanks for the reply. So, basically, Adobe cannot OCR an image that is surrounded by renderable text? (When I said "OCR" in my post, I gather that the proper term is "renderable" as it applies to MS Word text.)

The point is that the brief should look much like a magazine article: there is text, text, text, then an image, followed by text, text, text, in a steady, even flow. And according to court rules, even the words in the image of a transcript must be OCR'd and searchable.

Well, it appears you've reached the same conclusion I did: PDF misses this basic function.


Bernd Alheit
Community Expert
January 5, 2017

johnd31108412 wrote:

...

Well, it appears you've reached the same conclusion I did: PDF misses this basic function.

PDF is a file format. You mean Adobe Acrobat?