Participant
February 4, 2017
Question

DC Pro won't auto-OCR a PDF that contains vector text

I have an application that creates PDF documents whose pages each contain a TIFF image plus headers and footers rendered as vector text. Normally, when you open a PDF and try to edit it, Acrobat DC Pro automatically scans and OCRs the current page so that you can copy from or edit it. However, it refuses to do this on the PDFs generated by my application because of the vector text on the pages. Is there a way to add the headers and footers that will not prevent DC from auto-OCRing, or is there a setting in DC that changes this behavior?

Karl Heinz Kremer
Community Expert
February 5, 2017

I don't know what you mean by "vector text", but Acrobat's OCR will not work if you have "renderable text" on a page. There is no way around that. I would recommend that you OCR first and then apply the headers/footers (you can do that in Acrobat). If that is not an option, you will have to use a different OCR application. For these more challenging OCR jobs (e.g. renderable text, two or more different languages, a language not supported by Acrobat's OCR, ...) I have a copy of ABBYY FineReader, which is a dedicated OCR application.
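To make "renderable text" concrete: in a PDF page's content stream, text is drawn inside a `BT ... ET` text object by one of the text-showing operators (`Tj`, `TJ`, `'`, `"`), whereas a scanned image is painted with `Do`. The sketch below is a rough heuristic for spotting such operators in an already-decoded content stream. It is an assumption that this resembles what Acrobat checks; the thread does not say how Acrobat decides. The function name and the sample streams are made up for illustration.

```python
import re

# Literal strings "(...)" and hex strings "<...>" are operands, not operators,
# so they are blanked out before tokenising. Nested parentheses are not handled.
_LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)
_HEX = re.compile(rb"<[0-9A-Fa-f\s]*>")
_SHOW_OPERATORS = {b"Tj", b"TJ", b"'", b'"'}


def has_text_showing_operator(content_stream: bytes) -> bool:
    """Return True if a decoded content stream shows text inside BT ... ET."""
    stripped = _HEX.sub(b" ", _LITERAL.sub(b" ", content_stream))
    tokens = stripped.replace(b"[", b" ").replace(b"]", b" ").split()
    in_text_object = False
    for token in tokens:
        if token == b"BT":
            in_text_object = True
        elif token == b"ET":
            in_text_object = False
        elif in_text_object and token in _SHOW_OPERATORS:
            return True
    return False


# Hypothetical streams: an image-only page, and the same page plus a header.
scanned_page = b"q 612 0 0 792 0 0 cm /Im0 Do Q"
page_with_header = scanned_page + b" BT /F1 10 Tf 72 760 Td (Case 1234) Tj ET"

print(has_text_showing_operator(scanned_page))
print(has_text_showing_operator(page_with_header))
```

A real page would first need its stream filters (e.g. Flate) decoded, and inline image data could confuse a tokenizer this simple, so treat the result as a hint only.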

cory891 (Author)
Participant
February 5, 2017

I guess "renderable text" is the correct term. However, if I run a full OCR text recognition on the entire file, all of the pages in the PDF do get processed correctly by the OCR engine, so the problem is not that the engine cannot do the job. It is only Acrobat DC Pro's automatic OCR feature (accessed through the "Edit PDF" feature) that will not process the content. A lot of our users are complaining about it.

Legend
February 5, 2017

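If you are generating a PDF for the purposes of OCR, add the headers and footers as rasterised text, i.e. as pixels in the page image rather than as real text. It is a lot of work, and the Acrobat SDK is no help here.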
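As a minimal sketch of what "rasterised text" means: the header is burned into the page raster before the image is placed in the PDF, so the page still contains only an image and no text objects. Everything below is illustrative. The tiny `GLYPHS` table and the `stamp_text` helper are hypothetical, and a real generator would more likely use an imaging library to draw with a proper font at scan resolution.

```python
# Hypothetical 3x5 bitmap glyphs, only for the characters used in this demo.
GLYPHS = {
    "P": ["###", "# #", "###", "#  ", "#  "],
    "1": [" # ", "## ", " # ", " # ", "###"],
}


def stamp_text(pixels, text, left, top, ink=0):
    """Burn `text` into a greyscale raster (a list of rows), in place."""
    for index, char in enumerate(text):
        glyph = GLYPHS[char]
        x0 = left + index * 4  # 3 pixels of glyph plus 1 pixel of spacing
        for dy, row in enumerate(glyph):
            for dx, cell in enumerate(row):
                if cell == "#":
                    pixels[top + dy][x0 + dx] = ink
    return pixels


# A blank white 12x7 stand-in for the scan; a real page would come from the TIFF.
page = [[255] * 12 for _ in range(7)]
stamp_text(page, "P1", left=1, top=1)

# Show the result: '#' marks pixels that now carry the header.
for row in page:
    print("".join("#" if value == 0 else "." for value in row))
```

The trade-off is that the header is no longer selectable or searchable until the page is OCRed, and its legibility depends on the raster resolution.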