
DC Pro won't auto-OCR a PDF that contains vector text

New Here, Feb 04, 2017

I have an application that creates PDF documents with pages containing both a TIFF document and headers and footers rendered with vector text.  Normally when you open a PDF and try to edit it, Acrobat DC Pro will automatically scan and OCR the current page so that you can copy from or edit the page.  However, it refuses to do this on the PDFs generated by my application because of the vector text that exists on the pages.  Is there a way to add the headers and footers in a way that will not prevent DC from auto-OCRing or is there a setting in DC that can change this behavior?

Topics: Acrobat SDK and JavaScript
Community Expert, Feb 04, 2017

I don't know what you mean by "vector text", but Acrobat's OCR will not work if there is "renderable text" on a page, and there is no way around that. I would recommend that you OCR first and then apply the headers/footers (you can do that in Acrobat). If that is not an option, you will have to use a different OCR application. For these more challenging OCR jobs (e.g. renderable text, two or more different languages, or a language not supported by Acrobat's OCR), I have a copy of ABBYY FineReader, which is a dedicated OCR application.
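
For anyone unsure what "renderable text" means at the file level: roughly, it is text drawn with the PDF text-showing operators (`Tj`, `TJ`, `'`, `"`) inside a `BT` ... `ET` text object in a page's content stream, as opposed to text that only exists as pixels in an image. The sketch below is a rough heuristic for spotting that in one page's content stream. It assumes the stream is already decompressed (generated PDFs usually Flate-compress their streams), it does not follow Form XObjects or handle inline image data, and it is not a reimplementation of whatever check Acrobat itself performs. `iter_operators` and `has_renderable_text` are hypothetical helper names, not part of Acrobat or the Acrobat SDK.

```python
# Rough heuristic: does an (already decompressed) PDF page content stream
# contain text-showing operators inside a BT ... ET text object?
# Hypothetical helpers; inline images (BI/ID/EI) and Form XObjects are not handled.

WHITESPACE = b" \t\r\n\f\x00"
DELIMITERS = b"()<>[]{}/%"
TEXT_SHOW_OPS = {b"Tj", b"TJ", b"'", b'"'}  # text-showing operators


def _skip_literal_string(data: bytes, i: int) -> int:
    """Return the index just past the (...) string that starts at data[i]."""
    depth, n = 0, len(data)
    while i < n:
        ch = data[i]
        if ch == 0x5C:  # backslash: skip the escaped byte
            i += 2
            continue
        if ch == 0x28:  # '(' nests
            depth += 1
        elif ch == 0x29:  # ')' closes one level
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return n


def _is_regular(byte: int) -> bool:
    return byte not in WHITESPACE and byte not in DELIMITERS


def iter_operators(data: bytes):
    """Yield bare tokens (operators, numbers, keywords); skip strings, names, comments."""
    i, n = 0, len(data)
    while i < n:
        ch = data[i]
        if ch in WHITESPACE:
            i += 1
        elif ch == 0x25:  # '%' comment runs to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif ch == 0x28:  # '(' literal string
            i = _skip_literal_string(data, i)
        elif data[i:i + 2] in (b"<<", b">>"):  # dictionary delimiters
            i += 2
        elif ch == 0x3C:  # '<' hex string
            end = data.find(b">", i)
            i = n if end == -1 else end + 1
        elif ch == 0x2F:  # '/' name: skip it entirely
            i += 1
            while i < n and _is_regular(data[i]):
                i += 1
        elif ch in DELIMITERS:  # [ ] { } and stray ')' or '>'
            i += 1
        else:
            start = i
            while i < n and _is_regular(data[i]):
                i += 1
            yield data[start:i]


def has_renderable_text(content_stream: bytes) -> bool:
    """True if a text-showing operator appears inside a BT ... ET text object."""
    in_text_object = False
    for op in iter_operators(content_stream):
        if op == b"BT":
            in_text_object = True
        elif op == b"ET":
            in_text_object = False
        elif in_text_object and op in TEXT_SHOW_OPS:
            return True
    return False


image_only = b"q 612 0 0 792 0 0 cm /Im0 Do Q"
with_footer = image_only + b"\nBT /F1 9 Tf 36 20 Td (Page 1 of 3) Tj ET"
print(has_renderable_text(image_only))   # False
print(has_renderable_text(with_footer))  # True
```

A page that only paints the scanned image (`/Im0 Do`) has no text objects, while adding even a one-line footer drawn with `Tj` is enough to make the page contain renderable text.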

New Here, Feb 04, 2017

I guess "renderable text" is the correct term. However, if I run full text recognition (OCR) on the entire file, every page in the PDF is processed correctly by the OCR engine, so the engine itself is capable of doing the job. It is only Acrobat DC Pro's automatic OCR (triggered through the "Edit PDF" feature) that will not process these pages. A lot of our users are complaining about it.

LEGEND, Feb 05, 2017

If you are generating the PDF with OCR in mind, add the header and footer text as rasterised text (pixels in the image) rather than as renderable text. It's a lot of work, and the Acrobat SDK is no help here.
