I have an application that creates PDF documents with pages containing both a TIFF document and headers and footers rendered with vector text. Normally when you open a PDF and try to edit it, Acrobat DC Pro will automatically scan and OCR the current page so that you can copy from or edit the page. However, it refuses to do this on the PDFs generated by my application because of the vector text that exists on the pages. Is there a way to add the headers and footers in a way that will not prevent DC from auto-OCRing or is there a setting in DC that can change this behavior?
I don't know what you mean by "vector text", but Acrobat's OCR will not work if you have "renderable text" on a page. There is no way around that. I would recommend that you OCR first and then apply the headers/footers (you can do that in Acrobat). If that is not an option, you will have to use a different OCR application. For these more challenging OCR jobs (e.g. renderable text, two or more different languages, a language not supported by Acrobat's OCR, ...) I have a copy of ABBYY's FineReader, which is a dedicated OCR application.
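To make "renderable text" a bit more concrete, here is a rough sketch of how you could check whether a page's content stream draws visible text. This is only my assumption about what the term covers (text-showing operators used outside the invisible text rendering mode 3); it is not Acrobat's actual test. The function name `has_visible_text` is made up for this example, and it expects an already decompressed content stream.

```python
import re

# Text-showing operators defined by the PDF specification.
TEXT_SHOW_OPS = {b"Tj", b"TJ", b"'", b'"'}

# Literal strings "( ... )" (backslash escapes handled, nested parentheses
# not handled) and hex strings "< ... >". Their contents are blanked out so
# that text inside a string is never mistaken for an operator.
_STRINGS = re.compile(rb"\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>")

# A token is any run of bytes that is not whitespace or an array bracket.
_TOKEN = re.compile(rb"[^\s\[\]]+")


def has_visible_text(content: bytes) -> bool:
    """Return True if a decoded content stream shows text in a visible mode.

    Simplified heuristic (assumption, not Acrobat's logic): it ignores
    q/Q graphics-state save/restore and inline image data.
    """
    tokens = _TOKEN.findall(_STRINGS.sub(b" ", content))
    render_mode = 0  # PDF default: fill glyphs, i.e. visible text
    for i, tok in enumerate(tokens):
        if tok == b"Tr" and i > 0 and tokens[i - 1].isdigit():
            render_mode = int(tokens[i - 1])  # mode 3 means invisible text
        elif tok in TEXT_SHOW_OPS and render_mode != 3:
            return True
    return False


if __name__ == "__main__":
    image_only = b"q 612 0 0 792 0 0 cm /Im0 Do Q"
    with_header = image_only + b" BT /F1 12 Tf 72 760 Td (Page 1) Tj ET"
    print(has_visible_text(image_only))   # False
    print(has_visible_text(with_header))  # True
```

A page that only places the scanned TIFF comes back False, while the same page with a header written as a text object comes back True, which is the case described in the question.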
I guess "renderable text" is the correct term. However, if I run a full OCR text recognition on the entire file, all of the pages in the PDF do get processed by the OCR engine correctly, so it isn't a problem of the OCR engine not being able to do the job. It is just Acrobat DC Pro's automatic OCR feature (accessed through the "Edit PDF" feature) that will not process the content. A lot of our users are complaining about it.
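If you are generating the PDF for the purposes of OCR, add the headers and footers as rasterised text, i.e. drawn into the page image rather than written as text objects. That is a lot of work, and the Acrobat SDK is no help here.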