I have an original PDF that has been OCR'd. For a certain purpose, I have to print that PDF to PDF. (Save As wouldn't serve my purpose.) I have found that the printed PDF is no longer OCR'd and CAN'T be OCR'd. Is there any way I can:
1. Print a PDF to PDF and keep OCR on the printed PDF
or
2. OCR the printed PDF
Thank you,
Printing to PDF will likely remove all OCR.
OCR should still work on the printed PDF, if the resolution is sufficient.
Can you explain why you need to print to PDF? There may be a better option.
A bit different from Luke's question: HOW did you print to PDF? Acrobat doesn't allow that. Did you print (to PDF or Adobe PDF) via a third-party application (e.g., Apple's Preview application)?
Interesting - I would have never tried to print a PDF from Acrobat, but since you mentioned one can't, I tried! I guess I like to live life on the edge 😉
It took me through the motions, after selecting the Adobe PDF as the desired printer, and way in the back (behind a few other windows) was a Save PDF File As... dialogue box. Who knew?
As long as the PDF is image based, one should be able to run the OCR on it. The result is very dependent on the quality and contrast in the text image.
Dave
Dave, I found this out one time when I wanted a PDF of a PDF, but with multiple pages on each sheet (like 2-up or 4-up). Can't be done.
Print to PDF (or "refrying the PDF") will always result in a lower quality PDF - that does not necessarily mean quality as in resolution, but in features that go missing.

When you OCR a PDF, the font that is used for the recognized text is created on the fly, based on the glyphs that are in the text. When you then print to PDF, the information about how to get from the glyph (the drawing of a character) back to the original character is lost. Also, because the file is OCRed, it is no longer an image.

Your only option at this point is to save the PDF as individual images (e.g. high resolution TIFF images), combine these images into a PDF file, and then OCR again. Not a straightforward approach, but it will work.

Having said that, chances are that there is a way to accomplish what you need without printing to PDF and then trying to OCR again - what is it that you think is not possible without refrying the document?
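For anyone who would rather script the image round trip described above than click through it, here is a rough Python sketch. It is only one possible way to do it and is not an Acrobat feature: it assumes three third-party command-line tools that nobody in this thread mentioned (Poppler's `pdftoppm`, `img2pdf`, and `ocrmypdf`) are installed and on your PATH, and the file names, the `pages` work folder, and the 300 dpi default are placeholders I made up for illustration.

```python
import glob
import subprocess
from pathlib import Path


def rasterize_cmd(src_pdf, prefix, dpi=300):
    # Step 1: save each page as a TIFF image (assumes Poppler's pdftoppm).
    return ["pdftoppm", "-r", str(dpi), "-tiff", src_pdf, prefix]


def combine_cmd(images, image_pdf):
    # Step 2: combine the page images into one image-only PDF (assumes img2pdf).
    return ["img2pdf", *images, "-o", image_pdf]


def ocr_cmd(image_pdf, out_pdf):
    # Step 3: run OCR on the image-only PDF (assumes ocrmypdf).
    return ["ocrmypdf", image_pdf, out_pdf]


def re_ocr(src_pdf, out_pdf, workdir="pages", dpi=300):
    """Run the three steps in order; raises if any tool fails."""
    work = Path(workdir)
    work.mkdir(exist_ok=True)
    prefix = str(work / "page")

    subprocess.run(rasterize_cmd(src_pdf, prefix, dpi), check=True)

    # pdftoppm numbers its output files after the prefix; sort to keep page order.
    images = sorted(glob.glob(prefix + "-*.tif*"))
    if not images:
        raise RuntimeError("no page images were produced")

    image_pdf = str(work / "images_only.pdf")
    subprocess.run(combine_cmd(images, image_pdf), check=True)
    subprocess.run(ocr_cmd(image_pdf, out_pdf), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own.
    re_ocr("printed.pdf", "printed_ocr.pdf")
```

The command lists are built in separate small functions so you can print and inspect them before anything runs. As with doing it by hand, the OCR result will depend on the resolution and contrast of the page images.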
People give many, many reasons why printing PDF to PDF is a bad idea. This is one reason. I know you state you must do this, but perhaps if you share the issue you are solving, we can help you find a way that is less damaging.