I am running Adobe Reader X 10.1.4 and Acrobat X Pro 10.1.4, but when I scan in a document and then try to recognize the text, I get this error: "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." I do not always get this problem. How can I fix it?
I do not know what causes this. But you are more likely to get a reply if you ask in the Acrobat forum; this is the Adobe Reader forum.
"Renderable text" is the text you'd have as PDF page content when you output a PDF from an authoring file (say an MS Word file).
OCR can only function on a PDF page that has only an image of text.
It may be that whatever you use to scan is adding at least one character of text (which may be hidden, e.g., no stroke and no fill).
Perhaps it is a header or footer containing the path to where the file was created.
OCR first checks whether any renderable text is present in the document. If it finds existing text content, it quits with the message you mentioned. So the document on which you encounter the error probably already has some text content. This could be either a header/footer lying beyond the usual margins, or the result of ClearScan (OCR) having been run when the document was created.
Would it be possible for you to share a sample document which has this issue?
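To illustrate what "renderable text" means at the file level, here is a rough Python sketch. It assumes you already have one page's content stream as decompressed bytes, and it simply looks for the PDF text-showing operators (Tj, TJ, ' and ") inside a BT ... ET text object. The function name is made up for this example, and this is only a naive heuristic; how Acrobat actually performs its check is not documented in this thread.

```python
import re

# A text object in a PDF content stream is delimited by BT ... ET.
_TEXT_OBJECT = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)
# Text-showing operators (Tj, TJ, ', ") follow a string or array operand,
# so the operand ends with ')', ']' or '>'.
_SHOW_OPERATOR = re.compile(rb"[)\]>]\s*(?:Tj|TJ|'|\")(?=\s|$)")


def has_text_operators(stream: bytes) -> bool:
    """Return True if the content stream shows at least one text string.

    Naive heuristic: it does not parse string escapes, so a literal
    string that itself contains ' ET ' could confuse it.
    """
    return any(
        _SHOW_OPERATOR.search(match.group(1))
        for match in _TEXT_OBJECT.finditer(stream)
    )


# A scanned page usually just paints an image XObject.
image_only = b"q 612 0 0 792 0 0 cm /Im0 Do Q"
# Text render mode 3 ("3 Tr") is invisible text: neither stroked nor filled.
hidden_text = b"q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 8 Tf (page 1) Tj ET"

print(has_text_operators(image_only))   # False
print(has_text_operators(hidden_text))  # True
```

The second stream is the kind of case described above: the page looks like a pure scan, but a single invisible string is still text content.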
I found this and it worked for me:
How to fix "Text recognition" (OCR) for "Find" [Ctrl+F] in Adobe Acrobat without converting to TIFF or XPS printer.
Use "Nuance PDF Create Assistant" or maybe similar software.
Click "Add" then "Open file..."
Select file - double click or "Open"
Select "Create a PDF for each input document"
Select (below) "Searchable PDF"
Click the gear button "Start PDF creation (Alt + G)"
In the "Save As" box, choose a "File name:" and click "Save".
In the "Print Info" box, under "Job Queue", wait until you see the file in the "PDF Creation Result" box.
DONE! Click "Close"
You can now do word searches where this was not possible.
The PDF file will retain original clarity and bookmarks.
Forget about TIFF/JPG or printing to XPS. Just print the PDF to the Acrobat print driver with settings (advanced) "as image". Be sure that print settings will use the existing page size or else larger pages will be cropped. I set the dpi to 300. After printing, the document will be ready to be OCRed by Acrobat. This solution makes smaller images (but, if you use OCR "Searchable Image (exact)" it will retain existing image size). It also "fixes" all sorts of issues I've encountered when I used to dump the PDF to JPG and convert back to PDF. I'm using Acrobat 8.3.1 and have had no problems with newer PDF formats using this method.
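As a rough guide to what the dpi setting means for the printed image, here is a small Python calculation. It relies only on the fact that PDF page sizes are measured in points (72 points per inch); the function name and the US Letter example page are just illustrative choices.

```python
def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple:
    """Pixel dimensions of a page printed as an image at the given dpi."""
    # PDF page sizes are in points; 72 points = 1 inch.
    return round(width_pt / 72 * dpi), round(height_pt / 72 * dpi)


letter = (612, 792)  # US Letter: 8.5 x 11 inches, in points

print(raster_size(*letter, 300))  # (2550, 3300)
print(raster_size(*letter, 150))  # (1275, 1650)
```

Halving the dpi halves each dimension, so the image has a quarter of the pixels.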
I tried your method, but ended up losing all the bookmarks in the newly created PDF file.
Is there a way to automate the task of retaining or re-inserting the bookmarks? This is an 800+ page document, so adding them manually would be impractical.
Can you please provide the exact steps you followed? Are you using Windows or Mac?
I followed Kevin's post #6 above: basically, printing the file to the 'Adobe PDF' printer as an image. The 'Print as Image' option is available under 'Advanced' in the print dialog box.
This lets you save the new PDF file in a different directory. As Kevin described, OCR can then be run on this image file.
However this new file won't have any of the bookmarks, from the original file.
I am using Windows XP SP3 and Acrobat X Pro version 10.1.4 .
If you want to try it, use a file with only a few pages to save time. Please let me know if you find a solution.
Thanks a lot,
By the way, could you please specify whether the document you are using contains text or purely images which means no selectable text?
The original I am trying to make searchable by OCR contains text, as I am able to select and copy text from it.
Perhaps that is why I get the message 'This page contains renderable text'
The original also has the bookmarks, which I lost in the regenerated image file, after following instructions in post # 6.
Ok... if it contains text and you are able to copy text from it...why do you now want to run OCR on it?
A document that contains text AND images will not OCR. Adobe has decided not to address this issue. To comply with accessibility standards, all images and text need to be accessible.
Kevin, thanks for that. So you have, not a scanned page, but a normal PDF page which contains images which are themselves scanned (or rasterised diagrams with text). Is that correct?
There seem to be several different scenarios. The original poster wrote "I scan in a document and then try to recognize the text, I get this error," which I would very much like to understand, as it shouldn't happen.
Message 13 by JKASingh mentions "original I am trying to make searchable by OCR contains text, as I am able to select and copy text from it." Maybe the same situation as yours, Kevin, but it did sound like an odd requirement. I wonder if some people (and I have seen this question more than once) have expectations of OCR over and above what it does.
The documents I deal with have already been composed as PDFs by someone else. I get the "page contains renderable text" error and so I have to process the page to an image. I do not actually know why the error occurs. Hope that helps.
Hi Kevin / Apangasa,
I tried your suggestions as per posts #15 and 16.
However, when I try to save the file as the last step, I get an Adobe error message saying that the file cannot be saved.
I don't know why it won't save. The only thing that comes to mind is that it might be a MS Windows or network issue (if saving on network drive), and you could try rebooting Windows and try again. I have never encountered an issue with saving an Acrobat file other than as a Windows/network issue.
"accessible" --- For PDF this is compliance with ISO 14289-1 (currently and part 2 once that's rolled out post PDF part 2).
ISO 14289-1 (PDF/UA-1) is clear: an image shall be tagged with the Figure element, with appropriate alt text for that element.
So, an image of a logo, say "A Brilliant Conclusion" does not have to have OCR. The renderable text around the image is tagged appropriately (again, PDF/UA-1 compliant) and the logo image is tagged with the "Figure" element.
If your authoring application applies a style name (e.g., "someStyleName") to figures upon export to PDF then that shall role map to the PDF element "Figure".
In sum, for such situations OCR is not required.
The workaround suggested by Kevin would work for you.
To summarize, this is what you can do to have the bookmarks in the OCRed document:
1: Open the PDF
2: Export the document to TIFFs and then merge them in Acrobat
3: Run OCR on the merged file.
4: Save this document
5: Now go to your original document.
6: Go to the Tools pane (on the right-hand side).
7: Move to Pages > Replace and select the OCRed document in the dialog "Select file with new pages"
8: Specify the page numbers as the first and the last page in the document
9: Replace the pages.
Yes, this should work. I was able to test this with success. However, instead of export to tiff, try printing to PDF as image.
1. Open PDF
2. Print the document:
a. Set printer driver to "Adobe PDF"
b. On print dialog window, click "Advanced" and set to "Print as Image". I use 150 dpi, but you may want a different quality setting.
c. On print dialog window, I have "Auto-Rotate and Center" checked. However, chances are you will have to review document to set correct paper orientation, because tables that have text vertical and horizontal will confuse the orientation detection.
d. On print dialog window, I have "Choose Paper Source by PDF page size" checked. This will allow different sized pages to be generated. Without it checked, the pages will be cropped.
3. Run OCR on the resulting saved document, and save.
4. Open original document, select Document > Replace Pages, and select the OCRed copy. Replace all pages.
5. Save the results as a new file.
Note: This appears to retain the OCR results, tags, etc. If you are not sure, you might try changing steps 3 through 5 to the following:
3. Open original document, select Document > Replace Pages, and select the printed copy (printed to Adobe PDF print driver). Replace all pages.
4. Save the results as a new file.
5. OCR the resulting new file. Bookmarks should be retained during OCR.
Let me know how it goes.
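A toy Python model may help show why replacing pages keeps the bookmarks while printing to a new file loses them. It assumes, as a simplification, that bookmarks live in the document (pointing at page positions) rather than inside the page content. The Document class and replace_pages function are hypothetical names for this sketch, not anything from Acrobat.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    pages: list                                    # one entry per page
    bookmarks: dict = field(default_factory=dict)  # title -> page index


def replace_pages(original: Document, replacement: Document) -> Document:
    """Take page contents from the replacement, bookmarks from the original."""
    if len(original.pages) != len(replacement.pages):
        raise ValueError("page counts must match to replace all pages")
    return Document(
        pages=list(replacement.pages),
        bookmarks=dict(original.bookmarks),
    )


original = Document(
    pages=["text page 1", "text page 2"],
    bookmarks={"Intro": 0, "Details": 1},
)
# The copy printed as an image and then OCRed is a new file with no bookmarks.
printed = Document(pages=["image + OCR page 1", "image + OCR page 2"])

merged = replace_pages(original, printed)
print(merged.pages)      # ['image + OCR page 1', 'image + OCR page 2']
print(merged.bookmarks)  # {'Intro': 0, 'Details': 1}
```

In this model the bookmarks never pass through the printer, which is why they survive.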
Does not save as PDF - saves a file of 0kb. 😞
I don't know how to keep the bookmarks. Sorry. If anyone knows how, I'd be interested. My suggested method just avoids dumping the pages to an image and then converting back (which also loses the bookmarks).
Is there a way to export/import bookmarks?
Yes. You need a program called JPDFBookmarks. What you do is: import the PDF with bookmarks into the program, click Tools > Dump, and save the bookmarks to a text file; then print the PDF to image as stated previously; then open the new PDF into JPDFBookmarks and click Tools > Load to import the bookmarks from the text file.
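The dump-then-load idea can be sketched in Python as a round trip through a text file. The tab-indented "title/page" format below is made up for illustration and is not necessarily what JPDFBookmarks writes; the function names are likewise hypothetical.

```python
def dump_bookmarks(bookmarks):
    """Serialize (level, title, page) tuples to tab-indented 'title/page' lines."""
    return "\n".join(
        "\t" * level + f"{title}/{page}" for level, title, page in bookmarks
    )


def load_bookmarks(text):
    """Parse text produced by dump_bookmarks back into (level, title, page)."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip("\t")
        level = len(line) - len(stripped)  # nesting depth = number of tabs
        # Split on the last '/', so titles may themselves contain '/'.
        title, _, page = stripped.rpartition("/")
        result.append((level, title, int(page)))
    return result


outline = [
    (0, "Chapter 1", 1),
    (1, "Section 1.1", 3),
    (0, "Input/Output", 10),
]

text = dump_bookmarks(outline)       # save this from the original PDF
restored = load_bookmarks(text)      # apply this to the printed PDF
print(restored == outline)           # True
```

This only works if the printed copy has the same page numbering as the original, since the bookmarks are stored by page number.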
I've upgraded to Acrobat XI Pro... the "renderable text" error is now a warning. I'm assuming that means the page is being OCRed, because batch processing says OCR was a "success." Reviewing the document after processing, it has full OCRed status and looks much better than when I print it and then OCR to avoid the warning message.
I don't know if Adobe changed how it works or if it never was a serious issue (i.e., it should have been a warning all along). If anyone can find a definitive answer about the "renderable text warning" in Acrobat XI Pro, please reply.