
Renderable text

New Here
Sep 17, 2012

I am running Adobe Reader X 10.1.4 and Adobe Acrobat X Pro 10.1.4, but when I scan in a document and then try to recognize the text, I often get this error: "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." It doesn't happen every time. How can I fix it?

TOPICS
Scan documents and OCR

Guest
Sep 01, 2013

This solution by Chyron8472 worked.

A word of caution, though: the image conversion of the PDF file (I used 150 dpi resolution) ballooned the file size nearly 20 times, from about 10 MB to 192 MB! When I used a higher resolution (300 dpi), the file was even bigger and JPDFBookmarks could not open it. JPDFBookmarks appears to have a file size limit.
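
A rough way to see why rasterizing inflates a PDF: an image page stores every pixel, so its uncompressed size grows with the square of the resolution. The sketch below is illustrative only. It assumes a US Letter page (8.5 x 11 in) stored as 24-bit RGB with no compression, and the function name `raster_bytes` is hypothetical. Real PDFs compress page images, so actual sizes will be smaller, but doubling the dpi still quadruples the pixel data per page.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of one page rendered at `dpi`.

    Assumes 3 bytes per pixel (24-bit RGB) unless told otherwise;
    no image compression is modelled.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical US Letter page at the two resolutions mentioned in the post.
    for dpi in (150, 300):
        size = raster_bytes(8.5, 11, dpi)
        print(f"{dpi} dpi: {size / 1_000_000:.1f} MB per page, uncompressed")
```

Run as a script, it prints about 6.3 MB per page at 150 dpi and 25.2 MB at 300 dpi, a fourfold difference before any compression is applied.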

-JKASingh

Explorer
Sep 03, 2013

You might want to try two methods to reduce the size of the resulting document.

1. Save the PDF with PDF Optimizer settings. You can do this with "Save As" or through the Action Wizard. With "Save As," select the file type "Adobe PDF Files, Optimized" and click the now-active "Settings" button to choose settings. With the Action Wizard, use the "Save" action and set its settings there. In the optimizer settings, the images can be downsampled and extraneous features discarded.

2. Run the document through the "Reduce File Size" filter (in the Action Wizard; I don't know whether it is available in Acrobat Standard). This filter works better if you can restrict compatibility to more recent versions of Acrobat. The results for me were not as dramatic as with #1, but it did help.

Also, if it is acceptable to change the image pages of the PDF, use the "Optimize Scanned PDF" filter in the Action Wizard. Several of its settings can help reduce page image size, but some testing will be needed to decide what final image quality is acceptable. I believe the most significant is the "Background Removal" setting, but be careful using it on PDFs that contain maps. (I personally run OCR on the PDF and then run Optimize Scanned PDF with OCR turned off. Afterwards, I replace the map pages with the originals if necessary, and then OCR one more time without running Optimize Scanned PDF.)

Hope that helps.

-kevin

Explorer
Mar 01, 2013

I haven't had time to try it yet, but you might try the following: after processing the PDF, open the original document (the one with bookmarks) and replace all of its pages with the processed PDF pages. I read a forum post that said this worked for them: http://forums.adobe.com/thread/462209

I also read elsewhere that several plug-ins will do this for you. I haven't looked into them yet.

New Here
Nov 14, 2014

Thanks so much, Kevin!

I used your Acrobat print-to-searchable-text method and was then able to export the document to a Word doc. Neither my Workfast translation software nor Omnipage had managed to OCR it. With your method, I just need to do a little correction here and there. I appreciate your sharing!

May CHEN (Dongmei)

Interpreter:Mandarin/Cantonese<>English

Translator:Chinese<>English

Tel/Fax: 03-94840538

Mobile: 0431562446

A member of the Australian Institute of Interpreters & Translators

Explorer
Jan 28, 2017

Thanks for this. I spent most of the day, including time wasted with 'customer support', NOT solving this problem. Read Out Loud kept 'telling' me the page was empty or blank.

I applied your fix and voilà, now Read Out Loud works. It only reads one page at a time even though I select read to end of document, but one bridge at a time.

Anyway, thanks again.

Adobe Employee
Apr 19, 2017

With the latest release of Acrobat DC on 11 April 2017, the "Page contains renderable text" error has been resolved. See What's new in Adobe Acrobat DC for more details.

To get the latest product update, choose Help > Check for updates from the menu.

Thanks.

New Here
Feb 04, 2019

Omg! This is so much better than converting to TIFF! Thank you!!!

New Here
Apr 23, 2021

Thank you so much, Kevin. I had a 50-page document with this problem and your solution worked perfectly. I would have lost whatever hair I have left if I'd had to use the one-page-at-a-time TIFF/JPG print solution.

New Here
Nov 14, 2023

Kevin, thanks! Eleven years later and it still works!

New Here
Jan 31, 2014

I had PDF files with the same issue, so I went to online2pdf.com and reconverted them to PDF (from PDF to PDF). OCR then worked on the reconverted PDFs.

New Here
May 03, 2014

This worked for me! I tried a bunch of the other stuff here, too. My document was 350 pages long.

Community Beginner
May 23, 2014

This seems to be a recurring problem. The best summary of solutions I have found is at http://nlsblog.org/2014/03/10/adobe-acrobat-renderable-text/; in particular, try Solution 4: "Sanitize." It seems to keep the best resolution and does not blow up the file size.

New Here
Aug 08, 2014

Some of these suggestions should work, but....

(I get the error message "Acrobat cannot OCR this page because it contains renderable text" when trying to OCR a 350-page PDF with Acrobat X Pro on a MacBook Pro.) When I send it to print, I don't have the option of a print driver named "Adobe PDF," nor can I enter one.

I can do everything else under Advanced, but the solution doesn't work, I guess because I can't select a print driver named "Adobe PDF."

Can anyone enlighten me?

LEGEND
Aug 09, 2014

That driver (and that solution) no longer exists in Mac OS. You should only OCR scanned documents, and you only need to OCR scanned documents. Where is this document from, how was it made, and why do you want to OCR it?

New Here
Aug 09, 2014

Thanks for your feedback.

PROJECT: 15 illustrated telecom training manuals, ranging from 99 to 350 pages, each with a hyperlinked in-document table of contents, require frequent updating, but the original Word files were missing, presumed deleted at some unknown point. I was asked to convert the 15 PDFs back to Word .docx for time-sensitive editing and republication.

PROBLEM: When the PDFs were saved as Word .docx, every single line of text came out with messed-up formatting, which was unacceptable. I was asked to rekey every manual and thought to myself, 'no way.'

SOLUTION: After several failed OCR attempts, because seemingly every other page had renderable text, I successfully converted one 236-page manual to individual .tif files and recombined them into one PDF binder. OCR finally worked perfectly. I saved the double-converted file as a Word .docx for editing.

I'm waiting to hear from the client whether the new .docx file is free of substantive formatting errors, overt and hidden. He'd be happy with no formatting, because it's easy to reformat the manuals as a whole; the style is set. The .ai illustrations will be removed and replaced with updated illustrations in the new .docx.

What would you do to get the 15 manuals back into Word .docx, either with error-free formatting or unformatted, for frequent updating?

Linda

LEGEND
Aug 09, 2014

I would expect a very time-consuming, expensive process, because every word must be checked carefully. I don't share your faith that OCR will be 100% correct. Such tasks are sometimes offshored because they are so labour-intensive.

But that aside, I would simply scan and OCR. I would find out which process was adding renderable text (which makes nonsense of the idea of OCR) and stop it from doing that.

Sorry if this comes across as overly negative. You don't have an easy problem, and I hope you are able to bill for it suitably.

New Here
Aug 09, 2014

Agreed. I didn't expect error-free OCR, but I absolutely must have error-free formatting or no formatting. I find proofreading faster and less physically tiring than rekeying thousands of pages.

Are you suggesting a physical scan of the manuals, as in feeding a printed copy page-by-page into a scanner?

I haven't done much other than a couple of hours of research and running several Acrobat programs, which are very fast. I enjoy a challenge and probably won't charge the client much.

Linda Guthrie

978-764-5200

lgguthrie@comcast.net

LEGEND
Aug 10, 2014

I'm suggesting that I would only use OCR if I had a paper original. That's what it's for. That's all that it's for. It's not for renewing or tidying the text of PDFs which already contain text.

So, these are not scans?

New Here
Aug 10, 2014

No, not scans. Just an attempt to avoid rekeying thousands of pages.

Linda Guthrie

LEGEND
Aug 11, 2014

But why would you need to rekey if you already have the text? Actually, I see you wrote "The messed-up formatting in every single line of text when .pdf were saved to word .docx was unacceptable." I guess that's what's behind it. That doesn't change Acrobat, though: it will absolutely refuse to OCR if there is already text. Let's look at your text extraction issues. What is wrong with the Word document? And how does OCR fix the Word document, given that the result of OCR should be very much the same?

New Here
Aug 11, 2014

hmmm.

These were telecom student training manuals and workbooks whose original Word docs were somehow lost or deleted, and which required constant updating for his online courses that had already been booked and paid for. The client wanted a document that either retained the original formatting or had no formatting; he didn't want to deal with surprise formatting that compromised the consistent look of all his training materials and bogged down his editing process.

I was not using Acrobat tools as they were intended to be used. Certainly a larger firm would hire lower-paid data-entry specialists to rekey 15 manuals of a few hundred pages each. My client was a one-man operation with a graphic layout subcontractor who didn't know enough to help him, so he posted his problem on Craigslist without understanding how difficult it would be to convert 15 docs into unformatted or perfectly formatted Word docs that could be edited quickly.

I created the first .docx by converting an untagged PDF training manual used for online students. Tagging a PDF is important if formatting is paramount, but the client didn't know this. He is a telecom expert who wrote the manuals, not a file conversion expert.

So the .docx I created from the untagged PDF was deemed unsuitable for editing when the client discovered that each line had acquired its own formatting in the conversion from PDF to .docx. He tried setting the style globally on the converted file but kept running into unsolvable formatting problems while editing; for example, an entire blank page appeared where he didn't want it and couldn't be deleted. With 15 manuals to update frequently, the client couldn't afford to be bogged down like this. These frustrating formatting problems led him to ask me to rekey the first manual, but 9.5 hours of rekeying at my hourly rate was too much money for him.

Rekeying is below my pay grade. I thought I could find a workaround, and I believed I could win more business from him if I solved the problem for a small fee. I understood OCR to produce generally accurate characters without adding any formatting: it reads the characters and converts them to text that can be saved as an editable .docx. If all I had to do was proofread for character recognition errors, I could accept that. I can read and proof the technical material quickly.

Later today, I will learn whether anything is wrong with the .docx I created, that is, whether the double-converted, OCR'd file saved as .docx poses formatting challenges during editing. We'll see.

This client can't be the first person who has improperly stored, and lost, his original Word docs. And I can't be the first person to try this. A professor of new media who teaches the Adobe Creative Suite to college students suggested I could find a solution in Adobe with either InDesign or Acrobat, so I began reading the forums for processes to experiment with.

Does this answer your question?

Linda Guthrie

LEGEND
Aug 11, 2014

OCR will produce a PDF that is no more and no less formatted than the PDF you already have. By sheer luck it might export more the way you want, but that won't be because OCR strips a lot out.

I have a possible halfway house: have you tried selecting all the text on a page and pasting it into Word, perhaps with Paste Special to drop any attempt to preserve fonts etc.?
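
Text copied out of a PDF can arrive with a hard line break at the end of every visual line. As a hedged illustration of this plain-text halfway house, the sketch below joins the lines of each paragraph back together so Word can do its own wrapping. It assumes paragraphs in the pasted text are separated by blank lines; the `reflow` helper is hypothetical and is not an Acrobat or Word feature.

```python
def reflow(pasted: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumes blank (or whitespace-only) lines separate paragraphs;
    paragraphs are returned separated by one blank line.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for line in pasted.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)  # still inside a paragraph
        elif current:
            paragraphs.append(" ".join(current))  # blank line ends the paragraph
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Margins are usually\nset for the entire\ndocument.\n\nCaptions have\nseparate formatting.\n"
    print(reflow(sample))
```

It deliberately does not rejoin words hyphenated at line ends, since a line-end hyphen may be a real one and needs checking by eye.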

New Here
Aug 11, 2014

And if we scanned a 350-page manual and performed OCR, what about formatting? Would formatting be applied line by line, be maintained globally, or not exist at all?

The effort to scan that many pages without access to a high-speed scanner, or the cost at a service store like Staples, wouldn't be worth it unless we knew the answer in advance and it was either consistent whole-document formatting or no formatting at all. But no one has been able to answer that question.

Linda Guthrie

LEGEND
Aug 11, 2014

I don't really understand what you mean by "formatting". Can you be as specific as possible?

I really don't think you should rescan or OCR.

New Here
Aug 11, 2014

Let's take one simple example of formatting: margins. Margins are usually set for the entire document at the outset. With 'aligned left, ragged right' defined for the whole document and a set text width of, say, 6.5 inches, each line of text is a different length, but every line comes as close to the right-hand margin as possible without dividing a word. It's a tidy look.
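
The 'aligned left, ragged right' behaviour described above amounts, in its simplest form, to greedy line filling: put as many whole words on a line as fit within the text width, then break. The sketch below is a simplification that measures width in characters rather than inches (a real layout engine measures each glyph in the chosen font), and `wrap_ragged_right` is a hypothetical name.

```python
def wrap_ragged_right(text: str, max_chars: int) -> list[str]:
    """Greedy left-aligned wrap: fill each line with whole words up to max_chars.

    Width is counted in characters, a stand-in for a real text width in inches.
    A single word longer than max_chars is placed on its own line unbroken.
    """
    lines: list[str] = []
    current = ""
    for word in text.split():
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= max_chars:
            current += " " + word  # the word fits, so keep filling this line
        else:
            lines.append(current)  # the word would overflow, so break before it
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in wrap_ragged_right("the quick brown fox jumps over the lazy dog", 10):
        print(line)
```

Because the breaks are recomputed from the paragraph text and the width, editing a paragraph simply reflows it. When every line carries its own fixed setting, as in the converted .docx described here, edits can no longer reflow this way.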

When the PDF was converted to .docx, every single line had its own margin setting, based on where the line ended. New text took on the formatting of the line above it. The formatting had to be removed line by line for nearly 250 pages; it could not be done globally in one pass. (The client said don't bother, just rekey.)

When text was edited in this converted .docx, a blank page appeared where text had been edited out. That blank page had hidden formatting that prevented it from being deleted and kept the text on the following pages from flowing back into it. Nothing I did would get rid of it.

Equally simple: chapter headings have their own formatting, separate from the body text, and their font and style can differ from the text beneath them. Chapter subheadings have formatting separate from both headings and body copy. Captions have their own formatting, and so do bulleted and numbered text; for these, it's usually spacing and position rather than font styles. For a picky publisher, many more elements are formatted so that each kind of text has a consistent style.

Linda Guthrie
