Renderable text

Sep 17, 2012

I am running Adobe Reader X 10.1.4 and Adobe Acrobat X Pro 10.1.4, but often when I scan in a document and then try to recognize the text, I get this error: "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." I do not always get this problem. How can I fix it?

Aug 11, 2014

OCR = Optical CHARACTER Recognition.
No more. No less.

No Community Chest Card to zip past "GO" and collect $200.

If it were "point and click," anyone's grandkid could do it and there'd be no need to hire someone, eh.

Sometimes the successful path has been to transcribe from paper to a digital authoring file.

(not so bad if you're an old school touch typist with high speed/accuracy)

Sometimes it has been migrating an authoring file set and applying the modifications along the way (say, WP 5.1 for DOS to FrameMaker, or Word to FrameMaker).

Sometimes it has been exporting from PDF to Word and doing the requisite cleanup (if the PDF isn't a well-formed Tagged PDF, e.g. PDF/UA (ISO 14289-1), there is always "cleanup").

Sometimes it has been scanning, running OCR, and exporting the PDF to TXT, RTF, DOC, or DOCX, with the follow-up "cleanup".

At the end of the day the issue is not with any software (Adobe's or others) but rather with the human element.

(Parenthetically: I kinda like my "pay grade", but that might be because I can hop in and do any of the above with zeal and alacrity.)

Be well...

Aug 11, 2014

Love this info. Many thanks. It’s true, I was looking for the right Monopoly card to pass “GO” and collect my fee.

I’m guessing I didn’t find it, as I have not heard from the client who wanted to pay less than $10/hour for data entry. I can’t work for that, but agreed to re-key one file as a trial and he winced at my invoice.

I thought I went pretty fast, also with great zeal, doing nearly 250 pages in 9 hours with very few typos. Alas, my hands ached, and I didn’t want to do 14 other, longer files on a short turnaround and end up with carpal tunnel at that pay scale.

If you are interested in rekeying at the rate he wants to pay, I can connect you with the client for this work; lmk. He’s a nice guy.

Linda Guthrie

978-764-5200

lgguthrie@comcast.net

Aug 11, 2014

Hi Walt -

Regardless of how you proceed with getting new Word docs, I’d like to make a suggestion about creating new PDFs from the newly edited student workbooks: I would create well-formed tags for the PDFs.

My ongoing dialogue about the file conversion “formatting” problems has yielded some great information, the most salient of which (other than the answer I used and delivered to you) is below. From many online responders I heard that OCR works best on scanned documents, which is what it is specifically designed for; some were galled that I would OCR a file that wasn’t scanned at all.

But if you have a printed version of all your student workbooks and can perform some high-speed scanning at Staples, you could then OCR those scanned documents with Adobe Acrobat X Pro and perhaps get exactly what you need with the least amount of the formatting cleanup that is referred to frequently in the posting below.

I’m curious to know how it works out. And I have an invoice for you for $55 that I’ll email.

Linda Guthrie

978-764-5200

lgguthrie@comcast.net

Aug 12, 2014

I'm going to give some background on PDFs that might be helpful.

There is a lot less in a PDF (at least an untagged one) than people think. There are no styles. There are no headings, headers or footers. There are no margins, no paragraphs. What?

This is because there is only text. Text at a position on a page. We see a margin where all the text starts in the same left position. We see a paragraph where the text position suggests indenting or vertical space. We see a header when it's at the top of a page and repeated. And so on.
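To make the "text at a position" point concrete, here is a minimal Python sketch of the kind of guesswork an extractor has to do. The `TextRun` record, the 14-point line gap, and the rule "an indent or a big vertical jump starts a new paragraph" are hypothetical assumptions for illustration, not how Acrobat actually works; it also assumes each run is one whole line of text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical positioned run: the page only says 'draw this text here'."""
    x: float  # left edge, in points from the left of the page
    y: float  # baseline, in points from the top of the page (grows downward)
    text: str  # assumed to be one whole line of text


def guess_left_margin(runs: list[TextRun]) -> int:
    """Guess the margin: the most common (rounded) starting x position."""
    return Counter(round(r.x) for r in runs).most_common(1)[0][0]


def guess_paragraphs(runs: list[TextRun], line_gap: float = 14.0) -> list[str]:
    """Group lines into paragraphs using only position: a vertical jump bigger
    than a normal line gap, or an indented line, is taken to start a paragraph."""
    margin = guess_left_margin(runs)
    paragraphs: list[list[str]] = []
    prev_y = None
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        big_gap = prev_y is not None and run.y - prev_y > line_gap * 1.5
        indented = round(run.x) > margin
        if not paragraphs or big_gap or indented:
            paragraphs.append([])
        paragraphs[-1].append(run.text)
        prev_y = run.y
    return [" ".join(p) for p in paragraphs]


runs = [
    TextRun(72, 100, "There is a lot less in a PDF"),
    TextRun(72, 114, "than people think."),
    TextRun(90, 142, "This is because there is"),  # indented, after a larger gap
    TextRun(72, 156, "only text."),
]
print(guess_paragraphs(runs))
```

Change the threshold or the indent rule and the "paragraphs" change with it, which is exactly why no single default suits everyone.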

Given this, the job of text extraction from a PDF is just guesswork. Adobe try to make a Word document suitable for most folk, without offering any choices. I feel this is a huge mistake on their part. Should there be hard returns at the end of each line (I think this is what you mean) or not? Clearly no default suits everyone.

Anyway, this is why I suggested an experiment: copy the text, then use Paste Special > Unformatted Text. Did you try this, and how did it compare?

And when you say it can't be fixed in Word, do you just mean you can't find a button to do it?
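If the complaint is a hard return at the end of every line, the fix is mechanical once you decide what counts as a real paragraph break. A minimal Python sketch, assuming (hypothetically) that a blank line marks a genuine paragraph break and every other line break is a leftover hard return; the function name is made up for illustration. The same idea can usually be carried out with a couple of find-and-replace passes in Word.

```python
import re


def unwrap_hard_returns(text: str) -> str:
    """Join lines broken only for layout, keeping blank-line paragraph breaks.

    Assumption: a blank (or whitespace-only) line is a real paragraph break;
    every other newline is a hard return left over from extraction.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within each paragraph, replace the hard returns with single spaces.
    return "\n\n".join(
        " ".join(line.strip() for line in para.splitlines())
        for para in paragraphs
    )


sample = (
    "Given this, the job of text\n"
    "extraction is just guesswork.\n"
    "\n"
    "Should there be hard returns\n"
    "at the end of each line?"
)
print(unwrap_hard_returns(sample))
```

Note that this rule is itself a guess: a document whose paragraphs are separated only by indentation, not blank lines, would need a different one.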

Dec 31, 2015

Just use the following link. It is easy and super fast. It converts the renderable text into a Word document. However, you will still need to spend some 2-3 minutes on formatting the new Word doc.

Free Online OCR - convert scanned PDF and images to Word, JPEG to Word

Happy New Year

Sameer Pimpalkhute

Sep 23, 2021

Building from previous postings in this thread (thanks to everyone who contributed constructively over the years to workarounds for this longstanding defect), I have devised a new recipe. It worked for me just now in Acrobat Pro DC 2021.007.20091, on a 192-page, 20 MB document that has dozens of bookmarks. Yay! (Standard disclaimer: YMMV. If the recipe below fails for you, please try to find a mod that works for you and post it here. If my recipe works for you, please upvote this posting so that others may find it.)

1. Open the PDF.
2. Print.
3. Set the printer to "Adobe PDF".
4. In the print dialog, click "Advanced" and tick "Print as Image". I use 150 dpi, but you may want a different quality setting. (If "Print as Image" is greyed out, try unticking "Print in grayscale (black and white)" in the main "Print..." dialog.)
5. In the print dialog, the original poster of this recipe had ticked the "Auto-Rotate and Center" box, with the following note: "Chances are you will have to review the document to set the correct paper orientation, because tables that have both vertical and horizontal text will confuse the orientation detection."
6. In the print dialog, the original poster of this recipe had ticked the "Choose Paper Source by PDF page size" box, with the note: "This will allow different sized pages to be generated. Without it checked, the pages will be cropped."
7. Run OCR on the resulting saved document, and save.
8. Open the original document. In older versions of Acrobat: select Document > Replace Pages, select the OCRed copy, then replace all pages. In Acrobat DC: choose Tools > Organize Pages, select all thumbnails, click Replace on the Organize Pages toolbar, and choose the file with the pages you want to include. (It's a good idea to confirm that the documents have the same number of pages; otherwise all bets are off regarding the bookmarks in the new document.)
9. Save the results as a new file.
10. Confirm (with some random testing) that the bookmarks are accurate and that all pages are fully OCRed.
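Why the page-count check before replacing pages matters can be shown with a toy model. In this minimal Python sketch (a hypothetical model, not Acrobat's API; `Bookmark`, `replace_all_pages`, and `pages_to_spot_check` are made-up names), a bookmark stores only a page position, so swapping page content one-for-one keeps every bookmark valid only when both files have the same number of pages. The second helper picks random pages for the final spot check.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bookmark:
    title: str
    page_index: int  # zero-based index of the page the bookmark jumps to


def replace_all_pages(original_pages: list[str], ocr_pages: list[str],
                      bookmarks: list[Bookmark]) -> dict[str, str]:
    """Toy model of replacing every page: content is swapped one-for-one by
    position, while bookmarks keep pointing at the same page indices."""
    if len(original_pages) != len(ocr_pages):
        raise ValueError(
            f"page counts differ ({len(original_pages)} vs {len(ocr_pages)}); "
            "bookmarks could land on the wrong pages")
    # A bookmark stores only a position, so after a one-for-one swap each
    # bookmark resolves to the OCRed version of the same page.
    return {bm.title: ocr_pages[bm.page_index] for bm in bookmarks}


def pages_to_spot_check(page_count: int, sample_size: int = 5,
                        seed: Optional[int] = None) -> list[int]:
    """Pick distinct 1-based page numbers at random for a manual spot check."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, page_count + 1), min(sample_size, page_count)))


original = ["scan p1", "scan p2", "scan p3"]
ocred = ["ocr p1", "ocr p2", "ocr p3"]
marks = [Bookmark("Intro", 0), Bookmark("Chapter 1", 2)]
print(replace_all_pages(original, ocred, marks))  # {'Intro': 'ocr p1', 'Chapter 1': 'ocr p3'}
print(pages_to_spot_check(192, seed=1))  # five distinct pages out of 192
```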

BTW https://helpx.adobe.com/acrobat/kb/error-could-perform-recognition-acrobat.html is now thoroughly out of date.

Feb 15, 2022

Thanks for the summary. Here's another tip.

I kept getting the "renderable text" error with PDFs that had been freshly made from plain TIFF images, so they really had no text whatsoever. I realized that this error often has nothing to do with text: when the OCR engine fails, it frequently gives this error even if renderable text is not the real cause.

In my case, I determined that the real cause was the print size. The images did not have their print size or DPI set correctly, so that Acrobat thought they were 40" high. This apparently confused the OCR engine, which must rely on print size in some regard. All I had to do was use XnViewMP's "Batch Convert" feature to change the DPI of all of the TIFF files to the correct 600dpi before combining them into a PDF using Acrobat. After that, OCR worked correctly.
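The arithmetic behind a 40-inch page is simple: the printed size is the pixel count divided by the DPI recorded in the file. A minimal Python sketch; the 6600-pixel height (an 11-inch page scanned at 600 dpi) and the wrong 165 dpi value are hypothetical numbers chosen to reproduce a 40-inch page, since the post does not say what the original DPI setting was.

```python
def print_size_inches(pixels: int, dpi: float) -> float:
    """Physical size implied by an image dimension and its recorded DPI."""
    return pixels / dpi


def dpi_for_print_size(pixels: int, inches: float) -> float:
    """DPI value needed for an image dimension to print at a given size."""
    return pixels / inches


height_px = 6600  # hypothetical: an 11-inch-tall page scanned at 600 dpi
print(print_size_inches(height_px, 600))  # 11.0
print(print_size_inches(height_px, 165))  # 40.0 (a wrongly recorded DPI)
print(dpi_for_print_size(height_px, 11))  # 600.0
```

In this arithmetic, correcting the DPI value alone brings the page back to 11 inches without touching the pixel data.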

Adobe Community
