I am running Adobe Reader X 10.1.4 and Adobe Acrobat X Pro 10.1.4, but when I scan in a document and then try to recognize the text, I often get this error: "Acrobat could not perform recognition (OCR) on this page because: This page contains renderable text." It does not happen every time. How can I fix it?
OCR = Optical CHARACTER Recognition.
No more. No less.
No Community Chest Card to zip past "GO" and collect $200.
If it were "point and click," anyone's grandkid could do it and there'd be no need to hire someone, eh.
Sometimes the success path has been to transcribe from paper to a digital authoring file.
(not so bad if you're an old school touch typist with high speed/accuracy)
Sometimes it has been migrating an authoring file set and applying the modifications to it (say WP 5.1 DOS to FrameMaker, or Word to FrameMaker).
Sometimes it has been export from PDF to Word and do the requisite cleanup (if the PDF isn't a well-formed Tagged (ISO 14289-1) PDF there is always "cleanup").
Sometimes it has been scan - OCR - export PDF to TXT or RTF or DOC or DOCX with the follow up "cleanup".
At the end of the day the issue is not with any software (Adobe's or others) but rather with the human element.
(Parenthetically: I kinda like my "pay grade", but that might be because I can hop into it and do any of the above with zeal and alacrity.)
Be well...
Love this info. Many thanks. It’s true, I was looking for the right Monopoly card to pass “GO” and collect my fee.
I’m guessing I didn’t find it, as I have not heard from the client who wanted to pay less than $10/hour for data entry. I can’t work for that, but agreed to re-key one file as a trial and he winced at my invoice.
I thought I went pretty fast, also with great zeal, doing nearly 250 pages in 9 hours with very few typos. Alas, my hands ached, and I didn’t want to do 14 other, longer files in a short turnaround and end up with carpal tunnel at that pay scale.
I can connect you with the client for this work if you are interested in rekeying at the rate he wants to pay? lmk. He’s a nice guy.
Linda Guthrie
978-764-5200
lgguthrie@comcast.net
Hi Walt -
Regardless of how you proceed with getting new Word docs, I’d like to make a suggestion about creating new PDFs from the newly edited student workbooks: I would create well-formed tags for the PDFs.
My ongoing dialogue about the file conversion “formatting” problems has yielded some great information, the most salient (other than the answer I used and delivered to you) being below. From many online responders I heard that OCR works best on scanned documents, because that is what it is specifically designed for. Some were galled that I would OCR a file that wasn’t scanned, because that’s not what OCR was designed for.
But if you have a printed version of all your student workbooks and can do some high-speed scanning at Staples, you could then OCR those scanned documents with Adobe Acrobat X Pro and perhaps get exactly what you need with the least amount of formatting cleanup, which is referred to frequently in the posting below.
I’m curious to know how it works out. Also, I have an invoice for you for $55 that I’ll email.
Linda Guthrie
978-764-5200
lgguthrie@comcast.net
I'm going to give some background on PDFs that might be helpful.
There is a lot less in a PDF than people think. There are no styles. There are no headings, headers or footers. There are no margins, no paragraphs. What?
This is because there is only text. Text at a position on a page. We see a margin where all the text starts in the same left position. We see a paragraph where the text position suggests indenting or vertical space. We see a header when it's at the top of a page and repeated. And so on.
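To make "text at a position on a page" concrete, here is a minimal sketch that pulls positioned strings out of a tiny, hypothetical, uncompressed page content stream. It only understands the simple `x y Td (string) Tj` pattern; real streams also use `Tm`, `TJ` arrays, hex strings and escapes, and are usually compressed, so treat this as an illustration rather than a parser.

```python
import re

# Hypothetical, uncompressed content-stream fragment. BT/ET bracket a text
# object, Tf selects font and size, Td moves the text position, and Tj paints
# a literal string there. Each BT resets the text matrix, so here the Td
# operands are page coordinates in points (origin at bottom-left).
CONTENT = b"""
BT /F1 12 Tf 72 720 Td (Chapter 1) Tj ET
BT /F1 10 Tf 72 690 Td (There are no paragraphs in a PDF,) Tj ET
BT /F1 10 Tf 72 678 Td (only strings placed at coordinates.) Tj ET
"""

# Matches "x y Td (string) Tj" only; a deliberately narrow pattern.
RUN = re.compile(rb"(-?[\d.]+)\s+(-?[\d.]+)\s+Td\s*\((.*?)\)\s*Tj")


def positioned_runs(content: bytes) -> list[tuple[float, float, str]]:
    """Return (x, y, text) for each simple Td/Tj pair in the stream."""
    return [
        (float(x), float(y), text.decode("latin-1"))
        for x, y, text in RUN.findall(content)
    ]


for x, y, text in positioned_runs(CONTENT):
    print(f"x={x:g} y={y:g} {text!r}")
```

Nothing in the stream says "paragraph", "margin" or "heading"; those exist only in how a reader interprets the coordinates.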
Given this, the job of text extraction from a PDF is just guesswork. Adobe try to produce a Word document that suits most folk, and offer no choices. I feel this is a huge mistake on their part. Should there be hard returns at the end of each line (I think this is what you mean) or not? Clearly no default suits everyone.
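The guesswork can be sketched as a simple heuristic. Below, hypothetical line positions are grouped into paragraphs by assuming the smallest vertical gap is normal line spacing and that a noticeably larger gap (1.5 times, an arbitrary choice) is a paragraph break. The hypothetical `hard_returns` flag is the choice described above: keep a line break after every line, or join the lines. This is not how Acrobat's exporter works internally; it only shows why any single default suits some people and not others.

```python
# Hypothetical positioned lines (x, y, text); y is in points from the bottom
# of the page, so a larger y means higher up.
RUNS = [
    (72, 720, "Chapter 1"),
    (72, 690, "There are no paragraphs in a PDF,"),
    (72, 678, "only strings placed at coordinates."),
    (72, 650, "A reader infers a new paragraph"),
    (72, 638, "from the extra vertical space."),
]


def rebuild_text(runs, hard_returns=False, gap_factor=1.5):
    """Guess paragraph breaks from the vertical gaps between lines.

    Assumes one run per line. The smallest gap is taken as normal line
    spacing; a gap more than gap_factor times that starts a new paragraph.
    """
    if not runs:
        return ""
    lines = sorted(runs, key=lambda r: (-r[1], r[0]))  # top to bottom
    gaps = [upper[1] - lower[1] for upper, lower in zip(lines, lines[1:])]
    normal = min(gaps) if gaps else 0
    paragraphs = [[lines[0][2]]]
    for gap, (_, _, text) in zip(gaps, lines[1:]):
        if gap > gap_factor * normal:
            paragraphs.append([text])    # extra space: guess a new paragraph
        else:
            paragraphs[-1].append(text)  # ordinary spacing: same paragraph
    # The judgment call: keep a hard return after every line, or not?
    joiner = "\n" if hard_returns else " "
    return "\n\n".join(joiner.join(p) for p in paragraphs)


print(rebuild_text(RUNS))
print("---")
print(rebuild_text(RUNS, hard_returns=True))
```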
Anyway this is why I suggested an experiment in copy text, paste special, unformatted text. Did you try this, and how did it compare?
And when you say it can't be fixed in Word, do you just mean you can't find a button to do it?
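For the copy and paste-unformatted route, the usual cleanup is removing the hard return at the end of every line. A minimal sketch, assuming blank lines mark the real paragraph breaks (a paste does not always preserve them); `unwrap` and its `join_hyphens` option are hypothetical names for illustration:

```python
import re


def unwrap(pasted: str, join_hyphens: bool = True) -> str:
    """Remove hard returns inside paragraphs of text copied out of a PDF.

    Blank lines are kept as paragraph breaks; every other line break becomes
    a space. With join_hyphens, a line ending in a letter plus "-" is glued
    to the next line ("Charac-" + "ter"). That also wrongly merges genuine
    compounds split at a line end ("well-" + "formed"), so check the result.
    """
    out = []
    for para in re.split(r"\n\s*\n", pasted.strip()):
        text = ""
        for line in (ln.strip() for ln in para.splitlines()):
            if not line:
                continue
            if join_hyphens and re.search(r"[A-Za-z]-$", text):
                text = text[:-1] + line  # drop the hyphen, join directly
            elif text:
                text += " " + line
            else:
                text = line
        out.append(text)
    return "\n\n".join(out)


print(unwrap("OCR = Optical Charac-\nter Recognition.\nNo more.\n\nNo less."))
```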
Just use the following link. It is easy and super fast. It converts the renderable text into a Word document. However, you will still need to spend some 2-3 minutes on formatting the new Word doc.
Free Online OCR - convert scanned PDF and images to Word, JPEG to Word
Happy New Year
Sameer Pimpalkhute
Building from previous postings in this thread (thanks to everyone who contributed constructively over the years to workarounds for this longstanding defect), I have devised a new recipe. It worked for me just now in Acro Pro DC 2021.007.20091, on a 192-page 20 MB document that has dozens of bookmarks. Yay! (Standard disclaimer: YMMV. If the recipe below fails for you, please try to find a mod that works for you and post it here. If my recipe works for you, please upvote this posting so that others may find it.)
BTW https://helpx.adobe.com/acrobat/kb/error-could-perform-recognition-acrobat.html is now thoroughly out of date.
Thanks for the summary. Here's another tip.
I kept getting the "renderable text" error within PDFs that had been freshly minted from mere TIFF images, so that they really had no text whatsoever. I realized that this error really has nothing to do with text. When the OCR engine fails, it often gives this error, even if renderable text is not the real cause.
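If you suspect the message is misleading, one way to check is to look for text-showing operators in the file yourself. This is a rough heuristic sketch under several assumptions: it only decompresses Flate-encoded streams, skips streams that don't look like text (an arbitrary 90% printable-ASCII test, so scanned image data is ignored), ignores encryption, object streams and the `'`/`"` operators, and counts invisible text (for example a hidden OCR layer) as text. `has_text_operators` is a hypothetical helper, not an Acrobat feature. A `False` result on a file that still triggers the error suggests the cause lies elsewhere.

```python
import re
import sys
import zlib

# Stream data sits between "stream" + end-of-line and "endstream"; the
# lookbehind stops the "stream" inside "endstream" from starting a match.
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) containing a Tj or TJ text-showing operator.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)
PRINTABLE = set(range(32, 127)) | {9, 10, 13}


def looks_like_text(data: bytes, threshold: float = 0.9) -> bool:
    """Content streams are mostly printable ASCII; image data is not.

    Samples the first 4096 bytes; the 0.9 cut-off is an assumption.
    """
    sample = data[:4096]
    if not sample:
        return False
    printable = sum(1 for b in sample if b in PRINTABLE)
    return printable / len(sample) >= threshold


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Heuristic: does any content-like stream in the file paint text?"""
    for match in STREAM.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the end-of-line left before "endstream".
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate-encoded: inspect the raw bytes instead
        if looks_like_text(data) and TEXT_OBJECT.search(data):
            return True
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(has_text_operators(f.read()))
```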
In my case, I determined that the real cause was the print size. The images did not have their print size or DPI set correctly, so that Acrobat thought they were 40" high. This apparently confused the OCR engine, which must rely on print size in some regard. All I had to do was use XnViewMP's "Batch Convert" feature to change the DPI of all of the TIFF files to the correct 600dpi before combining them into a PDF using Acrobat. After that, OCR worked correctly.
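The print size Acrobat works from is simply pixel count divided by the DPI recorded in the image file. The numbers below are hypothetical (a letter-size page scanned at 600 dpi is 5100 by 6600 pixels); they show how a wrong DPI tag alone can turn an 11-inch page into a 40-inch one, and what tag to set instead. The 17-inch limit in `looks_mis_tagged` is an arbitrary plausibility threshold, not an Acrobat setting.

```python
# Physical size = pixels / dots per inch. The page size comes from the DPI
# stored in the image file, not from the scanner's real resolution.


def print_size_inches(pixels: int, dpi: float) -> float:
    """Physical length implied by a pixel count and a DPI tag."""
    return pixels / dpi


def dpi_for(pixels: int, intended_inches: float) -> float:
    """DPI tag that makes an image print at its intended physical length."""
    return pixels / intended_inches


def looks_mis_tagged(width_px: int, height_px: int, dpi: float,
                     max_inches: float = 17.0) -> bool:
    """Flag pages whose implied size is implausible (hypothetical 17 in cap)."""
    return max(print_size_inches(width_px, dpi),
               print_size_inches(height_px, dpi)) > max_inches


if __name__ == "__main__":
    width_px, height_px = 5100, 6600  # hypothetical letter-size scan
    for dpi in (165, 600):
        height = print_size_inches(height_px, dpi)
        flag = looks_mis_tagged(width_px, height_px, dpi)
        print(f"{dpi} dpi -> {height:.1f} in high, suspicious: {flag}")
    print(f"tag for an 11 in page: {dpi_for(height_px, 11):.0f} dpi")
```

If you would rather script the batch fix than use XnViewMP, Pillow can write the resolution when saving a TIFF (`img.save(path, dpi=(600, 600))`); check the Pillow documentation for your version before relying on it.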