OCR renderable text error

New Here, Jan 01, 2011


Someone was having the problem below with an older version of Acrobat.

Is there now a solution in Acrobat X for Mac?

I note that exporting to an image file loses quality and increases file size.

Thanks

Well, since this is the digital age, it makes sense that I ought to read the PDFs in digital form (this is a stretch for me; I really like paper), which a tablet makes easier, since I can actually see the whole page in portrait orientation. It also makes sense that I ought to mark up the files in Acrobat, using the native highlighting and searching tools, which the tablet also makes easier for obvious reasons.

Here’s the problem. Apparently *every* PDF file in every digital library is tagged with headers, footers, Bates numbers, or some other text that halts OCR of the file. If you google “This page contains renderable text”, you’ll see that this has been a complaint since at least Acrobat 6. So you can’t just OCR the document and get a nice, mark-up-able document.

Now, I know what you’re thinking. There has to be a workaround, right? Of course there is. You can manually remove the headers and try again. Oh, now there’s a footer; you can take that out too (manually) and try again. Oh, now there’s a Bates number; okay, take that out too. There’s STILL some renderable text in there somewhere. Now you can either try to edit out the blocks of renderable text (again, manually, made more entertaining by the fact that you can’t just right-click on the page and choose “remove renderable text”), or you can export the entire document to a graphics format (say, TIFF), re-convert it to a PDF file (which turns the entire document into a rasterized image), and THEN run the OCR tool to get an actual mark-up-able document. This process is made more enjoyable by the fact that Acrobat will turn that 300-page dissertation you’re reading as part of your research into 300 separate TIFF files, which you then need to recombine into a PDF file. Multiply this by 100, and you’ll see what a barrier to productivity this is when I try to start organizing my existing document collection.

This is CLOSE TO THE DUMBEST THING I HAVE EVER SEEN. And I’ve seen a LOT of bad design. Rather than telling me “This document has renderable text” and giving me “Cancel” as the only option, any feature-driven developer would say, “Gosh, people get really frustrated by this. I know, because I can read the results of a simple Google search. We need to change this right away! Here, I’ll make it so that you can just click ‘Treat existing renderable text as white space’, or even prompt the user to rasterize the renderable text, embed it in the document, and then OCR the resulting file!”

The only reason I can imagine that this hasn’t happened is that your lovable electronic document vendor wants to make it a colossally, enormously painful process for anyone to actually do anything with the documents they provide. Thank you, electronic document vendor. You’re going to waste about 20% of the time you save me by giving me electronic access to this document in the first place.

Progress is grand. But when it collides with self-interest, progress seems to lose out more often than not.

Now, if you’ll pardon me, I’m going to go get some sleep. Then I’m going to get up in the morning and go to work. Then I’m going to come home, and instead of enjoying some family time with my kids, I’m going to fart around with manual document conversion.


Correct answer

Community Beginner, Mar 29, 2011

Elias,

I completely agree with your anger. I ran into the same problem and I think I have figured out a workaround. I wrote up a blog post about it.

http://www.ideationizing.com/2011/03/ocr-acrobat-pdf-with-renderable-text.html

I hope this works for you.

Community Beginner, Dec 14, 2011


Hi Grant,

This may seem like a stupid question, but I have Windows 7 and there wasn't a choice for it in the download center. Do you have any advice for Windows 7 users?

Thanks,

Giovanni
