Participant
October 7, 2008
Question

Acrobat 9 crashes on OCR

  • 31 replies
  • 24844 views
I've been trying to convert a batch of large PDF files to searchable PDFs using Acrobat's OCR. In the middle of a batch, a large (1000+ page) document crashes Acrobat. I have narrowed it down to this image:

http://img90.imageshack.us/img90/2418/badke2.png
59,520 bytes

When I convert it to PDF (File->Create PDF->From Single File) and then use Acrobat to "Document->OCR Text Recognize->Recognize Text using OCR", Acrobat always crashes.

Can anyone else try it and confirm whether it crashes for them too?

It kills my batch processing and is making this large conversion quite painful. Is there a way around it?
This topic has been closed for replies.

31 replies

Participant
January 5, 2009
Olga Satchouk says:

> I was able to reproduce crash in both cases.
> Fix will be available in the next dot release
> of Acrobat - 9.1.

How about the same fix for AA8.1.3, for those of us volunteers who can't afford the high cost of version 9? We don't have a company budget to fall back on, even though the work we are doing clearly benefits society as a whole.

Regards,

Terry Smythe
Winnipeg, Canada
smythe@shaw.ca
Participant
January 5, 2009
On 5 Jan 2009 at 2:37, Daniel E. Smith wrote:

> I am able to perform OCR on these pages using OmniPage Pro
> 1.5 with no problem, so it is not the pages.

Agreed. In every case where AA8 crashed, I was able to have AA8 run OCR against the offending page as if nothing was wrong, then carry on. Very mysterious, and extremely aggravating.

But the big question..... Is there another utility out there that will run OCR in such a way that the PDF file becomes searchable thereafter?

I have no trouble running OCR from any number of OCR packages, and all work just fine, but the OCR results are always external to the PDF file. The PDF file remains non-searchable even after running it with ABBYY, OmniPagePro, TextBridge, ScanSoft, etc.

So far, AA7 or better is the only utility I have found that when OCR is run, it leaves behind a PDF file that is searchable.

This is important in the case of a very large set of very large PDF files initially created by some utility other than Acrobat. 100% of these PDF files are not searchable.

In my case, some 50,000+ pages of a historical newspaper, 1881 to 1943, were scanned into TIFF format by some automated process, likely using an ADF. Then the TIFF files were converted by some automated utility into PDF files, all non-searchable.

I want to concatenate the TIFF files by year, then convert these yearly files into yearly PDF files. But such a process leaves them all non-searchable.

I've basically done this by using AA8, but the process was incredibly time consuming and aggravating, requiring constant attention for all these dumb repetitive errors that keep popping up, ignoring my earlier selection to ignore all errors. GGrrrrr..................... Urge to kill........... :-)

Regards,

Terry
Participant
January 5, 2009
No. The problem is not just big files, although the problems are worse in that case. I tried OCR on a 1600 page TIFF. Acrobat would crash with no indication of where it encountered a problem, and all OCR done to that point was lost.

I split the document into 100 page sections to identify the source of the crash. 4 of the 100 page sections bombed. I then did OCR on each of the four by splitting them into single pages. I identified four pages of the 1600 that were causing the crash. OCR on those single pages crashed Acrobat on multiple machines with every possible setting.

I provided the pages to Jason Reuer at Adobe (jreuer@adobe.com) at his request and heard nothing back despite multiple e-mails. So much for customer service. I am able to perform OCR on these pages using OmniPage Pro 1.5 with no problem, so it is not the pages. I also provided Adobe with pages from other documents that produce crashes. Again, I received no response.

In this forum, Adobe claims that the problems are fixed in Acrobat 9.1, but they have not responded to any of our requests about how to get Acrobat 9.1. If anyone knows, please tell the rest of us.
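For anyone wanting to automate the narrowing-down described above (100 page sections first, then single pages inside the sections that fail), here is a minimal Python sketch. It is only an outline of the search, not something Acrobat provides: `ocr_succeeds` is a hypothetical callback that you would have to wire up to whatever actually runs OCR on a page range, and `fake_ocr` with its made-up page numbers exists only so the example runs.

```python
from typing import Callable, List


def find_bad_pages(page_count: int,
                   ocr_succeeds: Callable[[int, int], bool],
                   chunk_size: int = 100) -> List[int]:
    """Return the 1-based page numbers whose single-page OCR run fails.

    ocr_succeeds(first, last) is a hypothetical callback: it should run OCR
    on pages first..last (inclusive) and return False if that run crashed.
    """
    bad_pages = []
    for first in range(1, page_count + 1, chunk_size):
        last = min(first + chunk_size - 1, page_count)
        if ocr_succeeds(first, last):
            continue  # the whole section is fine, no need to look closer
        # The section failed, so retry it one page at a time.
        for page in range(first, last + 1):
            if not ocr_succeeds(page, page):
                bad_pages.append(page)
    return bad_pages


# Stand-in for a real OCR run: pretend four pages out of 1600 crash.
# These page numbers are invented for the demonstration.
_crashing_pages = {37, 512, 905, 1204}


def fake_ocr(first: int, last: int) -> bool:
    return not any(first <= page <= last for page in _crashing_pages)


print(find_bad_pages(1600, fake_ocr))  # [37, 512, 905, 1204]
```

With 1600 pages and four bad pages in four different sections, this needs 16 section runs plus 400 single-page runs, rather than 1600 single-page runs.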

Acrobat's batch handling capability compounds the problem. Instead of loading, performing OCR on, and saving one document at a time, it tries to load them all into memory at once. I am trying to perform OCR on 100,000 small documents, so that method is a disaster. One crash and everything is lost. As everyone has noted, a crash is certain, so Acrobat is basically useless in its current state for OCR. This has been true for Acrobat 7, 8 and 9.
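The one-document-at-a-time behaviour wished for above can be sketched in Python as follows. This is an assumption-laden illustration, not a description of anything Acrobat offers: it presumes some OCR tool that can be started from a command line with a file path and that exits with a non-zero status when it fails. `run_batch`, `FAKE_OCR` and the file names are all hypothetical. Each document runs in its own process and is written to a log once it succeeds, so a crash costs one document rather than the whole batch.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Iterable, List, Sequence


def run_batch(documents: Iterable[str],
              command: Sequence[str],
              done_log: Path) -> List[str]:
    """Run `command + [document]` once per document, each in its own process.

    Documents already listed in done_log are skipped, so rerunning the batch
    resumes where the previous run stopped. Returns the documents that failed.
    """
    done = set(done_log.read_text().splitlines()) if done_log.exists() else set()
    failed = []
    for doc in documents:
        if doc in done:
            continue
        result = subprocess.run([*command, doc])
        if result.returncode == 0:
            # Record success immediately so it survives a later crash.
            with done_log.open("a") as log:
                log.write(doc + "\n")
        else:
            failed.append(doc)
    return failed


# Stand-in for a real OCR command line (hypothetical): it exits with an
# error for any file whose name contains "bad".
FAKE_OCR = [sys.executable, "-c",
            "import sys; sys.exit(1 if 'bad' in sys.argv[1] else 0)"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "done.txt"
        print("failed:", run_batch(["a.pdf", "bad.pdf", "c.pdf"], FAKE_OCR, log_path))
        print("done:", log_path.read_text().split())
```

The failed documents come back as a list, so they can be set aside and examined individually while the rest of the batch completes.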

This is all very frustrating.
Participant
January 4, 2009
I also encountered the same problem, using AA8.1.3, against 63 very large (600 MB+) PDF files that were created with a utility that did not make them searchable.

I also found that AA8 crashed while running OCR. Huge shortcoming.... AA8 does everything in memory, stores nothing in temporary files, forgets everything it did upon crashing, and forces a mammoth, time-consuming restart from the beginning.

I finally got through them all by breaking each file into 2 files (a and b), then running OCR 100 pages at a time within each file, with constant attention.

A random error message, "Cannot find file", kept popping up and stopping the run until I clicked "OK", adding to the huge delays. Gotta constantly watch for this error. Even though I click on "Ignore this error message", it does not listen; it keeps popping up and stopping processing, waiting for a dumb "OK" response.

But at least, as it successfully gets through 100 pages, I can save it, then carry on to next 100 pages. If it crashes, I can restart to the last successful 100 pages completed.
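The save-every-100-pages routine above amounts to a checkpoint. A minimal Python sketch of that bookkeeping follows, under stated assumptions: `ocr_and_save` is a hypothetical callback standing in for "run OCR on this page range and save the file", the checkpoint file name is invented, and the chunk size is assumed to stay the same between runs.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterator, Tuple


def page_ranges(page_count: int, size: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield inclusive 1-based (first, last) page ranges of at most `size` pages."""
    for first in range(1, page_count + 1, size):
        yield first, min(first + size - 1, page_count)


def ocr_with_checkpoints(page_count: int,
                         ocr_and_save: Callable[[int, int], None],
                         checkpoint: Path,
                         size: int = 100) -> int:
    """Process the document range by range, recording each finished range.

    ocr_and_save(first, last) is a hypothetical callback. If it raises, or the
    program dies, the next call resumes after the last range recorded in the
    checkpoint file. Returns the last page number completed.
    """
    last_done = 0
    if checkpoint.exists():
        last_done = json.loads(checkpoint.read_text())["last_page_done"]
    for first, last in page_ranges(page_count, size):
        if last <= last_done:
            continue  # this range was finished in an earlier run
        ocr_and_save(first, last)
        checkpoint.write_text(json.dumps({"last_page_done": last}))
        last_done = last
    return last_done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        progress = Path(tmp) / "progress.json"
        processed = []
        ocr_with_checkpoints(250, lambda a, b: processed.append((a, b)), progress)
        print(processed)  # [(1, 100), (101, 200), (201, 250)]
```

The checkpoint is written only after a range has been saved, so the worst case after a crash is redoing one range of 100 pages.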

I conclude that AA8 absolutely hates large files. My computer is a 2.6 GHz dual-core with 3.5 GB of memory, and that is still not enough.
It took me an entire week to work my way through these 63 files, which, BTW, tied up my computer for that whole time, leaving insufficient memory to do other things.

That AA8 does not work with temporary files is a huge shortcoming when working with large PDF files. I find it astonishing that upon an unexpected failure, it has no way to remember where it was when the crash occurred.

This situation raises the question...... Is there any other utility out there that will make a PDF file searchable when the utility that first created it (something other than Acrobat) did not?

Regards,

Terry Smythe
Winnipeg, Canada
smythe@shaw.ca
Participant
January 3, 2009
We have had the same problem and provided samples to Jason Reuer at Adobe. His e-mail address is jreuer@adobe.com. This is very frustrating. We tried purchasing the extended version of Acrobat Pro 9, and it has the same problem as Acrobat 8.1. Since I was unable to get any response from Jason Reuer, I tried contacting Olga Satchouk at Adobe, who responded here. Her e-mail address is no longer valid.

Does anyone have any idea how to get around this problem now, or when the mystery version 9.1 that supposedly corrects it will be released? Acrobat is useless to our organization with this mission-critical bug, and it is frustrating that Adobe has not been more responsive.

Also, does anyone have the e-mail address of someone higher up at Adobe so that we can escalate this issue, or at least let them know of Adobe's failure to respond to all our requests?
Participant
December 18, 2008
Same problem here. Just purchased 9.0 for a project and am having this problem with over a thousand documents. The module named in the crash is ocrlibraryinf.dll.
When will 9.1 be available?!

For those of you who only have a few PDFs to OCR, try extracting the culprit pages as TIFF, then importing them back in. It worked for me, but I can't do that for 1000+ documents.
Participant
December 8, 2008
Same problem; using Acrobat 9 on three stations and Acrobat 8 on one station. Acrobat 9 has failed to complete a batch OCR on ANY station, while Acrobat 8 has yet to fail, and it has done several large batches.
Any help yet? I have 60,000 pages to OCR.
Participant
December 3, 2008
Yes; I have the same problem with 9.0 Pro:

When I try to OCR a .pdf document to perform searches, it always crashes with the following error:

AppName: acrobat.exe AppVer: 9.0.0.332 ModName: ocrlibraryinf.dll
ModVer: 2.0.0.1 Offset: 000206f1
Participant
December 2, 2008
Does anyone know when the 9.1 version will be available? I'm having these same frustrating OCR problems!
Participant
November 7, 2008
I am having the same problem with 40+ pages to OCR.