chris_1015
Participating Frequently
January 8, 2019
Question

Best way to OCR a large file without crashing

Hello,

I am looking for the most efficient way to OCR large documents (without splitting the file into smaller components) without Adobe crashing or freezing.

By large, I mean a few thousand pages (the nature of my work).

I am aware of the Optimize PDF function, but am not sure whether it will help (or how to adjust the settings).

Note that I am using 600 dpi for downsampling.

I am using Adobe Pro DC 2015.017.20050

Thanks

This topic has been closed for replies.

5 replies

New Participant
April 26, 2021

Has anyone found a fix for this? I have a 24,595-page document that I need to OCR ASAP. As it is, it locks up whenever I try to save after adding a page or two or a new bookmark.

I am installing Acrobat DC right now.

New Participant
May 30, 2022

I have the same problem. Above about 1500 pages, it crashes. If I leave it to OCR, I come back to find Acrobat DC closed and the file not OCR'd.

Note: I have the computer set NOT to sleep, just in case that's an issue, and I use a cooling pad so it doesn't overheat from all that processing.

Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30 GHz
16.0 GB (15.8 GB usable)
64-bit operating system, x64-based processor

Very frustrating!

Very truly yours,
New Participant
June 1, 2022

I just did 200-500 pages at a time and finally finished it all. I kept a written list of where I left off, and it only took me about a year doing it one day a week. 🙃 It does take a long time to search through it for words, though. It works, but I was so sick of it crashing all the time.

New Participant
March 3, 2021

I have this question too. Every year I have to OCR a handful of PDFs that are over 10,000 pages long. Adobe currently crashes both when I try to OCR the file and when I try to break it up.

AndreaMadDog
New Participant
August 6, 2020

I have the same issue on a version of DC Pro that is just a month old.

I would very much appreciate a fix here. Having to break down the file is not an acceptable solution for a program that costs this much.

I have this issue with a 500-page file, so in the grand scheme of things, not even that large really.

Can someone offer a solution please?

Brainiac
January 16, 2019

It's possible of course that Adobe fixed it in a later version in the years since your product was made. They have been tinkering with OCR a bit, but I haven't seen a specific reference to fixing this bug.

However, Acrobat is a tool for low-volume OCR; you may be better off looking for a tool that takes high-volume work more seriously. I can't recommend any.

gary_sc
Community Expert
January 8, 2019

Hi Chris,

What I have done in those kinds of situations is to split the file into smaller documents of (say) 200 pages each and then combine them back into one once all are completed. Yes, it's tedious, but it's more likely to succeed.

I do not know of a maximum recommended page count, and I've not seen Adobe mention one, but a couple of hundred pages has been "safe" in my experience.
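For anyone planning the split, here is a minimal sketch of working out the page ranges for each chunk. It is plain Python with no PDF library involved; the function name `chunk_page_ranges` and the 200-page default are only illustrative assumptions based on the figure above, not anything Acrobat provides.

```python
def chunk_page_ranges(total_pages: int, chunk_size: int = 200) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges that cover the document.

    chunk_size=200 is an assumed default taken from the "couple of hundred
    pages" rule of thumb; it is not an Adobe-documented limit.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    ranges = []
    for first in range(1, total_pages + 1, chunk_size):
        # The final chunk may be shorter than chunk_size.
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    # Print a checklist of ranges to extract, OCR, and later recombine in order.
    for number, (first, last) in enumerate(chunk_page_ranges(450), start=1):
        print(f"Part {number:03d}: pages {first}-{last}")
```

Each range can then be extracted, OCR'd on its own, and the parts combined in the same order. Keeping the printed list also doubles as a record of which parts are done.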

Let me add that you are wise and correct to scan at that resolution; OCR quality goes up considerably as the resolution increases.

chris_1015
Participating Frequently
January 9, 2019

Gary,

Thank you very much for the assistance. Splitting these files has been tried before. However, since it is originally one document, it needs to be put back together. The problem is that once the file is reassembled, it either still crashes on search or is far too big to be emailed (again, part of my work).

I was hoping there would be a way to prevent the file size from increasing to the point of crashing Adobe.
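As a rough illustration of how resolution drives file size, the sketch below computes the uncompressed size of one scanned page. It assumes a US Letter page (8.5 x 11 inches) and ignores compression entirely, so real PDF sizes will be much smaller; the function name `uncompressed_page_bytes` is made up for this example.

```python
def uncompressed_page_bytes(width_in: float, height_in: float,
                            dpi: int, bits_per_pixel: int) -> int:
    """Raw (uncompressed) image size in bytes for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumption: US Letter page, 8-bit grayscale scan.
    for dpi in (300, 600):
        size = uncompressed_page_bytes(8.5, 11, dpi, bits_per_pixel=8)
        print(f"{dpi} dpi: {size / 1_000_000:.1f} MB per page before compression")
```

Doubling the resolution quadruples the pixel count, so under these assumptions a 600 dpi page holds four times the raw image data of a 300 dpi page.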

gary_sc
Community Expert
January 9, 2019

Oh my.

Just out of curiosity, what final size document are we talking about and how many pages for that document?

I know that PDFs can range up to some 4000 pages, so I'm wondering if there's something wrong within that document. Is it all text? Are there images? If so, are they bitmapped images or vector images?

Were they scanned and, if so, how? What kind of scanner was used? What format were they saved in (PDF, JPG, TIF, ??)?

Sorry for the questions, but need more info...