Participant
May 19, 2018
Question

OCR freezes with voluminous PDFs, 500+


We are a law firm using Acrobat Pro DC. Recently we've tried to OCR large PDF files, some in excess of 1,000 pages. The OCR process is slow, and its impact on the workstation makes other tasks nearly impossible. Is this just too big a project to ask Acrobat Pro DC to handle? We're using version 18.011.20040 on Windows 7 and Windows 10 workstations. Thanks.


1 reply

gary_sc
Community Expert
May 21, 2018

I'm a Mac user, but I still empathize with your problem. Here are some thoughts.

On the Mac I am limited to Apple's Image Capture for scanning, and that is a dreadful application. Because of this I scan with my flatbed scanner (an Epson 800). However, scanning hundreds or even thousands of pages on a flatbed scanner would be a dreadful job. I had a similar issue, and a friend loaned me a FujiScan bulk-loading scanner, which was WONDERFUL. BTW, it can scan both sides of a page at the same time.

However, its OCR was not as good as Acrobat's on two counts. First, it wasn't all that accurate (at least not as accurate as Acrobat). Second, there was the storage size. I do not remember the exact numbers, but Acrobat's OCR reduced the storage size of the FujiScanned document by a factor of 10 to 20 (e.g., a document from the FujiScan might be 10 MB, but after running it through Acrobat for OCR it would be 500 KB to 1 MB).

So the process is to run the pages through the FujiScan into a folder on your computer. Then run Acrobat's OCR process on the folder.
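Before starting the OCR run, it can help to check what actually landed in the scan folder. Here is a minimal Python sketch that lists the PDFs in a folder with their sizes; the folder name `scans` and the function names are my own placeholders, and the OCR step itself still happens in Acrobat. Running it again after OCR lets you compare the sizes.

```python
from pathlib import Path


def list_scans(folder):
    """Return (file name, size in MB) for each PDF directly inside folder, sorted by name."""
    # Note: on case-sensitive file systems "*.pdf" will not match "FILE.PDF".
    pdfs = sorted(Path(folder).glob("*.pdf"))
    return [(p.name, p.stat().st_size / 1_000_000) for p in pdfs]


def total_mb(scans):
    """Add up the sizes (in MB) from a list produced by list_scans."""
    return sum(size for _, size in scans)


if __name__ == "__main__":
    scans = list_scans("scans")  # hypothetical folder name, change to your own
    for name, size in scans:
        print(f"{name}: {size:.2f} MB")
    print(f"{len(scans)} files, {total_mb(scans):.2f} MB total")
```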

Now, the other issue you bring up, that Acrobat can take over a workstation, is sadly very true. I've complained about how you can start processing a folder of documents and then try to read email, write a letter, whatever, and after every page is processed Acrobat jumps up and (effectively) says "another page is done!" I always went off to lunch or something while this was going on, as nothing else could be done. But it was worth it, so I just gave up on using my computer while this was taking place.

One other thing to consider is that the better the quality of the scan (higher resolution), the better the quality of the OCR. Because of this, if you want higher-quality OCR, you are better off running the FujiScan at a high resolution, which will slow it down a tad. We're talking maybe a whole second per page as opposed to 0.75 or 0.5 seconds per page, not a big deal for the better end result.
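To put rough numbers on that, here is a small Python sketch of the arithmetic. The 1,000-page count comes from the original question; the per-page times are the ballpark figures above, not measurements, and the function name is just a placeholder.

```python
def scan_minutes(pages, seconds_per_page):
    """Total scanning time in minutes for a stack of pages."""
    return pages * seconds_per_page / 60


pages = 1000  # page count taken from the original question
for seconds_per_page in (1.0, 0.75, 0.5):  # ballpark per-page times, not measured
    minutes = scan_minutes(pages, seconds_per_page)
    print(f"{seconds_per_page} s/page -> {minutes:.1f} minutes")
```

So for 1,000 pages the high-resolution setting costs roughly 17 minutes of scanning instead of roughly 8, which is small next to the time the OCR itself takes.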

Please let us know what you end up doing.