I am looking for the most efficient way to OCR large documents (without splitting the file into smaller components) without Adobe crashing/freezing.
By large, I mean a few thousand pages (nature of my work).
I am aware of the Optimize PDF function, but am not sure if that will help (or which way to adjust the settings).
Note that I am using 600 dpi for downsampling.
I am using Adobe Pro DC, version 2015.017.20050.
What I have done in those kinds of situations is split the file into smaller documents of (say) 200 pages and then combine them back into one once all are completed. Yes, tedious, but more likely to succeed.
I do not know of a maximum recommended page count, and I've not seen any mention of one by Adobe, but a couple of hundred pages have been "safe" in my experience.
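For anyone following the split-and-recombine approach, here is a minimal sketch of how the page ranges could be planned up front. It only does the arithmetic and does not touch Acrobat or any PDF; the function name `chunk_ranges`, the 3000-page total, and the 200-page chunk size are illustrative assumptions, not anything Adobe recommends.

```python
def chunk_ranges(total_pages, chunk_size=200):
    """Return 1-based, inclusive (first, last) page ranges covering the document.

    chunk_size=200 is only the "safe" figure mentioned above, not an
    official limit.
    """
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


# Hypothetical example: a 3000-page file split into 200-page pieces.
ranges = chunk_ranges(3000, 200)
print(len(ranges), ranges[0], ranges[-1])  # 15 (1, 200) (2801, 3000)
```

The resulting list can be used as a checklist when extracting each range by hand, so no pages are skipped or OCR'd twice.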
Let me add that you are wise and correct to scan at that resolution; the OCR quality goes up considerably as the resolution increases.
Thank you very much for the assistance. Splitting these files has been tried before. However, since it is originally one document, it would need to be put back together. The problem there is that once the file is put back together, it either still crashes upon search or is entirely too big to be emailed (again, part of my work).
I was hoping there would be a way to prevent the file size from increasing to the point of crashing Adobe.
Just out of curiosity, what is the final size of the document we are talking about, and how many pages does it have?
I know that PDFs can range up to some 4000 pages, so I'm wondering if there's an error somewhere within that document. Is it all text? Are there images? If so, are they bitmapped images or vector images?
Were they scanned, and if so, how? What kind of scanner was used? What kind of format were they saved as (PDF, JPG, TIF, ??)?
Sorry for the questions, but I need more info...
No worries. I'm just thankful for the assistance.
To answer your questions, the largest files being dealt with are about 3000 pages. The pages are usually either scans of hard copy documents, inherently digital documents, or a combination of the two. So in the cases where the files are scans only, I guess we are really dealing with a file of potentially 3000 images.
They were saved as PDF files.
Let me know if you'd like to know anything else.
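To put "3000 images at 600 dpi" in perspective, here is a rough back-of-the-envelope sketch of the raw pixel data involved. It assumes US Letter pages (8.5 x 11 inches) and uncompressed rasters; both are assumptions, since the thread does not state the page size or color mode. Scans stored in a PDF are compressed and normally far smaller on disk, so treat this only as an indication of how much image data OCR may have to decode, not as a file-size prediction.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw raster size in bytes for one page, ignoring any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


PAGES = 3000  # page count mentioned in the thread
MODES = (
    ("1-bit black and white", 1),
    ("8-bit grayscale", 8),
    ("24-bit color", 24),
)

for label, bpp in MODES:
    # 8.5 x 11 in (US Letter) is an assumed page size.
    per_page = uncompressed_bytes(8.5, 11, 600, bpp)
    total_gb = per_page * PAGES / 1e9
    print(f"{label}: {per_page / 1e6:.1f} MB per page, "
          f"{total_gb:.1f} GB for {PAGES} pages")
```

Under these assumptions a single 600 dpi Letter page is 5100 x 6600 pixels, so even the 1-bit case works out to several megabytes of raw data per page.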
Your response led me to review this whole thread. I do not think the size of the document is the issue. My guess is that there is an error somewhere in the document that is causing the crash, but you did not mention whether this shows up in every large document you are dealing with or just one. That's an important distinction that should be investigated.
As far as transferring this document to others, when I have very large documents to get out, Dropbox is your friend! ;>)
Apologies. The crashing occurs with every file of that estimated size, not just one or two unfortunately. Sorry for leaving that out earlier.
How does this affect our analysis of the situation?
Sorry for the inconvenience caused. Would you help me with the following details?
How to get the Crash Logs:
a. When Acrobat crashes, open Windows Task Manager.
b. Go to Processes. There you can see a process named "Adobe Acrobat Pro DC".
c. Right-click on this process and click "Create Dump File".
d. The dump file will be created in the Temp folder of the user (as specified in the dialog you get after creating the dump file).
e. Save this DMP file to any cloud storage and share the link.
Splitting the file is very time consuming. There must be a better way.
It's possible of course that Adobe fixed it in a later version in the years since your product was made. They have been tinkering with OCR a bit, but I haven't seen a specific reference to fixing this bug.
However, Acrobat is a tool for low-volume OCR; you may be better off looking for a tool that takes high-volume work more seriously. I can't recommend any.
I have the same issue on a version of DC Pro that is just a month old.
I would very much appreciate a fix here. Having to break down the file is not an acceptable solution for a program that costs this much.
I have this issue with a file of 500 pages, so in the grand scheme of things, not even that large really.
Can someone offer a solution please?
I have this question too. Every year I have to OCR a handful of PDFs that are over 10,000 pages long. Adobe currently crashes both when I try to OCR the file and when I try to break it up.
Has anyone found a fix for this? I have a 24,595 page document that I need to OCR ASAP... as it is, it locks up when I try to save it after adding a page or two or a new bookmark.
I am installing Acrobat DC right now.
I have the same problem. Over about 1500 pages, it crashes. If I leave it to OCR, I'll come back and Acrobat DC will be closed and the file not OCR'd.
Note: I have the computer set NOT to sleep, just in case that's an issue. And a cooling pad so it doesn't overheat from all that processing.
Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30 GHz
16.0 GB (15.8 GB usable)
64-bit operating system, x64-based processor
I just went for 200-500 pages at a time, and finally finished it all. I kept a written list of where I left off, and it only took me about a year doing it one day a week. 🙃 It works, but it does take a long time to search through it for words, and I was so sick of it crashing all the time.
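The written list of where you left off could also be kept in a small file, so a crash never costs you your place. This is only a sketch under assumptions: the JSON file and the helper names `load_done`, `mark_done`, and `next_pending` are made up for illustration, and the OCR itself is still done by hand in Acrobat.

```python
import json
import tempfile
from pathlib import Path


def load_done(path):
    """Return the list of finished (first, last) page ranges, or [] if none."""
    p = Path(path)
    if not p.exists():
        return []
    return [tuple(r) for r in json.loads(p.read_text())]


def mark_done(path, page_range):
    """Record one finished page range; recording it twice has no effect."""
    done = load_done(path)
    if tuple(page_range) not in done:
        done.append(tuple(page_range))
    Path(path).write_text(json.dumps(done))


def next_pending(all_ranges, done):
    """Return the first range not yet finished, or None when all are done."""
    finished = set(done)
    for page_range in all_ranges:
        if page_range not in finished:
            return page_range
    return None


if __name__ == "__main__":
    # Hypothetical three-chunk job tracked in a temporary folder.
    progress = Path(tempfile.mkdtemp()) / "ocr_progress.json"
    plan = [(1, 200), (201, 400), (401, 450)]
    mark_done(progress, (1, 200))
    print(next_pending(plan, load_done(progress)))
```

After each chunk is OCR'd and saved, call `mark_done`; after a crash, `next_pending` tells you which range to pick up next.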
Also, I hated DC and XI, so I just went back to Acrobat 9. I will be using this version until it isn't supported anymore 🙂