Hello,
I am looking for the most efficient way to OCR large documents (without splitting the file into smaller components) without Adobe crashing or freezing.
By large, I mean a few thousand pages (nature of my work).
I am aware of the Optimize PDF function, but am not sure if that will help (or which way to adjust the settings).
Note that I am using 600 dpi for downsampling.
I am using Adobe Pro DC 2015.017.20050
Thanks
Hi Chris,
What I have done in those kinds of situations is to split the file into smaller documents of (say) 200 pages each and then combine them back into one once all are completed. Yes, it is tedious, but it is more likely to succeed.
I do not know of a maximum recommended page count and I've not seen any mention of one by Adobe, but a couple of hundred pages have been "safe" in my experience.
Let me add that you are wise and correct to scan at that resolution. OCR quality goes up considerably as the resolution increases.
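For anyone who wants to plan the batches up front, here is a minimal Python sketch of the page ranges for the split-then-recombine approach described above. It only computes the ranges; it does not touch Acrobat or the PDF itself. The function name `chunk_ranges` and the 200-page default are illustrative assumptions taken from the "couple of hundred pages" rule of thumb, not an Adobe recommendation.

```python
def chunk_ranges(total_pages, chunk_size=200):
    """Return 1-based, inclusive (first, last) page ranges covering the document.

    chunk_size=200 is an assumption based on the rule of thumb in this thread.
    """
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size must be >= 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


# Example: a 3000-page file split into 200-page batches.
for first, last in chunk_ranges(3000, 200):
    print(f"pages {first}-{last}")
```

Each printed range is one batch to extract, OCR, and later combine in order. The last batch is simply shorter when the page count is not a multiple of the batch size.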
Gary,
Thank you very much for the assistance. Splitting these files has been tried before. However, since it is originally one document, it would need to be put back together. The problem is that once the file is put back together, it either still crashes upon search or is entirely too big to be emailed (again, part of my work).
I was hoping there would be a way to prevent the file size from increasing to the point of crashing Adobe.
Oh my.
Just out of curiosity, what final size document are we talking about and how many pages for that document?
I know that PDFs can range up to some 4000 pages, so I'm wondering if there's an error somewhere within that document. Is it all text? Are there images? If so, are they bitmapped images or vector images?
Were they scanned and, if so, how? What kind of scanner was used? What format were they saved as (PDF, JPG, TIF, ??)?
Sorry for the questions, but I need more info...
No worries. I'm just thankful for the assistance.
To answer your questions, the largest files being dealt with are about 3000 pages. The pages are usually either scans of hard copy documents, inherently digital documents, or a combination of the two. So in the cases where the files are scans only, I guess we are potentially dealing with a file containing 3000 images.
They were saved as PDF files.
Let me know if you'd like to know anything else.
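To put "3000 images" at 600 dpi in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes US Letter pages (8.5 x 11 inches) and counts raw, uncompressed pixel data only. Both are assumptions for illustration; real PDFs compress their images, so actual files are far smaller. The helper name `raw_page_bytes` is made up for this sketch.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


PAGES = 3000  # page count mentioned in this thread

# Assumed page size: US Letter. Bit depths: 1 = black and white,
# 8 = grayscale, 24 = full color.
for bits in (1, 8, 24):
    per_page = raw_page_bytes(8.5, 11, 600, bits)
    total_gb = per_page * PAGES / 1e9
    print(f"{bits:>2}-bit: {per_page / 1e6:.1f} MB per page, "
          f"{total_gb:.1f} GB raw for {PAGES} pages")
```

The useful takeaway is the scaling: pixel count grows with the square of the resolution, so the same page at 300 dpi holds a quarter of the raw data it holds at 600 dpi.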
Hi Chris,
Your response led me to review this whole thread. I do not think the size of the document is the issue. My guess is that there is an error somewhere in the document that is causing the crash, but you did not mention whether this shows up in every large document you are dealing with or in just one. That's an important distinction that should be investigated.
As for transferring this document to others: when I have very large documents to send out, Dropbox is your friend! ;>)
Best,
Gary
Hi Gary,
Apologies. Unfortunately, the crashing occurs with every file of roughly that size, not just one or two. Sorry for leaving that out earlier.
How does this affect our analysis of the situation?
Chris
Hi Chris,
Sorry for the inconvenience caused. Would you help me with the following details?
How to get the Crash Logs:
a. When Acrobat crashes, open Windows Task Manager.
b. Go to Processes. There you can see a process named "Adobe Acrobat Pro DC".
c. Right-click on this process and click "Create Dump File".
d. The dump file will be created in the user's Temp folder (as specified in the dialog you get after creating the dump file).
e. Save this DMP file to any cloud storage and share the link.
Please share the dump file and a sample file via private message (see "How Do I Send Private Message"). You can use Adobe Send for cloud storage (see "How to share a file using Adobe Document Cloud").
-Tariq Dar
Hi there
Splitting the file is very time consuming. There must be a better way.
Agreed. With the exception of splitting by top-level bookmarks (which usually have yet to be created), the file split utility is a drunk with a hatchet. It is hard to avoid splitting up logically flowing content.
It was suggested to me to read the doc before splitting it. Please, no.
Easy enough if it's under 500 pages.
But these 10,000+ page records supplied to us lack intuitive compilation with respect to grouping and sorting.
Perhaps willfully; that is another discussion.
My idea is for the devs to get it right so I can do my work, and not be put into the position of making excuses for Adobe.
It's frustrating beyond words.
Adobe DC Pro 64-bit PC client.
Windows 11
It's possible of course that Adobe fixed it in a later version in the years since your product was made. They have been tinkering with OCR a bit, but I haven't seen a specific reference to fixing this bug.
However, Acrobat is a tool for low-volume OCR. You may be better off looking for a tool that takes high-volume work more seriously. I can't recommend any.
I have the same issue on a version of DC Pro that is just a month old.
I would very much appreciate a fix here. Having to break down the file is not an acceptable solution for a program that costs this much.
I have this issue with a file of 500 pages, so in the grand scheme of things, not even that large really.
Can someone offer a solution please?
I have this question too. Every year I have to OCR a handful of PDFs that are over 10,000 pages long. Adobe currently crashes both when I try to OCR the file and when I try to break it up.
Has anyone found a fix for this? I have a 24,595 page document that I need to OCR ASAP... As it is, it locks up when I try to save it whenever I add a page or two or a new bookmark.
I am installing Acrobat DC right now.
I have the same problem. Over about 1500 pages, it crashes. If I leave it to OCR, I'll come back and Acrobat DC will be closed and the file not OCR'd.
Note: I have the computer set NOT to sleep, just in case that's an issue. And a cooling pad so it doesn't overheat from all that processing.
Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30 GHz
16.0 GB (15.8 GB usable)
64-bit operating system, x64-based processor
Very frustrating!
I just went for 200-500 pages at a time, and finally finished it all. I kept a written list of where I left off, and it only took me about a year doing it one day a week. 🙃 It works, although it does take a long time to search through the finished file for words. I was so sick of it crashing all the time.
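If a written list of where you left off gets unwieldy, the same bookkeeping can be sketched in a few lines of Python. This only records which page ranges are finished so you can pick up after a crash; it does not run the OCR. The function names and the JSON progress file are hypothetical choices for this sketch.

```python
import json
import os


def load_done(path):
    """Return the set of (first, last) page ranges already recorded as finished."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {tuple(r) for r in json.load(f)}


def mark_done(path, page_range):
    """Record one finished (first, last) range in the progress file."""
    done = load_done(path)
    done.add(tuple(page_range))
    with open(path, "w") as f:
        json.dump(sorted(done), f)


def remaining(all_ranges, path):
    """Return the ranges from all_ranges not yet marked as finished, in order."""
    done = load_done(path)
    return [r for r in all_ranges if tuple(r) not in done]
```

After each batch is OCR'd and saved, call `mark_done("progress.json", (first, last))`. After a crash or a week away, `remaining(...)` tells you which batches are still left.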
Also, I hated DC and XI, so I just went back to Acrobat 9. I will be using this version until it isn't supported anymore 🙂