Best way to OCR a large file without crashing

Community Beginner, Jan 08, 2019

Hello,

I am looking for the most efficient way to OCR large documents (without splitting the file into smaller parts) without Adobe crashing or freezing.

By large, I mean a few thousand pages (the nature of my work).

I am aware of the Optimize PDF function, but am not sure if that will help (or how to adjust the settings).

Note that I am using 600 dpi for downsampling.

I am using Adobe Pro DC 2015.017.20050

Thanks

TOPICS
Scan documents and OCR

Community Expert, Jan 08, 2019

Hi Chris,

What I have done in those kinds of situations is to split the file into smaller documents of (say) 200 pages each, and then combine them back into one once all are completed. Yes, tedious, but more likely to succeed.

I do not know of a maximum recommended page count, and I've not seen Adobe mention one, but a couple of hundred pages has been "safe" in my experience.

Let me add that you are wise and correct to scan at that resolution; the OCR quality goes up considerably as the resolution increases.
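
To make the batching concrete, here is a minimal sketch in Python of how the page ranges could be worked out before splitting by hand. It is plain arithmetic, not an Acrobat script or API; the function name `batch_ranges` and the 200-page default are assumptions for illustration, based on the batch size mentioned above.

```python
def batch_ranges(total_pages, batch_size=200):
    """Return 1-based, inclusive (first, last) page ranges covering the document.

    Hypothetical helper: it only computes where each split would fall.
    """
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    # Print the ranges to extract, OCR, and later recombine in order.
    for first, last in batch_ranges(3000):
        print(f"pages {first}-{last}")
```

For a 3000-page file this gives 15 batches, from pages 1-200 through pages 2801-3000. The last batch is simply shorter when the page count is not a multiple of the batch size.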

Community Beginner, Jan 08, 2019

Gary,

Thank you very much for the assistance. Splitting these files has been tried before. However, since it is originally one document, it would need to be put back together. The problem there is that once the file is put back together, it either still crashes upon search or is entirely too big to be emailed (again, part of my work).

I was hoping there would be a way to prevent the file size from increasing to the point of crashing Adobe.

Community Expert, Jan 08, 2019

Oh my.

Just out of curiosity, what final size document are we talking about and how many pages for that document?

I know that PDFs can range up to some 4000 pages, so I'm wondering if there's an error somewhere within that document. Is it all text? Are there images? If so, are they bitmapped images or vector images?

Were they scanned, and if so, how? What kind of scanner was used? What format were they saved in (PDF, JPG, TIF, ??)?

Sorry for the questions, but need more info...

Community Beginner, Jan 15, 2019

No worries. I'm just thankful for the assistance.

To answer your questions, the largest files being dealt with are about 3000 pages. The pages are usually either scans of hard copy documents, inherently digital documents, or a combination of the two. So in cases where the files are scans only, I guess we are potentially dealing with a file containing 3000 images.

They were saved as PDF files.

Let me know if you'd like to know anything else.

Community Expert, Jan 15, 2019

Hi Chris,

Your response led me to review this whole thread. I do not think the size of the document is the issue. My guess is that there is an error somewhere in the document that is causing the crash, but you did not mention whether this shows up in every large document you are dealing with or just one. That's an important distinction that should be investigated.

As for transferring this document to others: when you have very large documents to get out, Dropbox is your friend! ;>)

Best,

Gary

Community Beginner, Jan 16, 2019

Hi Gary,

Apologies. The crashing occurs with every file of roughly that size, not just one or two, unfortunately. Sorry for leaving that out earlier.

How does this affect our analysis of the situation?

Chris

Adobe Employee, Jan 16, 2019

Hi Chris,

Sorry for the inconvenience caused. Would you help me with the following details?

  1. Operating system name and version.
  2. Current version of Acrobat installed on the affected machine.
  3. A sample file that you can reproduce the issue with.
  4. Crash logs.

How to get the crash logs:

a. When Acrobat crashes, open Windows Task Manager.

b. Go to Processes. There you can see a process named "Adobe Acrobat Pro DC".

c. Right-click on this process and click "Create Dump File".

d. The dump file will be created in the user's Temp folder (the path is shown in the dialog that appears after the dump file is created).

e. Save this DMP file to any cloud storage and share the link.

Please share the dump file and a sample file via private message (see "How Do I Send Private Message"). You can use Adobe Send for cloud storage (see "How to share a file using Adobe Document Cloud").

-Tariq Dar

New Here, Aug 05, 2020

Hi there

Splitting the file is very time-consuming. There must be a better way.

New Here, Sep 19, 2022

Agreed. With the exception of splitting by top-level bookmarks (which usually have yet to be created), the file split utility is a drunk with a hatchet; it is hard to avoid splitting up logically flowing content.

It was suggested to me to read the doc before splitting it. Please, no.

Easy enough if it's under 500 pages.

But these 10,000+ page records supplied to us lack intuitive compilation with respect to grouping and sorting.

Perhaps willfully; that is another discussion.

My idea is for the devs to get it right so I can do my work, and not be put into the position of making excuses for Adobe.

It's frustrating beyond words.

Adobe DC Pro, 64-bit PC client.

Windows 11

LEGEND, Jan 16, 2019

It's possible of course that Adobe fixed it in a later version in the years since your product was made. They have been tinkering with OCR a bit, but I haven't seen a specific reference to fixing this bug.

However, Acrobat is a tool for low-volume OCR; you may be better off looking for a tool that takes high-volume work more seriously. I can't recommend any.

New Here, Aug 05, 2020

I have the same issue on a version of DC Pro that is just a month old.

I would very much appreciate a fix here. Having to break down the file is not an acceptable solution for a program that costs this much.

I have this issue with a file of 500 pages, so in the grand scheme of things, not even that large really.

Can someone offer a solution please?

New Here, Mar 03, 2021

I have this question too. Every year I have to OCR a handful of PDFs that are over 10,000 pages long. Adobe currently crashes both when I try to OCR the file and when I try to break it up.

Community Beginner, Apr 26, 2021

Has anyone found a fix for this? I have a 24,595-page document that I need to OCR ASAP. As it is, it locks up when I try to save it whenever I add a page or two or a new bookmark.

I am installing Acrobat DC right now.

New Here, May 29, 2022

I have the same problem. Over about 1500 pages, it crashes. If I leave it to OCR, I'll come back and Acrobat DC will be closed and the file not OCR'd.

 

Note: I have the computer set NOT to sleep, just in case that's an issue. And a cooling pad so it doesn't overheat from all that processing.

Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz 2.30 GHz
16.0 GB (15.8 GB usable)
64-bit operating system, x64-based processor

 

Very frustrating!

Very truly yours,

Community Beginner, Jun 01, 2022

I just went for 200-500 pages at a time, and finally finished it all. I kept a written list of where I left off, and it only took me about a year doing it one day a week. 🙃 It does take a long time to search through it for words, though. It works, but I was so sick of it crashing all the time.
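
The written list could also be kept as a small file. Here is a minimal sketch in Python, assuming a hypothetical JSON log; the file name and the helper names `load_done`, `mark_done`, and `next_batch` are made up for illustration. It does not talk to Acrobat at all; it only records which page ranges are finished so a crash does not lose your place.

```python
import json
from pathlib import Path


def load_done(log_path):
    """Read the finished page ranges from the log (empty list if no log yet)."""
    path = Path(log_path)
    if not path.exists():
        return []
    return [tuple(r) for r in json.loads(path.read_text())]


def mark_done(log_path, page_range):
    """Record one (first, last) page range as finished."""
    done = load_done(log_path)
    if tuple(page_range) not in done:
        done.append(tuple(page_range))
    Path(log_path).write_text(json.dumps(done))


def next_batch(log_path, all_ranges):
    """Return the first range not yet in the log, or None when all are done."""
    done = set(load_done(log_path))
    for page_range in all_ranges:
        if tuple(page_range) not in done:
            return tuple(page_range)
    return None


if __name__ == "__main__":
    ranges = [(1, 200), (201, 400), (401, 450)]  # example batches
    log = "ocr_progress.json"                     # hypothetical log file name
    print("next batch to OCR:", next_batch(log, ranges))
```

After each batch finishes, call `mark_done` with its range; the next session picks up from whatever `next_batch` returns.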

Community Beginner, Jun 01, 2022

Also, I hated DC and XI, so I just went back to Acrobat 9.  I will be using this version until it isn't supported anymore 🙂 
