Participant
June 8, 2020
Question

Memory leak (which leads to out of memory) when doing OCR


Environment: Acrobat Pro DC version 2020.009.20063. Windows 10 version 1909 (18363.778). Total 16GB of physical RAM. Locale zh/cn.

 

I used Acrobat Pro DC to run OCR on a scanned textbook (~500 pages, 250MB). When it reached about page 90, the following three consecutive error dialogs were shown, and then recognition stopped:

- "unknown error"

- "unable to locate the paper capture recognition service"

- "out of memory"

(The wording may be inaccurate because the original dialogs are shown in Chinese. See the attached screenshots for the original text.)

Task Manager showed that Acrobat was using 3.5GB of RAM, close to the 4GB address-space ceiling of a 32-bit program.

 

I found a workaround for this problem: recognize only 80 pages at a time, save the result, and then restart Acrobat before it runs out of memory. However, this is very time-consuming, so I hope the leak will be fixed.
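The page ranges for this workaround can be planned ahead. Below is a minimal sketch, assuming a 500-page document and the 80-page batch size mentioned above. `page_batches` is a hypothetical helper introduced only for illustration; it computes the ranges and does not drive Acrobat in any way.

```python
def page_batches(total_pages, batch_size=80):
    """Yield inclusive, 1-based (first, last) page ranges covering the document.

    batch_size=80 is an assumption taken from the workaround above;
    pick whatever stays safely below the point where memory runs out.
    """
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    for first in range(1, total_pages + 1, batch_size):
        # The last batch may be shorter than batch_size.
        yield first, min(first + batch_size - 1, total_pages)


if __name__ == "__main__":
    # 500 is an assumed page count (the post says "~500 pages").
    for first, last in page_batches(500):
        print(f"OCR pages {first}-{last}, save, then restart Acrobat")
```

Each printed range is one manual round of recognize, save, and restart.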

 

I have attached sample.pdf (120 pages made by repeating the book's table of contents) to reproduce this issue. Use the OCR options "Chinese (simplified)", "searchable image", and "300 dpi", and it will run out of memory at page 91.


1 reply

Amal.
Legend
June 8, 2020

Hi there

 

We are sorry for the trouble. We tried to reproduce the issue on our end, and it's working fine.

 

Please update the application to the new version 20.009.20067 and see if that works for you. Go to Help > Check for Updates.

 

You may also try to repair the installation. Go to Help > Repair Installation and see if that makes any difference.

 

Regards

Amal

HCP62
Participating Frequently
June 15, 2020

Hello!
Same problem, as described in https://community.adobe.com/t5/acrobat/scan-amp-ocr-quickly-fill-memory-when-processing-large-numbers-of-files/m-p/11209393?page=1#M261956

 

Lenovo P50
- i7-6700HQ
- 32 GB RAM DDR4
- Windows 10 Pro 64-Bit

 

Adobe Acrobat Pro DC 2020.009.20067 (32-Bit)
- No Updates available
- Installation repaired

 

Processing many PDF files for OCR:
- maximum processor usage 20%
- after about 25 minutes
- 3.6 GB RAM usage
- all remaining files are simply skipped and not processed
- Error message (translated from the German dialog): "The page cannot be processed because an error occurred in the Paper Capture recognition service. (6)"

 

In what order are the files processed? By size? Alphabetically? It is not apparent to me, so the whole job might as well be cancelled!
At least I can tell from the modification date which files were not OCRed: they are the ones where several files were "processed" per minute.
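That timestamp check can be automated. Below is a minimal sketch, assuming the skipped files still get a fresh modification time and that real OCR takes well over a minute per file. The names `suspicious_files` and `scan_folder`, and the threshold of 2 files per minute, are hypothetical choices for illustration, not anything Acrobat provides.

```python
from collections import defaultdict
from pathlib import Path


def suspicious_files(mtimes, threshold=2):
    """Return sorted names of files that share a one-minute window.

    mtimes maps file name -> modification time in seconds since the epoch.
    A file is flagged when at least `threshold` files were modified within
    the same clock minute, which suggests they were skipped, not OCRed.
    """
    buckets = defaultdict(list)
    for name, mtime in mtimes.items():
        buckets[int(mtime // 60)].append(name)
    flagged = [
        name
        for names in buckets.values()
        if len(names) >= threshold
        for name in names
    ]
    return sorted(flagged)


def scan_folder(folder, threshold=2):
    """Collect modification times of the PDFs in `folder` and flag bursts."""
    mtimes = {p.name: p.stat().st_mtime for p in Path(folder).glob("*.pdf")}
    return suspicious_files(mtimes, threshold)
```

Calling `scan_folder` on the output folder lists the files to re-run. Because the grouping uses fixed clock minutes, a burst that straddles a minute boundary can be missed, so the result is a starting point rather than a definitive list.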