Memory leak (which leads to out of memory) when doing OCR

Report · Jun 08, 2020

Environment: Acrobat Pro DC version 2020.009.20063. Windows 10 version 1909 (18363.778). Total 16GB of physical RAM. Locale zh/cn.

I used Acrobat Pro DC to OCR a scanned textbook (~500 pages, 250MB). When it reached about page 90, the following 3 consecutive error dialogs were shown, and then recognition stopped:

- "unknown error"

- "unable to locate the paper capture recognition service"

- "out of memory"

(The wording may be inexact because the original dialogs are shown in Chinese. See the attached screenshots for the original text.)

Task Manager showed Acrobat using 3.5GB of RAM, close to the address-space limit of a 32-bit program.

I found a workaround for this problem: recognize only 80 pages at a time, save the result, and then restart Acrobat before it runs out of memory. However, this is very time-consuming, so I hope it will be fixed.
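To plan the batches for this workaround, here is a minimal sketch in Python. It only computes the page ranges to enter in the OCR dialog and does not drive Acrobat in any way. The function name `page_batches` is hypothetical, and the batch size of 80 is simply the value that worked for me, not a documented limit.

```python
def page_batches(total_pages, batch_size=80):
    """Return 1-based, inclusive (first, last) page ranges covering the document.

    Assumption: each range is small enough to OCR before memory runs out;
    80 pages is an observed value, not a guaranteed one.
    """
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    return [
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


# Example: a 500-page book split into 80-page batches.
for first, last in page_batches(500):
    print(f"OCR pages {first}-{last}, save, then restart Acrobat")
```

For a 500-page book this gives seven batches, the last one covering pages 481-500, so the save-and-restart cycle has to be repeated seven times.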

I have attached sample.pdf (120 pages made by repeating the book's table of contents) to reproduce this issue. Use the OCR options "Chinese (simplified)", "searchable image", "300 dpi", and it will run out of memory at page 91.

Report · Jun 08, 2020

Hi there

We are sorry for the trouble. We tried to reproduce the issue on our end and it's working fine.

Please update the application to the new version 20.009.20067 and see if that works for you. Go to Help > Check for Updates.

You may also try to repair the installation. Go to Help > Repair Installation and see if that makes any difference.

Regards

Amal

Report · Jun 08, 2020

Hello,

We have had exactly this problem here for several days! Our version is current: 20.009.20067

I was able to reproduce the OCR abort with the sample file.

Best regards

Report · Jun 09, 2020

Attached is a video that shows the problem.

See: OCR_Problem_2020-06-09

Report · Jun 12, 2020

This is not unique. Our organisation has over 20 Acrobat Pro DC licences on W10, and since this "latest update" to 20.009.20067, any attempt to do OCR crashes at about page 80-90 (of a 450-page document) with "Out of Memory", while Acrobat is using 3.6GB of RAM on an 8GB machine.

Adobe - this is a bug which needs to be fixed. I have an old Acrobat 8 Standard installation on W10, and the 450-page document was OCR'd without error - with Acrobat 8 Std using no more than 220MB of RAM.

Acrobat 8 has no "batch" OCR facility and I have about 100 documents of 400-500 pages to OCR. The idea of opening each one and setting OCR manually on each one is awful. Prior to this issue, I would set them all to OCR over a weekend.

PLEASE PLEASE can this be fixed soon instead of in 12 months' time!! I am a Head of IT and it is 100% an Acrobat DC memory leak, not something your usual "check version", "check updates", "restart workstation" advice will fix.

Report · Jun 15, 2020

See also https://community.adobe.com/t5/acrobat/scan-amp-ocr-quickly-fill-memory-when-processing-large-number...

Report · Jun 15, 2020

Hello!
Same problem, described in https://community.adobe.com/t5/acrobat/scan-amp-ocr-quickly-fill-memory-when-processing-large-number...

Lenovo P50
- i7-6700HQ
- 32 GB RAM DDR4
- Windows 10 Pro 64-Bit

Adobe Acrobat Pro DC 2020.009.20067 (32-Bit)
- No Updates available
- Installation repaired

Processing many PDF files for OCR
- maximum processor usage 20%
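- after about 25 minutes
- 3.6 GB RAM usage
- all remaining files are simply skipped and not processed
- Error message "Die Seite kann nicht verarbeitet werden, weil beim Dienst Paper Capture Erkennung ein Fehler aufgetreten ist. (6)" (in English: "The page cannot be processed because an error occurred in the Paper Capture recognition service. (6)")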

In which order are the files processed? By size? Alphabetically? It is not apparent to me. As a result, the whole job might as well be cancelled!
At least I can tell from the modification date which files were not OCRed: they are the ones from the period when several files were "processed" per minute.
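The date check described above could be sketched roughly as follows in Python. This assumes that a successful OCR run saves each PDF in place and thereby updates its modification time, while skipped files keep their old timestamp; I have not confirmed that Acrobat always behaves this way. The function name `unprocessed_pdfs` and its parameters are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def unprocessed_pdfs(folder, job_start):
    """Return the PDFs in `folder` that were last modified before `job_start`.

    Assumption: OCR rewrites a file in place, so an untouched file still
    carries a modification time older than the start of the batch job.
    `job_start` is a datetime; a naive value is interpreted as local time.
    """
    cutoff = job_start.timestamp()
    return sorted(
        path
        for path in Path(folder).glob("*.pdf")
        if path.stat().st_mtime < cutoff
    )
```

Calling, for example, `unprocessed_pdfs("scans", datetime(2020, 6, 15, 9, 0))` would list the files in a folder named `scans` that still need OCR; both the folder name and the start time are placeholders. Note that the `*.pdf` pattern may not match `.PDF` on case-sensitive file systems.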

Report · Oct 28, 2020

I'm a senior subject librarian who works on digitising core books for visually impaired students, and this is a real problem. Since moving over to Acrobat Pro DC we've not been able to OCR texts for these students. It's very frustrating, as the previous version of Acrobat worked perfectly well. This will have a detrimental impact on our visually impaired students' education, as not all core texts are available as ebooks.

For the OCR process to work we have to compress the PDF to such an extent that it is neither use nor ornament.

Please can this be looked at as a matter of urgency.

Report · Oct 28, 2020

Hello Robert,

Adobe Acrobat Pro DC 2020.012.20048 with OCR works for me. There are no functional changes at all in my use case. In (April?) May/June 2020 there was a problem with memory filling up (probably a sloppy update?). That was fixed in June, but only after support had denied it for weeks and I kept running into a wall.

A personal note: this forum did not help. Support cannot be reached in this forum. I had to be told that this forum is not about fixing bugs, but solely about how to use the respective software. Your problem is probably new(?). So open a new post of your own and/or contact phone support. As I said, support is somewhat difficult, because you can only communicate with a person outside of support. This person merely relays the communication, sometimes incorrectly, between us and support. That is very exhausting!

Best regards and good luck!

Report · Nov 18, 2020

Hi Robert

Hope you are doing well, and sorry for the delay in responding.

As you described, you are not able to OCR texts for students. If the issue still persists, would you mind sharing a screenshot of the issue or error message so we can understand it better?

Are you getting the error 'Could not access the paper capture recognition service' when attempting to OCR? If yes, please look at the steps provided in the help page https://helpx.adobe.com/in/acrobat/kb/acrobat-could-access-recognition-service.html and see if that works for you.

Regards

Amal

Adobe Community
