Remove text recognition within individual images?

February 3, 2024

When I run OCR on a scanned PDF, Acrobat diligently recognizes all the text, which is good, and also "recognizes" some bits of images as text, which is not so good. Removing these spurious text elements leaves holes in the image. Is there a way to remove them that restores the original image, or should I resign myself to using copied images from the scan to replace the OCR-mangled elements?


try67

Community Expert

You can specify a page range when performing OCR ("Text Recognition") in Acrobat, but that might not be very useful if the pages you wish to ignore are scattered throughout the file. The only other option I can think of is to extract those pages, delete them from the original file, run OCR on what remains, and then import them back in.

This can be done using a custom-made script (except for the OCR part, which you'll need to run manually).
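To make the bookkeeping behind that workflow concrete, here is a minimal sketch in Python. It only models the page shuffling on plain lists; it does not touch a PDF or run OCR. The function names (`group_runs`, `split_for_ocr`, `restore`) are hypothetical and page indices are assumed to be 0-based. In Acrobat itself the same steps would presumably map onto the JavaScript `Doc` methods `extractPages`, `deletePages`, and `insertPages`, but check the API reference before relying on that.

```python
from typing import List, Sequence, Tuple


def group_runs(pages: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse page indices into inclusive (start, end) runs.

    Useful because page-range operations usually take a start and an end.
    """
    runs: List[Tuple[int, int]] = []
    for p in sorted(set(pages)):
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)  # extend the current run
        else:
            runs.append((p, p))          # start a new run
    return runs


def split_for_ocr(pages: list, skip: Sequence[int]):
    """Separate the pages to protect from the pages that will be OCR'd.

    Returns (protected, remaining), where protected is a list of
    (original_index, page) pairs in ascending index order.
    """
    skip_set = set(skip)
    for i in skip_set:
        if not 0 <= i < len(pages):
            raise IndexError(f"page index {i} is out of range")
    protected = [(i, pages[i]) for i in sorted(skip_set)]
    remaining = [p for i, p in enumerate(pages) if i not in skip_set]
    return protected, remaining


def restore(remaining: list, protected: list) -> list:
    """Put the protected pages back at their original positions.

    Inserting in ascending index order works because, by the time page i
    is inserted, every page that originally preceded it is already present.
    """
    merged = list(remaining)
    for index, page in sorted(protected, key=lambda pair: pair[0]):
        merged.insert(index, page)
    return merged
```

With six pages and pages 1, 2, and 4 set aside, `group_runs` yields the two ranges to extract and delete, `split_for_ocr` leaves pages 0, 3, and 5 for the manual OCR pass, and `restore` merges everything back in the original order.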

Ephraim (Author)

Participant

Unfortunately, most pages are a mix of text, equations, and diagrams, with occasional photos for variety. The most vexing document is a bill of materials in which washers, mounting holes, brackets, and even the texture of knurled knobs have wrongly become text, while the part names, scales, descriptions, and measurements are properly recognized. This particular document would be solved with a rule such as "don't look for text in column 1 of the table," though other documents are less structured.
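For illustration, a rule like that could be sketched as a filter over OCR results. Nothing in this thread suggests Acrobat exposes such a rule, so the snippet below assumes a different setup: an OCR engine that reports each recognized word with a bounding box in page coordinates. The `Box` and `Word` types, the function names, and the coordinates are all hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned rectangle in page coordinates (assumed units: points)."""
    left: float
    top: float
    right: float
    bottom: float


class Word(NamedTuple):
    """One recognized word and where it was found (hypothetical OCR output)."""
    text: str
    box: Box


def center_inside(inner: Box, region: Box) -> bool:
    """True if the center point of `inner` lies within `region`."""
    cx = (inner.left + inner.right) / 2
    cy = (inner.top + inner.bottom) / 2
    return (region.left <= cx <= region.right
            and region.top <= cy <= region.bottom)


def drop_words_in_regions(words: Sequence[Word],
                          excluded: Sequence[Box]) -> List[Word]:
    """Keep only the words whose centers fall outside every excluded region."""
    return [
        w for w in words
        if not any(center_inside(w.box, region) for region in excluded)
    ]
```

For the bill of materials, the excluded region would be a single tall rectangle covering column 1, so a washer misread as "O" in that column is dropped while the part name next to it is kept. Less structured documents would need the regions marked page by page.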
