Remove text recognition within individual images?

New Here
Feb 02, 2024

When I run OCR on a scanned PDF, Acrobat diligently recognizes all the text, which is good, but it also "recognizes" some bits of images as text, which is not so good. Removing these spurious text elements leaves holes in the image. Is there a way to remove them that restores the original image, or should I resign myself to replacing the OCR-mangled elements with images copied from the scan?

Topics: Edit and convert PDFs, Scan documents and OCR

Community Expert
Feb 04, 2024

You can specify a page range when performing OCR ("Text Recognition") in Acrobat, but if the pages you want to skip are scattered throughout the file, that might not be very useful. The only other option I can think of is to extract those pages, delete them from the original file, run OCR on what remains, and then insert them back in.

This can be done using a custom-made script (except for the OCR part, which you'll need to run manually).
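To make the two options concrete, here is a minimal Python sketch. It is not an Acrobat script and does not touch any PDF: pages are modeled as plain list items. `ocr_page_ranges` lists the contiguous page ranges you would OCR one at a time to skip scattered pages, which shows why that approach gets tedious. `split_for_ocr` and `reinsert` model the extract, OCR, and insert-back workflow, and show why reinserting the extracted pages in ascending page order puts each one back in its original slot. All function names are hypothetical; a real Acrobat script would use Acrobat's own page extraction and insertion calls instead.

```python
# Hypothetical helpers; pages are plain labels, not real PDF pages.

def ocr_page_ranges(total_pages, skip):
    """Return 1-based inclusive (start, end) ranges covering every page not in skip."""
    ranges = []
    start = None
    for page in range(1, total_pages + 1):
        if page in skip:
            # A skipped page closes the range currently being built, if any.
            if start is not None:
                ranges.append((start, page - 1))
                start = None
        elif start is None:
            start = page
    if start is not None:
        ranges.append((start, total_pages))
    return ranges


def split_for_ocr(pages, skip):
    """Split pages into (to_ocr, extracted); extracted keeps original 1-based numbers."""
    to_ocr = []
    extracted = []
    for number, page in enumerate(pages, start=1):
        if number in skip:
            extracted.append((number, page))
        else:
            to_ocr.append(page)
    return to_ocr, extracted


def reinsert(ocr_pages, extracted):
    """Put extracted pages back at their original positions."""
    result = list(ocr_pages)
    # Insert in ascending order: by the time page n goes back in, every page
    # numbered below n is already in place, so index n - 1 is its original slot.
    for number, page in sorted(extracted):
        result.insert(number - 1, page)
    return result


if __name__ == "__main__":
    print(ocr_page_ranges(10, {3, 7, 8}))  # [(1, 2), (4, 6), (9, 10)]
    to_ocr, extracted = split_for_ocr(["p1", "p2", "p3", "p4", "p5"], {2, 5})
    ocr_done = [page + "+ocr" for page in to_ocr]  # stand-in for the manual OCR step
    print(reinsert(ocr_done, extracted))
```

The OCR step stays manual, as noted above; the sketch only covers the bookkeeping that a script would automate.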

New Here
Feb 04, 2024

Unfortunately, most pages are a mix of text, equations, and diagrams, with occasional photos for variety. The most vexing document is a bill of materials in which washers, mounting holes, brackets, and even the texture of knurled knobs have wrongly become text, while the part names, scales, descriptions, and measurements are properly recognized. This particular document would be solved with a rule such as "don't look for text in column 1 of the table," though other documents are less structured.