
Remove text recognition within individual images?

New Here,
Feb 02, 2024

When I run OCR on a scanned PDF, Acrobat diligently recognizes all the text, which is good, and also "recognizes" some bits of images as text, which is not so good. Removing these spurious text elements leaves holes in the image. Is there a way to remove them that restores the original image or should I resign myself to using copied images from the scan to replace the OCR-mangled elements?

TOPICS
Edit and convert PDFs , Scan documents and OCR
Community Expert,
Feb 04, 2024

You can specify a page range when performing OCR ("Text Recognition") in Acrobat, but if the pages you wish to ignore are scattered throughout the file, that might not be very useful. The only other option I can think of is to extract those pages, delete them from the original file, run OCR on what remains, and then import them back in.

This can be done using a custom-made script (except for the OCR part, which you'll need to run manually).
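To illustrate the bookkeeping such a script would need, here is a minimal sketch in Python. It does not touch any PDF and does not call any Acrobat API; it only works out which page runs to extract, in what order to delete them, and where to put them back. The function names, the 0-based page numbering, and the "insert after page N" convention are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, int]  # inclusive (start, end), 0-based page indices


def group_pages(pages: List[int]) -> List[Run]:
    """Collapse page numbers into sorted, inclusive, contiguous runs."""
    runs: List[Run] = []
    for p in sorted(set(pages)):
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)  # extend the current run
        else:
            runs.append((p, p))  # start a new run
    return runs


def plan_round_trip(total_pages: int, skip_pages: List[int]) -> Dict[str, list]:
    """Plan the extract / delete / reinsert steps around a manual OCR pass."""
    for p in skip_pages:
        if not 0 <= p < total_pages:
            raise ValueError(f"page {p} is outside 0..{total_pages - 1}")
    runs = group_pages(skip_pages)
    return {
        # 1. Save each run to its own file before changing anything.
        "extract": runs,
        # 2. Delete from the back so earlier page indices stay valid.
        "delete": list(reversed(runs)),
        # 3. (Run OCR manually on the remaining pages.)
        # 4. Reinsert front to back: once earlier runs are restored, a run
        #    that began at page s goes back after page s - 1 (-1 = at front).
        "reinsert": [(start - 1, (start, end)) for start, end in runs],
    }


plan = plan_round_trip(total_pages=10, skip_pages=[7, 2, 3])
```

For a 10-page file with pages 2, 3, and 7 set aside, the plan deletes page 7 first and then pages 2 to 3, and later reinserts pages 2 to 3 after page 1 and page 7 after page 6. The actual extraction, deletion, and insertion would still have to be done with whatever scripting tool you use.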

New Here,
Feb 04, 2024

Unfortunately, most pages are a mix of text, equations, and diagrams, with occasional photos for variety. The most vexing document is a bill of materials in which washers, mounting holes, brackets, and even the texture of knurled knobs have wrongly become text, while the part names, scales, descriptions, and measurements are properly recognized. This particular document would be solved with a rule such as "don't look for text in column 1 of the table," though other documents are less structured.
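A rule like that could be sketched roughly as follows. This is not an Acrobat feature; it assumes an OCR tool that reports each recognized word with a bounding box, and every name here (`Box`, `Word`, `drop_words_in_regions`) and every coordinate is hypothetical. It only filters the recognized words and does nothing to restore image pixels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page units (assumed: points)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Word:
    """One recognized word and where it sits on the page."""
    text: str
    box: Box


def drop_words_in_regions(words: List[Word], excluded: List[Box]) -> List[Word]:
    """Keep only words whose center lies outside every excluded region."""
    kept = []
    for w in words:
        cx = (w.box.x0 + w.box.x1) / 2
        cy = (w.box.y0 + w.box.y1) / 2
        if not any(region.contains(cx, cy) for region in excluded):
            kept.append(w)
    return kept


# Assumed layout: column 1 of the table (the part drawings) spans x = 0..100.
column_1 = Box(0, 0, 100, 800)
words = [
    Word("O", Box(40, 100, 60, 120)),         # a washer misread as a letter
    Word("Washer", Box(150, 100, 220, 120)),  # the real part name
]
kept = drop_words_in_regions(words, [column_1])
```

Testing the center of each box rather than requiring full containment is a design choice: it tolerates boxes that slightly overlap a column border. Less structured documents would need a different way of deciding which regions to exclude.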
