
Improve text quality on scanned documents?

New Here, Jul 13, 2010

I am currently digitizing a collection of paleontology publications so that the literature for specific genera can be quickly found. My problem is that much of the literature is in very bad condition due to its age (some of it being from the 1800s). I would like to be able to improve the text's readability in Photoshop before importing it into Acrobat to be OCR'd. The biggest issue is that many letters have not been completely printed (e.g. an "a" is read as "ct" by Acrobat because of gaps at the top and bottom of the "a", or an "e" is read as a "c"). Any suggestions on how to make badly printed letters more "whole" (especially italicized characters) would be greatly appreciated. My current process for digitizing publications involves these steps:


1) Scan the document, either with the ADF or on the flatbed, as grayscale JPEGs at 600 dpi (scanning at this resolution is not strictly necessary, but it greatly improves results).


2) Open the images in Photoshop, apply "Auto Levels" (black and white clipping at zero), apply an "Unsharp Mask" (Amount: 100%; Radius: 250 pixels; Threshold: 0 levels), save, and exit.


3) Combine the JPEGs into a PDF and OCR the document in its corresponding language.
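
For anyone who wants to batch step 2 outside Photoshop, here is a minimal sketch of the same two operations in plain Python. It is not Photoshop's implementation: the function names are made up for illustration, the image is assumed to be a list of rows of 0-255 gray values, and treating the radius as the Gaussian sigma is an assumption. It is also far too slow for a 600 dpi page with a 250 pixel radius, so take it as an illustration of what the filters do rather than a replacement.

```python
import math

def auto_levels(img):
    """Linear stretch: darkest pixel -> 0, brightest -> 255 (zero clipping)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:                      # flat image, nothing to stretch
        return [row[:] for row in img]
    scale = 255.0 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]

def gaussian_kernel(radius):
    """Normalized 1-D Gaussian weights (assumption: sigma equals radius)."""
    sigma = max(float(radius), 0.1)
    half = max(1, math.ceil(2 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(line, kernel):
    """Convolve one row or column, clamping indices at the edges."""
    half = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += w * line[j]
        out.append(acc)
    return out

def gaussian_blur(img, radius):
    """Separable blur: rows first, then columns."""
    kernel = gaussian_kernel(radius)
    rows = [blur_1d(row, kernel) for row in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def unsharp_mask(img, amount=1.0, radius=2.0, threshold=0):
    """Add amount * (original - blurred) wherever the difference reaches threshold."""
    blurred = gaussian_blur(img, radius)
    out = []
    for row, blurred_row in zip(img, blurred):
        new_row = []
        for p, b in zip(row, blurred_row):
            diff = p - b
            if abs(diff) >= threshold:
                p = p + amount * diff
            new_row.append(int(min(255, max(0, round(p)))))
        out.append(new_row)
    return out

# Tiny made-up "page": a gray stroke on a light gray background.
page = [
    [200, 200, 200, 200, 200],
    [200,  90,  90,  90, 200],
    [200, 200, 200, 200, 200],
]
leveled = auto_levels(page)
sharpened = unsharp_mask(leveled, amount=1.0, radius=1.0, threshold=0)
```

With a radius as large as 250 pixels the unsharp mask behaves more like a local contrast boost than an edge sharpener, which is probably why it helps on faded pages.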


This works really well if the publication is in good condition; however, the accuracy on other documents is easily below 75%. Again, any suggestions on how to make badly printed letters more "whole" would be greatly appreciated.
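
One image-side idea that fits the "make letters more whole" request is a morphological closing: grow the ink slightly so small gaps bridge, then shrink it back. The sketch below is only an illustration under assumptions: the page is already binarized into a grid where 1 means ink, and the function names are hypothetical. Closing can just as easily fuse neighbouring letters or fill the counters of an "e" or "a", so it may not improve OCR accuracy at all.

```python
def dilate(ink, size=3):
    """A pixel becomes ink if any pixel in its size x size window is ink."""
    h, w, r = len(ink), len(ink[0]), size // 2
    return [[int(any(ink[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)] for y in range(h)]

def erode(ink, size=3):
    """A pixel stays ink only if every in-bounds pixel in its window is ink."""
    h, w, r = len(ink), len(ink[0]), size // 2
    return [[int(all(ink[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)] for y in range(h)]

def close_gaps(ink, size=3):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(ink, size), size)

# Made-up example: a horizontal stroke with a one-pixel break in the middle.
stroke = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
closed = close_gaps(stroke)
print(closed[2])  # [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

The window size controls how wide a gap gets bridged; a 3 x 3 window only closes breaks of about one or two pixels, which at 600 dpi is a very fine gap.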

LEGEND, Jul 13, 2010

Look into using other OCR solutions that can be trained, like FineReader or Cooliris. Acrobat's OCR is really only meant for contemporary tasks like recognizing form data, not for book restoration. The image quality is not the issue; you need a way to teach the program to interpret specific gaps and artifacts differently, and you can't do that with Acrobat.

Mylenium
