
OCR problem with white text on coloured background

New Here, Jul 19, 2022


For a student with dyslexia, I scanned a text from a textbook so that she could use it in her dyslexia software. However, when I try to convert it with OCR in Acrobat Pro, it does not recognise the white text on the coloured areas. How can I solve this problem? I am already using the Scan & OCR > Enhance > Scanned Document tool.

TOPICS
How to, Scan documents and OCR

Community Expert, Jul 19, 2022


Hi Skpoel,

 

Interesting problem. If you do a Google search, you'll see that this issue has thwarted many people. Before I continue, let me compliment you on your scans. I see many scans in these threads, and yours are superb.

 

I experimented, but I'm not sure how well this will suit your problem; it does require Photoshop or another image-editing application. Here's what I did: I opened your TIFF in Photoshop and opened a Curves window.

[Screenshot: the Curves window in Photoshop]

Then I flipped the curve's end-points: I dragged the left handle to the top and the right handle to the bottom. This "trick" is often used by people who have scanned color negatives and need to turn them into positives. At that point, your image looks like this:

[Screenshot: the page after flipping the curve]

However, when I ran that through Acrobat, the bottom section OCRed perfectly, but the top section could not be read.
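For anyone who would rather script this step than do it by hand: dragging the Curves end-points as described turns the curve into the straight line out = 255 − in, which is a plain inversion. The sketch below models that on a hypothetical 8-bit grayscale image held as a list of pixel rows; the function names are illustrative and not part of any Photoshop or Acrobat API. For real image files, Pillow's `ImageOps.invert` performs the same mapping.

```python
# Sketch: the flipped Curves adjustment as a 256-entry lookup table.
# Assumption: pixels are 8-bit grayscale values (0 = black, 255 = white)
# stored as a list of rows; this is a toy model, not Photoshop's code.


def inverted_curve() -> list[int]:
    """Left end-point (input 0) -> output 255, right end-point (255) -> 0."""
    return [255 - level for level in range(256)]


def apply_curve(pixels: list[list[int]], lut: list[int]) -> list[list[int]]:
    """Map every pixel through the lookup table, returning a new image."""
    return [[lut[p] for p in row] for row in pixels]


# Hypothetical tile: a white text stroke (255) on a coloured panel that
# is mid-dark once converted to grayscale (90).
tile = [
    [90, 255, 90],
    [90, 255, 90],
]

# After inversion the stroke becomes black and the panel becomes light.
print(apply_curve(tile, inverted_curve()))
```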

 

Here's the text from the bottom section, translated from the original Dutch:

 

Want to know more?

Try it out: make your own stamp!

Thousands of small soapstone blocks made by the Indus people have been found. They used them to make impressions in clay, as a kind of signature. Would you like to make your own stamp? You can buy soapstone in art shops and shape it with a file and sandpaper. (Search the internet for "working soapstone".) Alternatively, you can make a stamp from a piece of plaster, an eraser or soft linoleum. You can carve those materials with a gouge.

With your stamp you can press your signature into clay, just as the Indus people did. Do you want your stamp to look like a real Indus seal? Then it must be square, with script symbols at the top and an animal in the middle. Which animal will be your signature?

26 • Atlantis groep 6.7.8 leesboek 1

 

 

I'm not sure how you can use this, but it's a potential part of the solution.

New Here, Jul 20, 2022


Thank you for your answer. In my opinion it is purely a matter of contrast. The OCR has trouble recognising white text; by making the image negative, the text becomes black and the OCR can recognise it.

Community Expert, Jul 20, 2022


Hi Skpoel,

 

I do not think it's purely a matter of contrast, as the original image had plenty of contrast. Rather, I suspect that, for some reason, OCR expects text that is darker than its background rather than lighter. I'm sure you've already done Google searches on this; when I did, I found many examples of different OCR readers having this issue.
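If that suspicion is right, a generic workaround is to normalise polarity before OCR: find areas that look like light text on a dark background and invert only those. The sketch below uses a simple heuristic that is an assumption for illustration, not how Acrobat's OCR works internally. Because text usually covers fewer pixels than its background, the median grayscale value of a region approximates the background level, and a median below 128 suggests light-on-dark text.

```python
from statistics import median

# Assumption: 8-bit grayscale pixels (0 = black, 255 = white) as a list of rows.


def is_light_on_dark(pixels: list[list[int]], threshold: int = 128) -> bool:
    """Heuristic: a dark median pixel suggests a dark background."""
    flat = [p for row in pixels for p in row]
    return median(flat) < threshold


def normalise_polarity(pixels: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Invert a region only if it looks like light text on a dark background."""
    if is_light_on_dark(pixels, threshold):
        return [[255 - p for p in row] for row in pixels]
    return pixels


# Hypothetical regions: white text on a coloured banner, and dark body text.
banner = [[60, 60, 250, 60], [60, 250, 250, 60]]
body = [[240, 240, 20, 240]]

print(normalise_polarity(banner))  # inverted, so the text becomes dark
print(normalise_polarity(body))    # unchanged, already dark on light
```

The threshold of 128 is simply the midpoint of the 8-bit range; a busy region where text and background cover similar areas could fool this heuristic.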

 

One thing I tried was desaturating the image to take color out of the equation, but that made no difference. Only after I flipped it to a negative image (like a color negative) did I have any success.
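A quick calculation shows why desaturating alone would not be expected to help: converting colour to gray keeps white text at the top of the brightness scale, so it stays lighter than its background, whereas inverting swaps the order. The sketch below uses the common Rec. 601 luma weights (0.299, 0.587, 0.114) and a made-up blue for the panel; Photoshop's Desaturate command may weight the channels differently, but white maps to white either way.

```python
# Assumption: Rec. 601 luma weights; the panel colour is hypothetical.


def luma(rgb: tuple[int, int, int]) -> int:
    """Approximate perceived brightness (0-255) of an 8-bit RGB colour."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


text_rgb = (255, 255, 255)   # white lettering
panel_rgb = (30, 90, 180)    # hypothetical blue panel

grey_text, grey_panel = luma(text_rgb), luma(panel_rgb)
print("desaturated:", grey_text, grey_panel)            # text still lighter
print("inverted:", 255 - grey_text, 255 - grey_panel)   # text now darker
```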

 

I suppose one thing you could do, if you're very ambitious, would be to flip a copy of the image as I did, copy the now-readable bottom section, and paste it over the same area of the original image. Then you could OCR the entire page. That WILL work, but it does add to the workload.
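That patch-up can also be expressed in a few lines. The sketch below, on a hypothetical 8-bit grayscale image stored as a list of pixel rows, inverts only a chosen band of rows and leaves the rest of the page as it was, which has the same effect as flipping a copy and pasting its bottom section back over the original. Choosing the band's row range is still a manual step here; the function name and the row bounds are illustrative assumptions. With Pillow, `Image.crop`, `ImageOps.invert` and `Image.paste` could do the same on a real scan.

```python
# Assumption: 8-bit grayscale pixels (0 = black, 255 = white) as a list of rows.


def invert_band(pixels: list[list[int]], start_row: int, end_row: int) -> list[list[int]]:
    """Return a copy of the page with rows start_row..end_row-1 inverted.

    Rows outside the band, such as normal dark-on-light body text, are copied
    unchanged, so the whole page ends up with dark text for OCR.
    """
    out = [row[:] for row in pixels]  # copy so the original page is untouched
    for y in range(start_row, end_row):
        out[y] = [255 - p for p in out[y]]
    return out


page = [
    [250, 20, 250],  # top section: dark text on white paper
    [70, 255, 70],   # bottom section: white text on a coloured panel
]

fixed = invert_band(page, 1, 2)
print(fixed)
```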

 

Good luck!
