For a student with dyslexia, I scanned a text from a textbook so that she could use it in her dyslexia software. However, when I run OCR on it in Acrobat Pro, it does not recognise the white text on the coloured areas. How can I solve this problem? I am already using the Scan & OCR > Enhance > Scanned Document tool.
Hi Skpoel,
Interesting problem. A Google search shows that this issue has thwarted many people. Before I continue, let me compliment you on your scans. I see many scans here in these threads, and yours are superb.
I experimented, but I'm not sure how well this will suit your problem; it does require Photoshop or some other photo-editing application. Here's what I did: I opened your TIFF in Photoshop and opened a Curves window.
Then I flipped the end points: I dragged the left handle to the top and the right handle to the bottom. This trick is often used by people who have scanned color negatives and need a positive image. At that point, your image looks like this:
However, when I ran that through Acrobat, the bottom section OCRed perfectly, but the top section still could not be read.
Here's the text from the bottom section:
Meer wet en? < < < < < < < < < < < < < < < < < < < < < < < < < < < < < <
Probeer eens uit: maak je eigen stempel!
Er zijn duizenden speksteenblokjes van de Indusmensen
gevonden. Ze gebruikten ze om in klei te stempelen, als een
soort handtekening. Wil jij je eigen stempel maken? Speksteen
kun je in kunstwinkels kopen en bewerken met een vijl
en schuurpapier. (Zoek maar eens op internet op "Speksteen
bewerken".) Je kunt anders ook een stempel van een stukje
gips, gum of zacht linoleum maken. Die materialen kun je
met een guts bewerken.
Met je stempel kun je je handtekening in klei
stempelen zoals de Indusmensen dat deden. Wil
je dat je stempel eruitziet als een echte Indusstempel?
Dan moet hij vierkant zijn, met schriftsyrnbolen
aan de bovenkant en een dier in het
midden. Welk dier wordt jouw handtekening?
26 • Atlantis • groep 6.7.8 • leesboek 1
I'm not sure how you can use this, but it's potentially part of the solution.
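For anyone without Photoshop, the curve flip described above amounts to mapping every level v to 255 - v. Here is a minimal sketch of that mapping in plain Python, assuming an 8-bit grayscale image held as rows of numbers; the `patch` values and the function names are made up for illustration, and a real scan would be loaded and saved with an imaging library of your choice.

```python
def flipped_curve(value, max_value=255):
    """Map a level through a curve whose end points are swapped (0 <-> max)."""
    return max_value - value


def invert_image(pixels, max_value=255):
    """Return the negative of a grayscale image given as rows of levels."""
    return [[flipped_curve(v, max_value) for v in row] for row in pixels]


# Hypothetical 2x4 patch: white text (255) on a dark coloured area (60).
patch = [
    [60, 255, 255, 60],
    [60, 255, 60, 60],
]

negative = invert_image(patch)
print(negative)  # the text pixels are now 0 (black), the background 195
```

After the flip the former white text is black on a lighter background, which is the polarity the OCR handled well in the bottom section.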
Thank you for your answer. In my opinion it is purely a contrast issue. The OCR has trouble recognising white text; making the image negative turns the text black, and then the OCR can recognise it.
Hi Skpoel,
I do not think that it's entirely a matter of contrast, as the original image had plenty of contrast. Rather, I suspect that, for some reason, OCR needs the text to be darker than its background rather than lighter. I'm sure you've done Google searches on this already; when I did, I found many reports of other OCR engines having the same issue.
One of the things I tried was desaturating the image to take color out of the equation, but it made no difference. Only after I flipped to a negative image (as you would with a color negative) did I have any success.
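A small sketch of why desaturating would not be expected to help while inverting does: desaturation leaves light text lighter than its background, whereas inversion swaps which one is darker. The background colour below is hypothetical, and the luminance weights (Rec. 601) are an assumption for illustration; Photoshop's Desaturate command may compute grey differently.

```python
def luminance(rgb):
    """Approximate perceived brightness using Rec. 601 weights (an assumption)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def desaturate(rgb):
    """Replace a colour with the grey of the same approximate brightness."""
    y = round(luminance(rgb))
    return (y, y, y)


def invert(rgb):
    """Return the negative of an 8-bit RGB colour."""
    return tuple(255 - c for c in rgb)


def text_is_darker(text_rgb, background_rgb):
    """True when the text is darker than the background it sits on."""
    return luminance(text_rgb) < luminance(background_rgb)


text = (255, 255, 255)       # white text
background = (200, 60, 40)   # hypothetical coloured area

print(text_is_darker(text, background))                          # original
print(text_is_darker(desaturate(text), desaturate(background)))  # desaturated
print(text_is_darker(invert(text), invert(background)))          # inverted
```

Only the inverted version reports the text as darker than the background, which matches what I saw in Acrobat.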
I suppose one thing you could do, if very ambitious, would be to take the image, flip the image as I did, copy the bottom section, and paste it onto the top section. Then you could OCR the entire page. That WILL work but it does add to the workload.
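One way to read that compositing idea is that the finished page should have dark text on a light background in every section. The sketch below inverts only a chosen band of rows and leaves the rest untouched; the tiny `page` array and the row numbers are made up, and on a real scan you would have to pick the band that contains the white text yourself.

```python
def invert_rows(pixels, start_row, end_row, max_value=255):
    """Invert rows start_row..end_row-1 of a grayscale image; copy the rest."""
    result = []
    for index, row in enumerate(pixels):
        if start_row <= index < end_row:
            result.append([max_value - v for v in row])
        else:
            result.append(list(row))
    return result


# Hypothetical 4x3 page: rows 0-1 hold dark text on a light background,
# rows 2-3 hold white text (255) on a coloured area (60).
page = [
    [255, 0, 255],
    [255, 0, 0],
    [60, 255, 60],
    [60, 60, 255],
]

fixed = invert_rows(page, 2, 4)
print(fixed)  # text pixels are 0 in every row, backgrounds are 255 or 195
```

This does the same job as copying and pasting between the original and the negative, just in one pass.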
Good luck!