ClearScan not encoding ligatures properly
I use the ClearScan option to OCR my PDFs. All are high-quality scans, and the OCR is very accurate with one exception: ligatures. Characters like ff, fi, fl, etc. are not encoded properly, i.e., they show up blank in the text layer. They look fine on screen, but this makes it impossible to rely on the OCR for searching the PDFs.
To my bafflement, Acrobat recognizes ligatures accurately when I don't use ClearScan but keep the page as Image. I've tested multiple documents with both options, and the result is always the same. Simple words like "different" are fine with the Image option but show up as "di erent" with ClearScan from the same source.
Clearly the mapping of each ligature glyph to its constituent code points (e.g., the "ff" glyph to "f" + "f") doesn't work as it should.
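To illustrate the mapping I would expect, here is a minimal Python sketch. It assumes the ligatures correspond to the Unicode Latin presentation forms U+FB00 to U+FB06; I don't know how ClearScan actually encodes them internally, and `expand_ligatures` is just a name I made up for this example. It relies on Unicode compatibility normalization (NFKC), which decomposes these ligature characters into their plain letters.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature characters (U+FB00..U+FB06) with their plain letters.

    Only characters in that range are normalized; everything else is left as-is.
    """
    return "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in text
    )


# "di" + U+FB00 (the "ff" ligature) + "erent"
print(expand_ligatures("di\ufb00erent"))  # different
```

Note that this only helps where the ligature character actually survives in the extracted text. In my ClearScanned files the ligature comes out blank, so there is nothing left to normalize.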
How can I fix this?
And, especially: how can I fix this after the fact, given that I have already ClearScanned a large number of PDFs?
Any help greatly appreciated!
Thanks!
