Normalise string to NFKC form
I have a document with lots and lots of Polytonic Greek. A lot of the Greek text is incorrect (reflecting its origin as an OCR-traced scan), with diacritics being particularly troublesome. Three different representations appear in the document:
- separate (spacing diacritic followed by letter): ῞Ι (U+1FDE + U+0399)
- decomposed (letter followed by combining diacritics): Ἵ (U+0399 + U+0314 + U+0301)
- precomposed (single glyph): Ἵ (U+1F3D)
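To make the three forms concrete, here is a small sketch that spells out the code points for capital iota with dasia and oxia in each representation. The helper name `codePoints` is just something I made up for illustration; it sticks to ES3 syntax so it should also run in ExtendScript.

```javascript
// Illustrative helper (hypothetical name): list a string's UTF-16 code
// units as "U+XXXX". All characters here are in the BMP, so code units
// and code points coincide.
function codePoints(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    var hex = s.charCodeAt(i).toString(16).toUpperCase();
    while (hex.length < 4) {
      hex = "0" + hex;
    }
    out.push("U+" + hex);
  }
  return out.join(" ");
}

var separate = "\u1FDE\u0399";         // spacing dasia+oxia, then capital iota
var decomposed = "\u0399\u0314\u0301"; // capital iota, combining dasia, combining oxia
var precomposed = "\u1F3D";            // single precomposed character

// Only in ES6+ engines (not ExtendScript), the decomposed form composes:
// decomposed.normalize("NFC") === precomposed
```

Visually the three strings look the same (or nearly so), but they have lengths 2, 3 and 1 respectively, which is why a search for one form misses the others.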
It’s easy enough to convert separate representations into decomposed characters in a few GREP queries. But as is well known, InDesign does not handle Unicode normalisation very well, meaning that any further find/replace or GREP styles targeted at precomposed characters will not work on decomposed characters.
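For reference, the separate-to-decomposed step could also be scripted rather than done with GREP queries. This is only a rough sketch, not what I actually ran: the table and the function name `separateToDecomposed` are my own, the table covers just the breathing marks and their accent combinations from the Greek Extended block, and the letter ranges only cover the basic Greek alphabet.

```javascript
// Partial, hand-built table (an assumption, not exhaustive): spacing
// polytonic diacritics mapped to the equivalent combining marks.
var SPACING_TO_COMBINING = {
  "\u1FBF": "\u0313",       // psili
  "\u1FFE": "\u0314",       // dasia
  "\u1FCD": "\u0313\u0300", // psili and varia
  "\u1FCE": "\u0313\u0301", // psili and oxia
  "\u1FCF": "\u0313\u0342", // psili and perispomeni
  "\u1FDD": "\u0314\u0300", // dasia and varia
  "\u1FDE": "\u0314\u0301", // dasia and oxia
  "\u1FDF": "\u0314\u0342"  // dasia and perispomeni
};

// Move a spacing diacritic that precedes a Greek letter to after the
// letter, in combining form.
function separateToDecomposed(text) {
  var pattern = /([\u1FBF\u1FCD-\u1FCF\u1FDD-\u1FDF\u1FFE])([\u0391-\u03A9\u03B1-\u03C9])/g;
  return text.replace(pattern, function (match, mark, letter) {
    return letter + SPACING_TO_COMBINING[mark];
  });
}
```

With this, `separateToDecomposed("\u1FDE\u0399")` gives iota followed by U+0314 and U+0301, i.e. the decomposed form, which still leaves the composition step unsolved.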
Meanwhile, JavaScript string normalisation didn’t arrive until ECMAScript 6 and thus doesn’t work in ExtendScript (and there are still some deal-breakers in UXP scripting that mean I can’t use that), so the obvious, built-in choice won’t work.
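The built-in I mean is `String.prototype.normalize`. The only fallback I can think of in ES3 is a hand-rolled lookup table, roughly like the sketch below. To be clear about the assumptions: `COMPOSE` and `composeKnown` are names I invented, the table holds only four sample entries, and a real solution would need the complete Unicode composition data, so this is nowhere near actual NFC or NFKC.

```javascript
// Tiny sample table (hypothetical, far from complete): decomposed
// sequence -> precomposed character.
var COMPOSE = {
  "\u0399\u0314": "\u1F39",       // capital iota with dasia
  "\u0399\u0314\u0301": "\u1F3D", // capital iota with dasia and oxia
  "\u0391\u0314\u0301": "\u1F0D", // capital alpha with dasia and oxia
  "\u03B1\u0314\u0301": "\u1F05"  // small alpha with dasia and oxia
};

// Replace each "Greek letter + run of combining marks" with its
// precomposed form when the whole sequence is in the table; anything
// else is left untouched (no partial composition, no mark reordering).
function composeKnown(text) {
  var pattern = /[\u0391-\u03A9\u03B1-\u03C9][\u0300-\u036F]+/g;
  return text.replace(pattern, function (seq) {
    return COMPOSE.hasOwnProperty(seq) ? COMPOSE[seq] : seq;
  });
}
```

Even if the table were complete, the result would still have to be written back into the document text through the InDesign scripting DOM, which is not shown here, and maintaining such a table by hand is exactly what I would like to avoid.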
Is there some way – through a script or otherwise – to normalise all decomposed characters to precomposed according to NFKC?
(If scripting, ExtendScript would be preferable, since I’m hovering somewhere slightly south of useless in both VBScript and, particularly, AppleScript.)
