How the SDK handles Unicode
I spent several painful hours working out how the SDK handles Unicode characters. Perhaps I've missed where this is documented? Here's what I learned:
- Lua strings are sequences of 8-bit characters (bytes).
- A Unicode ZString is represented as a Lua string containing the UTF-8 encoding of that ZString. For example, the trademark character (TM) is Unicode code point U+2122, and the ZString LOC "$$$/unicode/tm=^U+2122" is represented as a Lua string of length three, the UTF-8 encoding of that character (decimal bytes 226 132 162).
- A posting from Adobe employee "escouten" last year said that all SDK APIs treat all Lua strings as UTF-8 encodings of Unicode strings. I've personally observed this with LrView, LrFileUtils, and LrTasks.execute, but haven't checked other APIs. In particular, a Windows Unicode filename will be returned by LrFileUtils as a Lua string encoding the filename in UTF-8. Passing that filename in a command line to LrTasks.execute works correctly. (But writing a Windows batch file with a UTF-8 filename won't in general work -- a topic for another day.)
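
To make the byte arithmetic for the trademark example concrete, here is a minimal sketch in plain Lua that runs outside Lightroom. It doesn't call any SDK function; utf8Encode is a helper name I made up for illustration, and it skips validation (surrogates, out-of-range values). It uses only arithmetic, so it should also work in Lua 5.1, which has no built-in utf8 library.

```lua
-- Hypothetical helper (not part of the SDK): encode one Unicode code point
-- as a UTF-8 Lua string, using plain arithmetic instead of bit operations.
local function utf8Encode(cp)
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(
      0xC0 + math.floor(cp / 0x40),
      0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return string.char(
      0xE0 + math.floor(cp / 0x1000),
      0x80 + math.floor(cp / 0x40) % 0x40,
      0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return string.char(
      0xF0 + math.floor(cp / 0x40000),
      0x80 + math.floor(cp / 0x1000) % 0x40,
      0x80 + math.floor(cp / 0x40) % 0x40,
      0x80 + cp % 0x40)
  end
end

local tm = utf8Encode(0x2122)   -- the trademark character, U+2122
print(#tm)                      -- 3 (the length operator counts bytes, not characters)
print(string.byte(tm, 1, -1))   -- 226 132 162 (tab-separated)
```

The practical consequence is that # and string.len on such a string count bytes, so a single displayed character like (TM) has length three.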
