johnrellis
Legend
November 3, 2010
Question

How the SDK handles unicode

I spent several painful hours learning the following about how the SDK handles Unicode characters. Perhaps I've missed where this is documented? Here's what I learned:

- Lua strings are sequences of 8-bit characters (bytes).

- A Unicode ZString is represented as a Lua string containing the UTF-8 encoding of its characters. For example, the trademark character (TM) is Unicode code point U+2122, and the ZString LOC "$$$/unicode/tm=^U+2122" is represented as a Lua string of length three: the UTF-8 encoding of that character (decimal bytes 226 132 162).

- A posting from Adobe employee "escouten" last year said that all SDK APIs treat all Lua strings as UTF-8 encodings of Unicode strings. I've personally observed that with LrView, LrFileUtils, and LrTasks.execute, but haven't checked other APIs. In particular, LrFileUtils returns a Windows Unicode filename as a Lua string containing the UTF-8 encoding of that filename. Passing that filename in a command line to LrTasks.execute works correctly. (But writing a Windows batch file containing a UTF-8 filename won't work in general, which is a topic for another day.)
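To make the byte-level picture concrete, here is a small pure-Lua sketch. It is written for Lua 5.1 (the version embedded in Lightroom), so it uses plain arithmetic instead of a bit library or the later `utf8` module. `encodeUtf8` and `utf8Length` are hypothetical helpers for illustration, not SDK APIs, and `encodeUtf8` does no validation of surrogates or out-of-range code points.

```lua
-- Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
-- Uses only arithmetic, so it runs on Lua 5.1 as well as later versions.
-- No validation: surrogates and values above 0x10FFFF are not rejected.
local function encodeUtf8(cp)
  if cp < 0x80 then                       -- 1 byte:  0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then                  -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then                -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else                                    -- 4 bytes: 11110xxx + 3 continuation bytes
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Hypothetical helper: count code points by counting every byte that is
-- not a UTF-8 continuation byte (continuation bytes are 128..191).
local function utf8Length(s)
  local _, count = string.gsub(s, "[^\128-\191]", "")
  return count
end

local tm = encodeUtf8(0x2122)   -- trademark sign, U+2122
print(#tm)                      -- 3: the # operator counts bytes, not characters
print(string.byte(tm, 1, -1))   -- 226, 132, 162 (printed tab-separated)
print(utf8Length(tm))           -- 1 code point
```

The same distinction matters for non-ASCII folder names: "Pictøäöüש" is 9 characters but 14 bytes in UTF-8, so `#` on such a path is a byte count, not a character count.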

Participating Frequently
September 1, 2014

Hi John,

I'm struggling a little with this UTF-8 topic currently. I can sympathize with your several painful hours now. :-)

1) Can you (or somebody else) reproduce the following issue (Windows 8.1, LR 5.6)?

Suppose your photos are stored in a directory whose name contains non-ASCII characters, such as c:\users\username\Pictøäöüש (the last letter is the Hebrew letter shin; this is my test case, after users from Norway and Israel reported problems). I then run:

    local picName = selectedPhoto:getRawMetadata("path")

    outputToLog(picName)

I get the wrong result:

C:\Users\username\Pictøäöüש\7L6B7931.CR2

If, on the other hand, I use getFormattedMetadata:

    local folder = selectedPhoto:getFormattedMetadata("folderName")
    local file = selectedPhoto:getFormattedMetadata("fileName")
    outputToLog(folder .. " and " .. file)

I get a correct result (but not the full path name):

Pictøäöüש and 7L6B7931.CR2

Going from there, I could probably work out the full path name (which getFormattedMetadata does not seem to offer), but I would like to understand what's wrong with selectedPhoto:getRawMetadata("path").

2) The following is more for reference: I cannot seem to pass the previews.db path to sqlite if the path of previews.db (the LR catalog path) contains non-ASCII UTF-8 characters. Other UTF-8 commands on the command line work well, and chcp 65001 doesn't help. sqlite is supposed to accept UTF-8 characters in the database name, but somehow doesn't, at least in my version, which is somewhat older. I have worked around this issue by first cd-ing to the directory and then starting sqlite, along the lines of "cd <previews-dir> && sqlite3 previews.db". This seems to work so far, although some new issues have come up, and I don't yet know whether they are related.
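The cd-first workaround amounts to building a command line in which sqlite3 only ever sees the plain ASCII file name. Below is a sketch of that string-building step; `sqliteCommandIn` is a hypothetical helper, and it assumes `sqlite3` is on the PATH and that the command is run by cmd.exe (for example through `LrTasks.execute`). It only builds the string and does not verify how any particular sqlite3 version behaves.

```lua
-- Hypothetical helper: build a cmd.exe command line that changes into the
-- previews directory first, so sqlite3 receives only the ASCII file name
-- (e.g. "previews.db") rather than the full non-ASCII path.
local function sqliteCommandIn(dir, dbName)
  -- /d makes cd switch drives as well (e.g. from C: to D:).
  -- Double quotes protect directory names that contain spaces.
  return string.format('cd /d "%s" && sqlite3 "%s"', dir, dbName)
end

-- The thread's test directory name, written as explicit UTF-8 bytes.
local dir = "C:\\Users\\username\\Pict\195\184\195\164\195\182\195\188\215\169"
print(sqliteCommandIn(dir, "previews.db"))

-- In a plug-in, the resulting string would be passed to LrTasks.execute()
-- from an async task; that part only runs inside Lightroom.
```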

johnrellis
Legend
September 1, 2014

Re 1): I can't reproduce the problem on LR 5.6 / Windows 8.1. When I log what photo:getRawMetadata("path") returns to a file and then examine it with Sublime Text 2, I see the expected answer.

Perhaps the problem you're observing is somewhere between the call to your function outputToLog() and the text editor you're using to examine the log file.  Even in 2014, Unicode is an unnatural act for much software.
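One way to tell whether the string itself or the viewer is at fault is to log the raw bytes in hex, which every editor displays identically. A minimal sketch: `hexBytes` is a hypothetical helper, and in a plug-in you would pass it the value returned by `photo:getRawMetadata("path")`.

```lua
-- Hypothetical helper: render a Lua string as space-separated hex bytes,
-- so the log line is plain ASCII regardless of the editor's encoding.
local function hexBytes(s)
  local parts = {}
  for i = 1, #s do
    parts[#parts + 1] = string.format("%02X", string.byte(s, i))
  end
  return table.concat(parts, " ")
end

print(hexBytes("Pict\195\184"))   -- 50 69 63 74 C3 B8
```

If "ø" shows up as C3 B8 (its UTF-8 encoding), the string is correct and the problem is in how the log is being displayed.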

Participating Frequently
September 2, 2014

Hi John,

re 1) Many thanks for checking this. It was indeed a problem with the text editor not showing the result correctly: I didn't expect Notepad not to handle UTF-8 by default (Windows is not my native platform, and I haven't used it much for a couple of years). Worse, LrDialogs.message gives the wrong output too, which at the time I had taken as confirmation. I have now checked with Sublime Text 2 and it looks good. One problem solved...
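For logs that will be opened in Windows Notepad, one common workaround is to start the file with the UTF-8 byte order mark (bytes EF BB BF), which Notepad uses to recognise a file as UTF-8. A minimal plain-Lua sketch; `writeUtf8Log` is a hypothetical helper, not part of the SDK, and the example path in the comment is made up.

```lua
-- UTF-8 byte order mark: the bytes EF BB BF.
local UTF8_BOM = "\239\187\191"

-- Hypothetical helper: write UTF-8 lines to a file that starts with a BOM,
-- so Notepad opens it as UTF-8 instead of the local ANSI code page.
local function writeUtf8Log(path, lines)
  local f = assert(io.open(path, "wb"))  -- binary mode: bytes are written unchanged
  f:write(UTF8_BOM)
  for _, line in ipairs(lines) do
    f:write(line, "\r\n")              -- CRLF, which older Notepad versions need
  end
  f:close()
end

-- Usage (hypothetical path): writeUtf8Log("C:\\temp\\plugin-log.txt", { picName })
```

The trade-off is that some tools treat the BOM as data, so it is best left out of files that scripts will parse.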

re 2) I'll get back to you later on this.

- Chris

areohbee
Legend
November 3, 2010

Thanks John,

I took the liberty of adding this Pearl Of Wisdom to the lrdevplugin FAQ as well:

https://www.assembla.com/wiki/show/lrdevplugin/Character_Encoding_-_Unicode_UTF-8

Rob