johnrellis
Legend
February 16, 2023
Question

Export Keywords doesn't include a byte order mark and Import Keywords doesn't accept one


The command Export Keywords > Include Keyword Tag Options (.csv)  should include a byte order mark (BOM) and the Import Keywords command should skip a BOM if present.  Since the use of BOMs is a de facto standard for UTF-8 CSVs, LR's lack of proper BOM handling is a design defect.
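
To make the request concrete, here is a minimal Python sketch of the behavior being asked for. It is not LR's code, and the function names and sample row are made up for illustration. It relies on Python's standard "utf-8-sig" codec, which writes a BOM when encoding and skips one, if present, when decoding.

```python
import csv
import io

def export_keywords_csv(rows):
    """Return CSV bytes encoded as UTF-8 with a leading BOM (hypothetical helper)."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends the BOM bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")

def import_keywords_csv(data):
    """Parse CSV bytes, skipping a leading BOM if one is present (hypothetical helper)."""
    # When decoding, "utf-8-sig" drops a leading BOM and otherwise acts like "utf-8".
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))

# Illustrative row only; the real column layout of LR's export is not assumed here.
sample = [["Activité", "example"]]
exported = export_keywords_csv(sample)
```

With this approach the importer accepts files both with and without a BOM, so existing exports would keep working.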

 

Currently, if a user with Excel (the most common tool for manipulating CSVs) opens an LR-exported .csv, Windows Excel will assume the file is encoded in the current Windows code page, garbling any non-ASCII Unicode characters; Mac Excel will similarly garble Unicode characters.  The user must use an obscure seven-step process to import the .csv into Excel properly.  In the first 24 hours of LR 12.2, a user and Rikk Flohr both stumbled over this, thinking the Export Keywords command was broken.

 

And if a user tries to import a CSV modified by Excel into LR, she'll get an error message, "The file cannot be imported, because it contains corrupted data", because Excel has written a BOM that LR doesn't recognize. The user will have to use Windows Notepad or Mac TextEdit to save the modified file without the BOM.
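
For anyone comfortable with a script, the same workaround can be sketched in Python instead of re-saving from a text editor. This is only an illustration under the assumption that the file is otherwise valid UTF-8; the function names and file paths are hypothetical.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if it has one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_from_file(src, dst):
    """Copy src to dst, dropping a leading BOM. Paths are supplied by the caller."""
    Path(dst).write_bytes(strip_utf8_bom(Path(src).read_bytes()))

# Example call with made-up file names:
# strip_bom_from_file("keywords-from-excel.csv", "keywords-for-lr.csv")
```

The bytes after the BOM are copied unchanged, so line endings and the rest of the file are left exactly as Excel wrote them.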

 

Only Unicode nerds whose brains are filled with such useless details will be able to figure this out.

 

Widely used apps that recognize the BOM: Windows Excel, Mac Excel, Google Sheets, Windows Notepad, Mac TextEdit, Sublime, Mac Numbers, Mac Pages, Windows Word, Mac Word.  I'm sure there are many others.

1 reply

Legend
February 16, 2023

Checking files in Notepad++ on Windows and BBEdit on Mac, both Lightroom Classic and Excel CSV exports are shown as UTF-8 without BOM. Both text editors have an encoding indicator and can convert between different encodings. This is on a US English system, with the CSV exports done on Windows; the BBEdit display is similar.
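
The same check can be done without a text editor by looking at the first three bytes of the file. Below is a rough Python sketch; the function name and the three labels it returns are my own, not something either editor reports.

```python
import codecs

def describe_encoding(data: bytes) -> str:
    """Classify bytes as UTF-8 with BOM, UTF-8 without BOM, or not valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    # A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    return "UTF-8 without BOM"

# Example call with a made-up file name:
# print(describe_encoding(open("Lightroom Keywords.csv", "rb").read()))
```

Note that a file containing only ASCII characters is reported as "UTF-8 without BOM", since ASCII is a subset of UTF-8.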

 

I was able to open the LR export CSV just fine in Excel 2016 using right-click Open With (not the import wizard), and it opened correctly without errors. I have not tested in Mac Excel or Lightroom.

 

johnrellis
Legend
February 16, 2023


 

The issue with Excel arises with non-ASCII Unicode characters encoded as multiple bytes in UTF-8.  For example, create a keyword "Activité" and do Export Keywords.  Double-clicking the .csv, or right-clicking it and choosing Open With, opens it in Windows Excel 2016 and Mac Excel with the keyword garbled.

 

Since the .csv doesn't include a BOM, Excel interprets the file as encoded in another 8-bit character set (the current code page on Windows; I'm not sure which set on Mac), and the "é" character gets garbled.
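
The garbling can be reproduced outside Excel. Here is a small Python sketch that encodes the keyword as UTF-8 and then decodes the bytes as an 8-bit code page. It assumes Windows code page 1252 (Western European), which is only one possible "current code page"; other code pages will garble the character differently. The function name is made up.

```python
def misread_as_code_page(text, code_page="cp1252"):
    """Encode text as UTF-8, then decode the bytes as an 8-bit code page."""
    # "é" is the two bytes C3 A9 in UTF-8; cp1252 maps them to "Ã" and "©".
    return text.encode("utf-8").decode(code_page, errors="replace")

print(misread_as_code_page("Activité"))  # prints: ActivitÃ©
```

Pure-ASCII keywords come through unchanged, which is why the problem only shows up with characters like "é".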

 

Importing the .csv into Excel using the seven-step recipe correctly opens the file as UTF-8.

 

If you modify the file by replacing "Activité" with "XActivité" and then do File > Save As with type CSV UTF-8, Excel correctly includes a BOM in the saved file. But when you import that saved .csv with LR's Import Keywords, you get the error message "The file cannot be imported, because it contains corrupted data".

Participant
February 23, 2023

If you export a keyword file with non-ASCII Unicode characters (e.g. "ë"), then create a new catalog and try to import the exported file, you get the error message.