Silly-V
Legend
March 4, 2018
Question

Encoding question

I have been using Andy VanWagoner's CSV parser for a long time now, and it has always worked fine with 'regular' CSV data. Recently, though, I have been working with a CSV file provided by clients which contains typographic quotes.

The thing is, I'm able to read the files and write them back with all characters preserved. However, when this file's data is parsed as CSV and then stringified, each typographic character turns into a sequence of weird characters.

I'm looking for more information on this: why are the characters being converted this way in the code?

For now I am okay with what I'm doing, which is using this array to convert the characters that come out of the parser:

["Äú", "“"],

["Äù", "”"],

["Äô", "’"],

Of course the array is incomplete, and I am not sure what either set of these characters is called in encoding terms. What is "Äú"?

Does anyone have a good way to deal with this to include other characters like this which may eventually come up?

1 reply

Inspiring
March 6, 2018

I assume you have checked the file's encoding? This looks like a mismatch: the file is in one encoding (UTF-8, say) but you're reading it as another (ISO-XXXX etc.).
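To illustrate the kind of mismatch described here, below is a minimal sketch of how a typographic quote could turn into "Äú"-style characters. It assumes the file is UTF-8 and is being decoded as Mac Roman, which is a guess based on the characters in the question, not something the thread confirms. The names `MAC_ROMAN_SAMPLE`, `utf8Bytes` and `misreadAsMacRoman` are hypothetical, and the table covers only the five bytes needed for the three quotes.

```javascript
// Partial Mac Roman table: only the bytes needed for this demonstration.
var MAC_ROMAN_SAMPLE = {
  0xE2: "\u201A", // single low-9 quotation mark (easy to mistake for a comma)
  0x80: "\u00C4", // A with diaeresis
  0x9C: "\u00FA", // u with acute
  0x9D: "\u00F9", // u with grave
  0x99: "\u00F4"  // o with circumflex
};

// Return the UTF-8 bytes of a string, using encodeURIComponent's %XX output.
function utf8Bytes(text) {
  var escaped = encodeURIComponent(text);
  var bytes = [];
  for (var i = 0; i < escaped.length; i++) {
    if (escaped.charAt(i) === "%") {
      bytes.push(parseInt(escaped.substr(i + 1, 2), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(escaped.charCodeAt(i)); // unescaped characters are plain ASCII
    }
  }
  return bytes;
}

// Decode UTF-8 bytes one at a time as if they were Mac Roman,
// the way a reader configured with the wrong encoding would.
function misreadAsMacRoman(text) {
  var bytes = utf8Bytes(text);
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    out += b < 0x80 ? String.fromCharCode(b) : (MAC_ROMAN_SAMPLE[b] || "?");
  }
  return out;
}

// misreadAsMacRoman("\u201C") returns "\u201A\u00C4\u00FA", which displays as "‚Äú"
```

Under this assumption each quote becomes three characters, the first of which is the low-9 quotation mark, so a two-character "Äú" in a replacement table would leave that first character behind.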

Why Do I Get Odd Characters instead of Quotes in My Documents? - Ask Leo!

Perhaps just convert them to single quotes before running through the csv parser.

Glenn Wilton

O2 Creative

Silly-V (Author)
Legend
March 7, 2018

Unfortunately I get an error at that link. What I also discovered is that, while the characters had been coming out as "Au"-looking letters, once I saved my actual script file from Sublime Text "with encoding" UTF-8, the characters are actually:

["€œ", "Ò"],

["€", "Ó"],

["€™", "Õ"],

So now the 'flattened' characters look like "E™" and the typographic quotes are no longer in their quote form, but they look like "O" characters with squiggly lines. Now with such an encoded script file, my script appears to be working correctly on both Mac and Windows, producing the correct typographic quotes in preview, but still showing the 'flattened' "E" characters when the .csv file is opened in Excel.

This looks like it's working for me, but whatever the reason is, I will need to eventually construct a table of all such characters to handle future ones that I may come across.
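As an alternative to growing such a table by hand, here is a rough sketch of a generic repair that reverses the misreading for any character at once. It assumes the case from the first post, where UTF-8 bytes appear to have been decoded as Mac Roman; that is an inference from the "Äú" pattern, and the second table above may need a different source encoding. `MAC_ROMAN_HIGH`, `MAC_ROMAN_BYTE` and `repairMacRomanMojibake` are hypothetical names, and the code relies only on `decodeURIComponent`, which should be available in ExtendScript but is worth checking in your host application.

```javascript
// Unicode code points for Mac Roman bytes 0x80..0xFF, in byte order.
var MAC_ROMAN_HIGH = [
  0x00C4, 0x00C5, 0x00C7, 0x00C9, 0x00D1, 0x00D6, 0x00DC, 0x00E1,
  0x00E0, 0x00E2, 0x00E4, 0x00E3, 0x00E5, 0x00E7, 0x00E9, 0x00E8,
  0x00EA, 0x00EB, 0x00ED, 0x00EC, 0x00EE, 0x00EF, 0x00F1, 0x00F3,
  0x00F2, 0x00F4, 0x00F6, 0x00F5, 0x00FA, 0x00F9, 0x00FB, 0x00FC,
  0x2020, 0x00B0, 0x00A2, 0x00A3, 0x00A7, 0x2022, 0x00B6, 0x00DF,
  0x00AE, 0x00A9, 0x2122, 0x00B4, 0x00A8, 0x2260, 0x00C6, 0x00D8,
  0x221E, 0x00B1, 0x2264, 0x2265, 0x00A5, 0x00B5, 0x2202, 0x2211,
  0x220F, 0x03C0, 0x222B, 0x00AA, 0x00BA, 0x03A9, 0x00E6, 0x00F8,
  0x00BF, 0x00A1, 0x00AC, 0x221A, 0x0192, 0x2248, 0x2206, 0x00AB,
  0x00BB, 0x2026, 0x00A0, 0x00C0, 0x00C3, 0x00D5, 0x0152, 0x0153,
  0x2013, 0x2014, 0x201C, 0x201D, 0x2018, 0x2019, 0x00F7, 0x25CA,
  0x00FF, 0x0178, 0x2044, 0x20AC, 0x2039, 0x203A, 0xFB01, 0xFB02,
  0x2021, 0x00B7, 0x201A, 0x201E, 0x2030, 0x00C2, 0x00CA, 0x00C1,
  0x00CB, 0x00C8, 0x00CD, 0x00CE, 0x00CF, 0x00CC, 0x00D3, 0x00D4,
  0xF8FF, 0x00D2, 0x00DA, 0x00DB, 0x00D9, 0x0131, 0x02C6, 0x02DC,
  0x00AF, 0x02D8, 0x02D9, 0x02DA, 0x00B8, 0x02DD, 0x02DB, 0x02C7
];

// Reverse lookup: character -> Mac Roman byte value.
var MAC_ROMAN_BYTE = {};
for (var i = 0; i < MAC_ROMAN_HIGH.length; i++) {
  MAC_ROMAN_BYTE[String.fromCharCode(MAC_ROMAN_HIGH[i])] = 0x80 + i;
}

// Turn each character back into the byte it was decoded from, then decode
// the whole byte sequence as UTF-8. Text that does not fit this pattern
// (already correct, or from another encoding) is returned unchanged.
function repairMacRomanMojibake(text) {
  var escaped = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    var b = code < 0x80 ? code : MAC_ROMAN_BYTE[text.charAt(i)];
    if (b === undefined) {
      return text; // character does not exist in Mac Roman
    }
    var hex = b.toString(16).toUpperCase();
    escaped += "%" + (hex.length < 2 ? "0" + hex : hex);
  }
  try {
    return decodeURIComponent(escaped);
  } catch (e) {
    return text; // recovered bytes are not valid UTF-8
  }
}
```

Because the function is all-or-nothing for the string it is given, it is probably safer to apply it to each parsed field rather than to the whole file. Reading the file with the right encoding in the first place would make it unnecessary.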

Inspiring
March 7, 2018

It does look like an encoding issue. Is the file data correct BEFORE you parse the CSV data?

If it is correct and only getting muddled when parsing, then perhaps also try refactoring your CSV reader to use String.charAt(x) instead of indexing the String directly, as that may be returning different results.

The Scripting Tools Guide mentions a few things about Unicode. I didn't see any notes about which format it prefers, except that it assumes the system encoding, which is different on Mac and PC. It looks like you might need to set the encoding manually in the File object's properties when reading and writing the files, e.g. File.encoding="UTF-8" to force that encoding, or give the user an option to select the encoding before parsing.
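A minimal sketch of what forcing the encoding might look like, assuming the ExtendScript File object with its `encoding`, `open`, `read`, `write`, `close` and `error` members. The helper names `readTextFile` and `writeTextFile` and the path in the usage comment are hypothetical, and the default of "UTF-8" is only a guess at what the client files use.

```javascript
// Read a whole text file with an explicit encoding instead of the system default.
// `file` is expected to be an ExtendScript File object.
function readTextFile(file, encoding) {
  file.encoding = encoding || "UTF-8"; // set before reading
  if (!file.open("r")) {
    throw new Error("Could not open file for reading: " + file.error);
  }
  var text = file.read();
  file.close();
  return text;
}

// Write text with the same explicit encoding so a round trip stays consistent.
function writeTextFile(file, text, encoding) {
  file.encoding = encoding || "UTF-8";
  if (!file.open("w")) {
    throw new Error("Could not open file for writing: " + file.error);
  }
  file.write(text);
  file.close();
}

// Usage inside ExtendScript (hypothetical path):
//   var csvText = readTextFile(new File("~/Desktop/data.csv"), "UTF-8");
```

Using the same encoding for both reading and writing on Mac and Windows avoids depending on either system's default.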

If you are reading the UTF-8 file back into Excel, you need to tell Excel it's UTF-8 or you will get the same problem.
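One common way to signal UTF-8 to Excel is to start the file with a byte order mark (BOM). The sketch below only prepares the text; it assumes the result is then written with UTF-8 encoding, and that the host application does not already add a BOM itself, which is worth verifying. Excel's handling also varies by version, so treat this as something to try. `UTF8_BOM` and `withBom` are hypothetical names.

```javascript
// U+FEFF at the start of a UTF-8 file is encoded as the bytes EF BB BF.
var UTF8_BOM = "\uFEFF";

// Prepend a BOM unless the text already starts with one.
function withBom(csvText) {
  return csvText.charAt(0) === UTF8_BOM ? csvText : UTF8_BOM + csvText;
}
```

The other route is Excel's text import dialog, where the file origin can be set to UTF-8 by hand.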

Glenn

O2 Creative