Encoding question
I have been using Andy VanWagoner's CSV parser for a long time now, and it has always worked fine with 'regular' CSV data. Recently, though, I've been working with a CSV file provided by clients which contains typographic quotes.
The thing is, I'm able to read the file and write it back out with all characters preserved. However, when the data from this file is parsed as CSV and then stringified, each typographic character turns into a run of weird characters.
I'm looking for more information on this: what's the reason the characters get converted this way in the code?
I am okay with what I'm doing right now, which is using this array to convert back the characters that result from running the data through the parser:
["Äú", "“"],
["Äù", "”"],
["Äô", "’"],
Of course the array is incomplete, and I am not sure what either set of these characters is called in terms of encoding. What is "Äú"?
Does anyone have a good way to deal with this that would also cover other characters like these which may eventually come up?
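On the 'What is "Äú"?' question, one plausible reading is that these are the UTF-8 bytes of the typographic quote decoded as Mac Roman. In UTF-8 the left double quote “ is the three bytes E2 80 9C, and in Mac Roman those same bytes are the characters ‚ Ä ú. The leading ‚ is easy to mistake for a comma, which may be why only two characters show up in the table above. The sketch below just reproduces that misreading; the function names are made up for illustration, the Mac Roman table is deliberately partial, and it assumes `encodeURIComponent` is available in the host.

```javascript
// Partial Mac Roman table: only the bytes needed for this demonstration.
var MAC_ROMAN = {
    0xE2: '\u201A', // single low-9 quotation mark
    0x80: '\u00C4', // A with diaeresis
    0x9C: '\u00FA', // u with acute
    0x9D: '\u00F9', // u with grave
    0x99: '\u00F4'  // o with circumflex
};

// Hypothetical helper: return the UTF-8 bytes of a string as an array of numbers.
function utf8Bytes(str) {
    var encoded = encodeURIComponent(str), bytes = [], i = 0;
    while (i < encoded.length) {
        if (encoded.charAt(i) === '%') {
            bytes.push(parseInt(encoded.substr(i + 1, 2), 16));
            i += 3;
        } else {
            bytes.push(encoded.charCodeAt(i));
            i += 1;
        }
    }
    return bytes;
}

// Hypothetical helper: decode those bytes as if they were Mac Roman.
// Bytes missing from the partial table fall back to fromCharCode,
// which is only correct for plain ASCII.
function misreadAsMacRoman(str) {
    var bytes = utf8Bytes(str), out = '', i;
    for (i = 0; i < bytes.length; i++) {
        out += MAC_ROMAN[bytes[i]] || String.fromCharCode(bytes[i]);
    }
    return out;
}
```

If that reading is right, the table does not need to grow one character at a time: the damage comes from reading UTF-8 bytes with the wrong encoding, so fixing the encoding at the point where the file is read or written should make the replacement table unnecessary.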
I assume you have checked the file's encoding? For example, the file might be in one encoding (such as UTF-8) while you're reading it as another (ISO-XXXX, etc.).
Why Do I Get Odd Characters instead of Quotes in My Documents? - Ask Leo!
Perhaps just convert them to straight single quotes before running the data through the CSV parser.
Glenn Wilton
O2 Creative
Unfortunately I get an error at that link. What I also discovered is that, while the characters used to come out as "Au"-looking letters, if I save my actual script file from Sublime Text with "Save with Encoding" set to UTF-8, the characters are actually:
["€œ", "Ò"],
["€", "Ó"],
["€™", "Õ"],
So now the 'flattened' characters look like "E™", and the typographic quotes in my table are no longer in their quote form but look like "O" characters with squiggly lines. With the script file encoded this way, my script appears to work correctly on both Mac and Windows, producing the correct typographic quotes in preview, but the 'flattened' "E" characters still show when the .csv file is opened in Excel.
This looks like it's working for me, but whatever the reason is, I will eventually need to build a table of all such characters to handle any I come across in the future.
It does look like an encoding issue. Is the file data correct BEFORE you parse the CSV data?
If it is correct and only gets muddled when parsing, then perhaps also try refactoring your CSV reader to use String.charAt(x) instead of String[x].
The Scripting Tools Guide mentions a few things about Unicode. I didn't see any notes about which format it prefers, except that it assumes the system encoding, which is different on Mac and PC. It looks like you might need to manually set the encoding in the File object's properties when reading and writing the files, e.g. File.encoding="UTF-8" to force that encoding, or give the user the option of selecting the encoding before parsing.
If you are reading the UTF-8 file back into Excel, you need to tell Excel it's UTF-8 or you get the same problem.
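A minimal sketch of what forcing the encoding could look like, assuming `file` is an ExtendScript File object. The function names `readUtf8` and `writeUtf8` are made up for illustration, and the optional byte order mark is one common way to get Excel to treat a CSV as UTF-8, so treat that part as something to test rather than a guarantee.

```javascript
// Hypothetical helper: read a whole text file as UTF-8.
// `file` is assumed to be an ExtendScript File object.
function readUtf8(file) {
    file.encoding = 'UTF-8'; // set before reading so the bytes are decoded as UTF-8
    file.open('r');
    var text = file.read();
    file.close();
    return text;
}

// Hypothetical helper: write text as UTF-8.
// When withBom is true, a byte order mark (U+FEFF) is written first,
// which is a common hint for Excel that the file is UTF-8.
function writeUtf8(file, text, withBom) {
    file.encoding = 'UTF-8';
    file.open('w');
    file.write((withBom ? '\uFEFF' : '') + text);
    file.close();
}
```

Usage would be along the lines of `var text = readUtf8(new File(path));` followed by `writeUtf8(new File(outPath), csvString, true);`, where `path`, `outPath` and `csvString` are placeholders.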
Glenn
O2 Creative
I will keep trying to narrow this issue down. Now I am thinking that the 'stringify' portion of the parser somehow produces the issue. Do you see anything that jumps out at you in this block?
stringify: function(table, replacer, delimiter) {
    delimiter = delimiter || ',';
    replacer = replacer || function(r, c, v) { return v; };
    var csv = '', c, cc, r, rr = table.length, cell;
    for (r = 0; r < rr; ++r) {
        if (r) { csv += '\r\n'; }
        for (c = 0, cc = table[r].length; c < cc; ++c) {
            if (c) { csv += delimiter; }
            cell = replacer(r, c, table[r][c]);
            // Quote any cell containing the delimiter, a line break, or a double quote.
            var rx = new RegExp("[" + delimiter + "\\r" + "\\n\"]");
            if (rx.test(cell)) { cell = '"' + cell.replace(/"/g, '""') + '"'; }
            csv += (cell || 0 === cell) ? cell : '';
        }
    }
    return csv;
}
I say this because reading from one of these files and writing the read string doesn't produce these characters.
Moreover, I will try to repeat all my tests and make sure I'm seeing the thing I think I am seeing.
The one thing that's for sure: each of these CSV files opens with the correct characters in Sublime Text, but when opened in Excel, on either Mac or PC, strange characters such as † begin to appear.
I want to add that I don't really care how Excel reads the files. My one issue is that, after parsing and stringifying, my own code produces a file that has the strange characters even when opened in Sublime. My only workaround right now is to use my hard-coded table (and to save my own .jsx file with a specific encoding too!) to replace and un-replace these characters.
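To narrow down which step introduces the characters, one option is to dump the character codes of the string right after reading, after parsing, and after stringifying. This is only a debugging sketch, and `charCodes` is a made-up helper name. If the string already holds U+201A U+00C4 U+00FA where a single U+201C is expected, the damage happened before that point.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as
// "U+XXXX" values so that look-alike characters can be told apart.
function charCodes(str) {
    var out = [], i, hex;
    for (i = 0; i < str.length; i++) {
        hex = str.charCodeAt(i).toString(16).toUpperCase();
        out.push('U+' + ('0000' + hex).slice(-4));
    }
    return out.join(' ');
}
```

Calling it on the same cell at each stage (for example, logging the result with `$.writeln` in the ExtendScript Toolkit) should show at which step the single quote character becomes several characters.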

