Splitting a CSV with a Regular Expression

Report · Sep 23, 2016

I am working on a script that will read CMYK values from a CSV file and add swatches to the Swatches panel. I can easily split the incoming CSV file by commas (.split(",");) but I cannot seem to get a regular expression working to split by both commas and new lines.

Here’s the snippet of code I have at the moment that is not working:

var fileIn = new File("/Users/brianp/Desktop/AVL-PMS-TEMP.csv");
fileIn.open();
var csvIn = fileIn.read();
fileIn.close();
var regEx = "/,|\\n/";
csvRecords = csvIn.split(regEx);

Am I barking up the wrong tree here?

I could edit the CSV ahead of time to find all newlines and replace them with commas, but it seems more elegant to adapt my code to the CSV file as supplied.

Thanks in advance!

Report · Sep 23, 2016

csvRecords = csvIn.split(/[,\n\r]/);

That's the shortcut. To predefine the regex, leave out the quotes:

var regEx = /,|\n/;

Better to add the carriage return (\r) as well.

Use character classes [...] instead of alternatives (...|...) if you can; they're more efficient (allegedly).

Peter
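A minimal sketch of the difference described above, using a made-up sample string in place of the file contents. The variable names and data are hypothetical, and the code is plain ES3-style JavaScript, so it should also run in ExtendScript, assuming split() behaves there as it does in standard JavaScript.

```javascript
// Sample text standing in for the file contents (hypothetical data,
// with Windows-style \r\n line endings).
var csvIn = "Name,C,M\r\nRed,0,100\r\nBlue,100,50";

// A quoted pattern is an ordinary string, so split() searches for the
// literal characters /,|\n/ in the text, finds none, and returns the
// whole text as a single element.
var asString = csvIn.split("/,|\\n/");

// A character class splits on each comma, \r and \n individually, so a
// \r\n pair leaves an empty string between the two characters.
var asClass = csvIn.split(/[,\n\r]/);

// Listing \r\n first treats a Windows line ending as one separator.
var asRegex = csvIn.split(/,|\r\n|\r|\n/);
```

With this sample, asString has one element, asClass has eleven (two of them empty strings), and asRegex has the nine actual values. If the file only ever uses a single \n or \r per line ending, the character class is enough.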

Report · Sep 23, 2016

For what it's worth, I tend to split records line by line given that the separator is known:

var f = File ( "/myCSV.csv" );
var headers = [], rows = [];
var sep = "\t"; //a tab character for separator in this case
f.open( 'r');
headers = f.readln().split(sep);
while (  !f.eof ) {
  rows[ rows.length ] = f.readln().split(sep);
}
f.close();
alert( "Here are the headers:\r"+headers+"\rand there are "+rows.length+" rows");

Loic
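The same row-by-row idea can be tried on a string that is already in memory, for example the result of fileIn.read() from the first post. This is only a sketch: parseRows and the sample data are hypothetical, and blank lines are skipped as a design choice rather than anything the posts above require.

```javascript
// Hypothetical helper: split text into lines, then each line into fields.
function parseRows(text, sep) {
  var lines = text.split(/\r\n|\r|\n/);
  var result = [];
  for (var i = 0; i < lines.length; i++) {
    // Skip blank lines, e.g. the one left by a trailing newline.
    if (lines[i] === "") { continue; }
    result[result.length] = lines[i].split(sep);
  }
  return result;
}

// Made-up tab-separated sample with a header line and a trailing newline.
var sample = "Name\tC\tM\tY\tK\nRed\t0\t100\t100\t0\nBlue\t100\t50\t0\t0\n";
var parsed = parseRows(sample, "\t");
var sampleHeaders = parsed[0];     // first line holds the headers
var sampleRows = parsed.slice(1);  // remaining lines are the records
```

Each record stays in its own array, so a field is addressed as sampleRows[row][column] instead of by an offset into one long list.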

Ozalto | Productivity Oriented - Loïc Aigon

Report · Sep 23, 2016

@Peter: Awesome, thank you! I knew it was probably something silly like quotes / no quotes. < #embarrassed > I'm very much a newbie at JavaScript / ExtendScript, learning slowly as I go.

(As an aside, the Adobe documentation is very... obtuse. As a beginner, a lot of it makes absolutely no sense to me! The Jongware version is slightly better, but a lot of it is still not very clear / obvious to me... a lot of what I've learned has been through somewhat painful trial and error!)

@Loic: ahhh, I hadn't thought about doing it that way! I know the CSV I'm using has ten fields per line, and that [1] is the swatch name and [6], [7], [8] and [9] are the CMYK values, so I built my loop around that math. But of course I'd have to re-do the code if given a CSV that was structured differently... this is for a specific project that I'm in the middle of right now, but when I get a chance I'll go back and try to rework my code to read the file differently, and hopefully make it more flexible in the process!

Thanks so much, both of you!

Brian
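For reference, the index math described in this post might look roughly like the sketch below. Only the layout (ten fields per line, name at [1], CMYK at [6] to [9]) comes from the post; readSwatches, the sample lines, and the assumption that the file has no header row are all hypothetical.

```javascript
var FIELDS_PER_RECORD = 10; // from the post: ten fields per line

// Hypothetical helper: walk the flat array produced by splitting the
// whole file on commas and line endings, ten values at a time.
function readSwatches(flat) {
  var found = [];
  // Stop once fewer than ten values remain, which also ignores the
  // empty string left behind by a trailing newline.
  for (var i = 0; i + FIELDS_PER_RECORD <= flat.length; i += FIELDS_PER_RECORD) {
    found[found.length] = {
      name: flat[i + 1],
      cmyk: [
        Number(flat[i + 6]), Number(flat[i + 7]),
        Number(flat[i + 8]), Number(flat[i + 9])
      ]
    };
  }
  return found;
}

// Two made-up lines; the "x" fields stand for columns that are not used.
var line1 = "1,Red,x,x,x,x,0,100,100,0";
var line2 = "2,Blue,x,x,x,x,100,50,0,0";
var flatValues = (line1 + "\n" + line2 + "\n").split(/,|\r\n|\r|\n/);
var swatchData = readSwatches(flatValues);
```

The offsets are hard-coded, which is exactly why this version has to be rewritten when the column layout changes.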

Report · Sep 23, 2016

But of course I'd have to re-do the code if given a CSV that was structured differently...

That you can easily avoid. If you know the header label, you can reach it without knowing its index with minor adjustments. Here is how I do it generally:


var f = File ( Folder.desktop+"/sample.csv" );  
var headers = [], rows = [], header; 
//Setting used separator
var sep = "\t"; //a tab character for separator in this case; use "," for a comma-separated file
//Opening file for reading
f.open( 'r');  
//Getting headers on first line
headers = f.readln().split(sep);
//Storing each header's index in an object so we can later retrieve the index by header name
var n = headers.length;
var db = {};
while ( n-- ) {
  header = headers[n];
  db[header] = n;
}
//Storing rows
while (  !f.eof ) {  
  rows[ rows.length ] = f.readln().split(sep);  
}  
//Closing file
f.close();  
//Now we can get access to a row[header] value without concern for its index, only its name.
alert( "The Scientific name for the 3rd record is "+rows[2][db["Scientific name"]] );

Loic
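To see the header lookup work without a file on disk, here is a small sketch run against an in-memory string. The data, the comma separator, and the names indexHeaders, headerIndex and records are made up for illustration.

```javascript
// Hypothetical helper: map each header label to its column index.
function indexHeaders(headerList) {
  var map = {};
  for (var n = 0; n < headerList.length; n++) {
    map[headerList[n]] = n;
  }
  return map;
}

// Made-up comma-separated sample: one header line and three records.
var text = "Common name,Scientific name\n" +
           "Lion,Panthera leo\n" +
           "Tiger,Panthera tigris\n" +
           "Wolf,Canis lupus";

var textLines = text.split("\n");
var headerIndex = indexHeaders(textLines[0].split(","));

var records = [];
for (var i = 1; i < textLines.length; i++) {
  records[records.length] = textLines[i].split(",");
}

// Look the column up by label instead of by a hard-coded position.
var third = records[2][headerIndex["Scientific name"]];
```

Here third is "Canis lupus". If the columns are reordered in a later file, only the header line changes and the lookup still finds the right field; a label that is missing from the header line gives undefined, which is worth checking for before use.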

Report · Sep 25, 2016

GitHub - cparker15/CSV-js: A CSV (comma-separated values) parser written in JavaScript.

It needs very little modification to work in ExtendScript.
