Skip to main content
Inspiring
June 8, 2010
Question

CSV INSERT TAKES OVER 24 HOURS

  • June 8, 2010
  • 3 replies
  • 3608 views

Hi


I have a simple CSV with 5 columns. Name, Email, Custom1, Custom2, Custom3, Custom4


The CSV has 120 000 rows.


When I insert the CSV into a MySql DB, it starts off perfectly. About 2000 is 3 minutes.

Thereafter it slows down alarmingly to about 1 record every second or 2.

I really need to speed this process up.


My code is fairly simple... See below:


Please can someone assist me with this as it is causing me endless headaches!


Best regards

Delon

<cffile action="upload" filefield="CSVFILE" destination="C:\inetpub\vhosts\abc.com\httpdocs\login\csvfiles\">
<cffile action="READ" file="#CSVFILE#" variable="thefile">
           
            <cfset processed=0>
            <cfloop list="#thefile#" delimiters="#chr(10)#" index="el">
                <!--- check that that there is no comma in name --->
                <cfif listlen(el) GTE "2" AND findnocase('@',listgetat(el,'2')) NEQ 0>
                    <!--- check that we dont have this email already --->
                    <cfquery name="CheckDup" datasource="#application.db#">
                    SELECT    name
                    FROM     tblrecipients
                    WHERE    <!--- name     =     '#trim(listgetat(el,'1'))#'
                    AND      --->
                    email     =     '#trim(listgetat(el,'2'))#'
                    AND     ListID    =    #ListID#
                    AND     userid    =    #Session.userid#
                    </cfquery>
                   
                    <cfif CheckDup.RecordCount eq 0>

                        <cfquery  datasource="#application.db#" timeout="9999999">
                        INSERT INTO tblrecipients(name,email,listid,userid,COLUMNA,COLUMNB,COLUMNC,COLUMND)
                        VALUES('#trim(listgetat(el,'1'))#','#trim(listgetat(el,'2'))#',#ListID#,#Session.userid#,
                        <cftry>'#trim(listgetat(el,'3'))#'<cfcatch type="any"> '</cfcatch></cftry>,
                        <cftry>'#trim(listgetat(el,'4'))#'<cfcatch type="any"> '</cfcatch></cftry>,
                        <cftry>'#trim(listgetat(el,'5'))#'<cfcatch type="any"> '</cfcatch></cftry>,
                        <cftry>'#trim(listgetat(el,'6'))#'<cfcatch type="any"> '</cfcatch></cftry>
                        )
                        </cfquery>
                        <cfset processed=processed+1>
                    </cfif>
                   
                </cfif>
            </cfloop>

This topic has been closed for replies.

3 replies

Inspiring
June 8, 2010

A couple of things jump out at me in the CF code.  As Adam mentioned, you should read the file line by line.  You're effectively

doing that anyway with the delimiters= in the CFLOOP, so just remove the CFFILE to read the file, and change the CFLOOP to read the file instead.

All those CFTRYs in the INSERT statement can't be helping either - just append 4 empty items to the line before the INSERT, and you won't need the CFTRYs at all.

Are there lots of email dupes?  If the number of distinct emails isn't too great, try loading them into a query first and then doing a QofQ to look for the dupes, instead of going all the way back to the database each time.  If you know that the same emails occur a number of times, then at least put a CachedWithin= in the database query to save some of that overhead.

But by far the best bet is Adam's other suggestion - get this out of CF and into the database's ETL if possible.  If you're using MS SQL, this is a trivial SSIS task.  I used to get CSV data files that were being processed by CF, which over time became huge files.  CF was taking hours to process, but when we made it an SSIS task, it only took seconds to complete.

Inspiring
June 8, 2010

Hi Reed

Thank you for you for your response.

Whilst I may think I know a lot about CF and MySQL, I actually don’t know nearly as much as I need to.

Could I ask you to explain how to use SSIS?

From the time the CSV is uploaded, what do I do?

DO you have a working example for me to see? I really would be most grateful.

Best regards

Delon

Inspiring
June 8, 2010

I googled "mysql bulk load", and this was the first match: http://www.classes.cs.uchicago.edu/archive/2005/fall/23500-1/mysql-load.html

It seems to explain it fairly succinctly (I sound vague because I've never done this, I just know it's possible).  It looks pretty straight forward?

It at least gives you a stepping off point to read up on "LOAD DATA" (http://dev.mysql.com/doc/refman/5.1/en/load-data.html), which seems to be the relevant command to do what you need to do...

--

Adam

Inspiring
June 8, 2010

First: read the file line by line, rather than the whole thing in one fell swoop.

Second: parameterise your queries, rather than hard-coding values into your SQL string.

Third: if you're only querying for a recordCount with CheckDup, why not just return the record count (eg: count(name)) rather than return a recordset and then count it?

Fourth: investigate bulk-loading this via the DB, omitting CF from the mix completely, or perhaps only involving it to prep the bulk-load file.

(those are in the order they occurred to me, not in the order of importance... it's probably 4, 1, 2, 3).

--

Adam

ilssac
Inspiring
June 8, 2010

ColdFusion is not the best tool for bulk loading data like this.  Most database management tool worthy of the name have bulk loading features that can directly load data by reading CSV files.

If you must use ColdFusion, you maybe running into a issue where CF can't store the entire datafile in memory, causing the server to do lots of memory swapping.  With a large file like that, I would try to create a system that reads a line at a time and load that line.