Large Text Import to SQL

New Here ,
Feb 04, 2011

I was hoping someone could offer some insight on this issue, because I have the feeling that it's not a terribly rare one.
I have a relatively large text file in fixed-width format, roughly 40,000 rows, that needs to be uploaded and imported into a table. This works fine for other tables but, for one table with roughly 15 fields, the script times out consistently. If I cut it down to roughly 20,000 rows, it runs fine.
I'm executing this through a file loop and doing individual insert queries on each iteration. At the end, a stored procedure also has to be run, and that takes several seconds to execute.
Simply changing the request timeout in CF Admin is not an option.
I'm not overly familiar with threading, so I'm not sure whether that could help. I did notice that it maxes out at 5,000.
Maybe I could loop through the file in chunks; would that help mitigate the timeout?

Any help appreciated.

Thanks.

Guide ,
Feb 04, 2011

If it's timing out, then almost certainly your methodology is wrong.

ColdFusion is not really made for jobs like this; I'd take a look at using something like SSIS to import the data. If a web interface must be involved, use CF to upload and save the file, then call a stored proc or SSIS package to import the data.

CF will take forever and then crash; a native database method will suck it all up in seconds.
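
If it helps, a minimal sketch of that hand-off; the procedure name and share path are placeholders, and the proc (or an SSIS package it launches) would do the actual import:

<!--- CF only receives the file and saves it somewhere the DB server can read --->
<cffile action="upload" destination="\\fileshare\uploads\" fileField="theUpload"
    nameconflict="makeunique" result="uploadStr">
<!--- Then hand the import off to the database --->
<cfstoredproc datasource="#Application.dsn#" procedure="dbo.usp_ImportUpload">
    <cfprocparam cfsqltype="CF_SQL_VARCHAR" value="\\fileshare\uploads\#uploadStr.serverFile#">
</cfstoredproc>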

Enthusiast ,
Feb 04, 2011

I agree that it is best to use database ETL tools like SSIS whenever possible. They will always be faster. But sometimes you don't have a choice. To help you, though, we need more information about what you are doing. What database product and version? What format is the incoming file (CSV, XML, ...)?

I do a lot of this sort of thing, and my experience is that the problems are usually related either to the fact that the CF code ends up doing individual database INSERTs for each row, which makes it run long (but not crash, so long as the request timeout is high enough), or to memory problems in parsing XML data (which can crash the job).

It sounds like you are getting a delimited file, since you say that you are processing it line by line. So try this to narrow down the field of problems and solutions: replace the CFQUERY tag that does the INSERT with a CFFILE that appends the data to a file. You'll need a dummy CFFILE before the loop to write out column headers. How long does this run? It will probably run pretty fast.
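
A rough sketch of that test, with placeholder paths and header names:

<!--- Dummy header row, written once before the loop --->
<cffile action="write" file="#ExpandPath('./out.txt')#" output="col1|col2|col3">
<cfset started = GetTickCount()>
<cfloop file="#ExpandPath('./source.txt')#" index="line">
    <!--- Append instead of INSERT, to isolate file handling from database time --->
    <cffile action="append" file="#ExpandPath('./out.txt')#" output="#line#">
</cfloop>
<cfoutput>Loop took #GetTickCount() - started# ms</cfoutput>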

Now look into what options your database product offers for doing bulk inserts (hint: if it is SQL Server, it's called BULK INSERT). If you have that option, then try doing a bulk insert of the file you just created. It is very finicky in terms of all columns being populated, text delimiters, etc., so it might take some trial and error. But it will be fast. You will need to be sure that both the CF server and the database server have access to wherever you store the file, and that may have some implications for the login account that each of their services runs under (see other threads for repeated discussions of that topic).
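
For example, something along these lines; the table name, share path, and format file here are placeholders, and the format file is what maps the fixed-width offsets to table columns:

<cfquery datasource="#Application.dsn#">
    BULK INSERT dbo.myImportTable
    FROM '\\fileshare\uploads\import.txt'
    WITH (
        FORMATFILE = '\\fileshare\uploads\import.fmt', -- defines offset/length for each column
        FIRSTROW = 2, -- skip the header row
        TABLOCK
    )
</cfquery>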


Good luck,

-reed

New Here ,
Feb 04, 2011

Thanks for getting back to me so quickly.

To answer some follow-up questions:

- Yes, it's SQL Server (sorry, I meant to put that in the subject line)

- Yes, it has to be through a web interface

- It is a fixed-width text file

As for bulk insert, I've never done one with fixed-width data, so I wasn't sure how to even go about it, but I think that's a good place to start. However, a couple of potential problems:

- The SQL Server instance is not on the same server as the web server

- The SQL Server likely is not able to connect out, so I couldn't write the file to the web server and then have SQL Server pick it up. But I'm really not sure and could probably try that (I'm a very short-term contractor with these people).

I'll post the relevant code and that may help narrow down what needs to happen. I don't know that I'm doing everything as efficiently as it could be done.

<cffile action="upload" destination="#ExpandPath(uploadDir & filename)#"

fileField="theUpload" accept="text/plain,text,application/zip,application/x-zip-compressed" result="uploadStr">

<cfif uploadStr.contentType NEQ 'text/plain' AND uploadStr.contentType NEQ 'text'>

<cfzip action="list" file="#ExpandPath(uploadDir & filename)#" name="qryZip">

<cfset unzipFilename = "">

<cfloop query="qryZip">

<cfif Attributes.currentTable EQ ListFirst(tableList) AND qryZip.name EQ 'add'>

<cfset unzipFilename = 'add'>

<cfelseif Attributes.currentTable EQ ListLast(tableList) AND qryZip.name EQ 'pl'>

<cfset unzipFilename = 'pl>

</cfif>

</cfloop>

<cfif Len(unzipFilename) EQ 0>

<cfset errorCode = 'zipNoFile'>

<cfelse>

<cfzip action="unzip" file="#ExpandPath(uploadDir & filename)#" destination="#ExpandPath(uploadDir)#">

<cffile action="delete" file="#ExpandPath(uploadDir & filename)#">

<cffile action="rename" source="#ExpandPath(uploadDir & unzipFilename)#" destination="#ExpandPath(uploadDir & filename)#">

</cfif>

</cfif>

<cfcatch type="any">

<cfset errorCode = 'upload'>

<cfset errorMsg = cfcatch.message>

</cfcatch>

</cftry>

<cfif Len(errorCode) EQ 0>

<cftransaction>

<cftry>

<cfquery result="qryDelete" datasource="#Application.dsn#">

DELETE FROM #Attributes.currentTable#

</cfquery>

<cfset lineNum = 1>

<cfset zipsDone = "">

<cfloop file="#ExpandPath(uploadDir & filename)#" index="line">

<cfif lineNum GT 1>

<cfquery result="qryInsert" datasource="#Application.dsn#">

INSERT INTO #Attributes.currentTable# (

<cfloop array="#fields#" index="f">

<cfif f[1] NEQ fields[1][1]>,</cfif>

#f[1]#

</cfloop>

) VALUES(

<cfloop array="#fields#" index="f">

<cfif f[1] NEQ fields[1][1]>,</cfif>

<cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#Trim(Mid(line,f[2],f[3]))#" maxlength="#f[4]#">

</cfloop>

)

</cfquery>

</cfif>

<cfset lineNum++>

</cfloop>

<cfstoredproc datasource="#Application.dsn#" procedure="dbo.sp_BatchUpload">

<cfprocparam cfsqltype="CF_SQL_VARCHAR" value="#Attributes.currentTable#">

<cfprocparam cfsqltype="CF_SQL_VARCHAR" value="#DateFormat(Now(),'mmddyyyy')#-#TimeFormat(Now(),'HHmmss')#-BATCH-AFTER">

<cfprocparam cfsqltype="CF_SQL_VARCHAR" value="#Application.userInfo.userID#">

</cfstoredproc>

<cfcatch type="any">

<cfif cfcatch.type EQ 'database'>

<cfset errorCode = 'db'>

<cfset errorMsg = cfcatch.sql & '<br/>' & cfcatch.queryError>

<cfelse>

<cfset errorCode = 'db'>

<cfset errorMsg = cfcatch.message>

</cfif>

<cftransaction action="rollback">

</cfcatch>

</cftry>

<cfif Len(errorCode) EQ 0>

<cffile action="delete" file="#ExpandPath(uploadDir & filename)#">

<cftransaction action="commit">

<cfset showForm = false>

<cfelse>

<cftransaction action="rollback">

</cfif>

</cftransaction>

</cfif>

New Here ,
Feb 04, 2011

Okay, so right after I posted that, I looked at it and realized, "Hey, I don't need to loop through the fields to make the column list on each iteration; I could just build a string before the loop and hash it in," and that sped it up enough not to time out. I also removed the "result" attribute from the insert query, thinking that would help a little too.
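
For reference, the change amounts to something like this (sketched from the code above):

<!--- Build the column list once, before the loop --->
<cfset colList = "">
<cfloop array="#fields#" index="f">
    <cfset colList = ListAppend(colList, f[1])>
</cfloop>

<!--- Inside the loop: no result attribute, no inner loop for column names --->
<cfquery datasource="#Application.dsn#">
    INSERT INTO #Attributes.currentTable# (#colList#)
    VALUES (
        <cfloop array="#fields#" index="f">
            <cfif f[1] NEQ fields[1][1]>,</cfif>
            <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#Trim(Mid(line, f[2], f[3]))#" maxlength="#f[4]#">
        </cfloop>
    )
</cfquery>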

But I would love to hear some suggestions on this code to maybe make it more efficient.

LEGEND ,
Feb 04, 2011

One way that might make it more efficient is to put your loop inside the query. Instead of

<cfloop>
    <cfquery>
        insert into atable
        (f1, f2)
        values
        (v1, v2)
    </cfquery>
</cfloop>

do this:

<cfquery>
    insert into atable
    (f1, f2)
    select v1, v2
    from somesmalltable
    where 1 = 2
    <cfloop>
        union
        select #v1#, #v2#
    </cfloop>
</cfquery>

I have a template that does this with Oracle. It includes this comment:

<!--- There are ~4500 lines in the file. With that big a union query, Oracle might choke, so break it up into blocks of 500.
Use pipe delimiters in case there are commas in the data. --->

To break it up, go to cflib.org and look for a function called ListSplit.  It converts a really long list to an array of shorter lists.
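
If you'd rather not pull in the UDF, here's a rough sketch of the same batching idea with built-in functions only (the rows array, table, and columns are placeholders; the pipe delimiter matches the comment above):

<cfset batchSize = 500>
<cfloop from="1" to="#Ceiling(ArrayLen(rows) / batchSize)#" index="batch">
    <cfset startRow = (batch - 1) * batchSize + 1>
    <cfset endRow = Min(batch * batchSize, ArrayLen(rows))>
    <cfquery datasource="#Application.dsn#">
        insert into atable (f1, f2)
        select v1, v2 from somesmalltable where 1 = 2
        <cfloop from="#startRow#" to="#endRow#" index="i">
            union
            select <cfqueryparam value="#ListGetAt(rows[i], 1, '|')#">,
                   <cfqueryparam value="#ListGetAt(rows[i], 2, '|')#">
        </cfloop>
    </cfquery>
</cfloop>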

Valorous Hero ,
Feb 04, 2011

coatimundiAZ1 wrote:


<cfloop file="#ExpandPath(uploadDir & filename)#" index="line">

If you cannot get this into some type of bulk-insert situation, you are at least looping through the file one line at a time, so it is probably not a memory issue, just a timeout issue.

You do not have to set a timeout for the entire application. You can set a request timeout for just this file with the <cfsetting> tag.

E.g. <cfsetting requestTimeout="600"> <!--- Allow 600 seconds (ten minutes) for this request to run --->

You could put that at the top of this file and it would not affect any other request.
