Hi,
I'm using CF10 with a SQL Server backend. I have around 1 lakh records to process from a table A, and I need to check whether each record is present in a table B on another server. If a record is not present in B, I need to update a few columns of that record in table A.
Looping over the records and processing them one by one gives a timeout.
I then tried taking a valueList() of the IDs and processing them in 10 batches using query-of-queries, but that gives an error saying a query cannot contain more than 2100 parameters.
Please suggest a better way to process one lakh records in ColdFusion.
Just to clarify, a lakh is about one hundred thousand, right?
The way I see it, there are two problems here. The first is, how can you do it at all? The second is, how can you do it in a performant manner? Let's take a look at the first one, first. You mention you're getting a timeout when looping and processing. My guess is that you're trying to do too much within a single CFML program. CF has a setting that you can control within the specific program using the CFSETTING tag:
<cfsetting requestTimeout = "30">
There are other ways to control the request timeout as well, but you can research them on your own. If you set requestTimeout to zero, the program shouldn't time out at all. Here's a starting point for learning about the CFSETTING tag and the requestTimeout attribute:
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-r-s/cfsetting.html
If you set requestTimeout to a large enough value, the program shouldn't time out. But the query or queries might still time out. You might be able to control that with the CFQUERY tag's timeout attribute. Go here to learn about that:
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-p-q/cfquery.html
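For example (the datasource and table/column names below are placeholders, not from the thread), the timeout attribute caps how long the driver waits for the query to finish:

```cfml
<!--- Give this query up to 300 seconds before the driver cancels it.
      "myDSN", "tableA", and the columns are placeholders. --->
<cfquery name="qTableA" datasource="myDSN" timeout="300">
    SELECT id, colToUpdate
    FROM tableA
</cfquery>
```

Keep in mind that honoring the timeout attribute depends on the underlying JDBC driver, so test it against your actual datasource.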
Unfortunately (sort of), you can't prevent the queries from timing out, and the database might have its own timeout value. But let's assume you can set it high enough that you can perform whatever you're trying to do. At this point, you should be able to solve your looping and processing timeout problem. Let's say you're using SQL Server. If I recall correctly, the maximum number of integers you can use in an IN clause is 32,767. But the more you have in the tableB query, the longer it's going to take. You may have to use shorter lists to avoid that timeout. Here's a quick intro from Ben Nadel on using the list attribute of CFQUERYPARAM, which might improve performance.
https://www.bennadel.com/blog/425-coldfusion-cfqueryparam-list-attribute-is-sweeet.htm
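A minimal sketch of that approach (the datasource name and SQL type are assumptions): the list attribute binds a whole comma-separated list as the parameters of an IN clause, one bound parameter per list element — which is why the list still has to stay under SQL Server's 2100-parameter limit.

```cfml
<!--- idList is assumed to hold a comma-separated list of tableA ids,
      e.g. built with valueList(qTableA.id). Keep it under the
      2100-parameter limit per query. --->
<cfquery name="qMatches" datasource="serverB_DSN" timeout="300">
    SELECT id
    FROM tableB
    WHERE id IN (
        <cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true">
    )
</cfquery>
```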
So anyway, you've solved the timeout problem. Great! But it's going to be VERRRRRY SLOOOOW. And you may encounter serious performance problems even though everything is working. How can you make it perform adequately? Unfortunately, you're not going to have a lot of options here since the databases are on separate servers. If they were on the same server, you could use linked tables from one database to the other, and use a single query to find the matching records ... and it would probably be pretty fast? Is it possible for you to move them onto the same server, even temporarily? For example, we have a client that needs to query a remote database as well as a local one for a specific application. To make that performant, we run a single query during off-peak hours that basically copies the remote database, which has hundreds of thousands of records, into the local database. We don't do that through CF, but through something else that doesn't have to respond to a user's request to a web server.
Another option to improve performance is to use CFTHREAD to handle the part of this that doesn't need to respond to a user. That would be less optimal than just moving the tableB records to the tableA database server while you're using them. CFTHREAD basically lets you use a separate thread pool, and those threads don't directly service users so they don't need to finish immediately. Here's some information about CFTHREAD.
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-t/cfthread.html
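A rough sketch of that idea (processBatches() is a hypothetical function standing in for your batch logic):

```cfml
<!--- Hand the heavy work to a background thread so the user's
      request can return immediately. --->
<cfthread name="batchWorker" action="run">
    <cfset processBatches()>
</cfthread>

<!--- Respond right away. Deliberately do NOT join the thread here;
      joining would make the request wait for the work to finish. --->
<cfoutput>Processing started. Check back later for the results.</cfoutput>
```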
I hope all this helps!
Dave Watts, Eidolon LLC
Thank you @Dave Watts
Yes, one lakh is the same as one hundred thousand. I will check your other points as well.
These records are populated into table A from a txt file by another team.
From the user interface, some action is performed, and these records are supposed to be processed following that.
Is it possible to compute the difference (A - B) by putting the table A records into one array and the other server's data into another array?
I think putting the records in arrays will be just as slow. I don't think event gateways will help. Sorry!
Dave Watts, Eidolon LLC
@RzAf , there is a simple solution, one you would apply to any large job. Just split the job up into small batches.
Let's say the table's primary key is myTblKey. Then there are some 100 000 primary-key values, say, myTblKey1, myTblKey2, ..., myTblKey100000. As SQL Server restricts you to fewer than 2100 parameters, you could split the job into 50 batches. Each batch would then process some 2000 records.
To maintain performance, you could run the batches in series. That is, one after the other:
the job that processes the batch ( myTblKey1 to myTblKey2000 ), followed by the job that processes the batch ( myTblKey2001 to myTblKey4000 ), followed by the job that processes the batch ( myTblKey4001 to myTblKey6000 ), and so on.
One way to do so is to run each batch as a scheduled task.
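Putting the pieces together, the batching might look like the following sketch (the datasource names, column names, and the UPDATE itself are all assumptions). Each pass binds at most 2000 keys, staying under the 2100-parameter limit; a scheduled task could run one batch per invocation instead of the outer loop shown here:

```cfml
<cfsetting requestTimeout="0">
<cfset batchSize = 2000>

<!--- Fetch all primary keys from tableA once --->
<cfquery name="qKeys" datasource="serverA_DSN">
    SELECT myTblKey FROM tableA ORDER BY myTblKey
</cfquery>

<cfloop from="1" to="#qKeys.recordCount#" index="startRow" step="#batchSize#">
    <!--- Build a comma-separated list for this batch of keys --->
    <cfset endRow = min(startRow + batchSize - 1, qKeys.recordCount)>
    <cfset keyList = "">
    <cfloop from="#startRow#" to="#endRow#" index="r">
        <cfset keyList = listAppend(keyList, qKeys.myTblKey[r])>
    </cfloop>

    <!--- Which of this batch's keys exist in tableB on the other server? --->
    <cfquery name="qInB" datasource="serverB_DSN" timeout="300">
        SELECT myTblKey FROM tableB
        WHERE myTblKey IN (
            <cfqueryparam value="#keyList#" cfsqltype="cf_sql_integer" list="true">
        )
    </cfquery>
    <cfset inBList = valueList(qInB.myTblKey)>

    <!--- Keys in this batch that are NOT in tableB --->
    <cfset missingList = "">
    <cfloop list="#keyList#" index="k">
        <cfif NOT listFind(inBList, k)>
            <cfset missingList = listAppend(missingList, k)>
        </cfif>
    </cfloop>

    <!--- Update the missing records back in tableA --->
    <cfif len(missingList)>
        <cfquery datasource="serverA_DSN" timeout="300">
            UPDATE tableA
            SET someCol = 'someValue'
            WHERE myTblKey IN (
                <cfqueryparam value="#missingList#" cfsqltype="cf_sql_integer" list="true">
            )
        </cfquery>
    </cfif>
</cfloop>
```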
Finally, I used a linked server to fetch the records from the other server, and it seems to be working fine now.
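For anyone finding this later: with a linked server defined on table A's SQL Server instance, the whole job can collapse into one set-based statement (the linked-server, database, and column names below are made-up placeholders, not from the thread):

```cfml
<!--- "REMOTESRV" is a hypothetical linked server pointing at the other
      SQL Server; four-part names reach tableB through it. One UPDATE
      replaces all the per-record looping. --->
<cfquery datasource="serverA_DSN" timeout="600">
    UPDATE a
    SET a.someCol = 'someValue'
    FROM tableA a
    WHERE NOT EXISTS (
        SELECT 1
        FROM REMOTESRV.remoteDB.dbo.tableB b
        WHERE b.id = a.id
    )
</cfquery>
```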