• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

Process lakhs of record

Explorer ,
Jun 21, 2023 Jun 21, 2023

Copy link to clipboard

Copied

Hi,

 

Using cf10 and sqlserverbackend. Have around 1 lakh record to process from a table A.  Need to compare data is present in another table B in another server .

If not present need to update few columns of record in table A.

 

Looping and processing is giving timeout. 

Try to take valuelist for id list and do the process in 10 batch using query of query. It is giving error as parameter cannot contain more than 2100 .

 

Please suggest better way to process one lakh record in Coldfusion 

 

TOPICS
Database access

Views

254

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jun 21, 2023 Jun 21, 2023

Copy link to clipboard

Copied

Just to clarify, a lahk is about one hundred thousand, right?

 

The way I see it, there are two problems here. The first is, how can you do it at all? The second is, how can you do it in a performant manner? Let's take a look at the first one, first. You mention you're getting a timeout when looping and processing. My guess is that you're trying to do too much within a single CFML program. CF has a setting that you can control within the specific program using the CFSETTING tag:

 

<cfsetting requestTimeout = "30">

 

There are other ways to control the request timeout as well, but you can research them on your own. If you set requestTimeout to zero, the program shouldn't time out at all. Here's a starting point for learning about the CFSETTING tag and the requestTimeout attribute:

 

https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-r-s/cfsetting.html

 

If you set requestTimeout to a large enough value, the program shouldn't time out. But the query or queries might still timeout. You might be able to control that with the CFQUERY tag's timeout attribute. Go here to learn about that:

 

https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-p-q/cfquery.html

 

Unfortunately (sort of), you can't prevent the queries from timing out, and the database might have its own timeout value. But let's assume you can set it high enough that you can perform whatever you're trying to do. At this point, you should be able to solve your looping and processing timeout problem. Let's say you're using SQL Server. If I recall correctly, the maximum number of integers you can use in an IN clause is 32,767. But the more you have in the tableB query, the longer it's going to take. You may have to use shorter lists to avoid that timeout. Here's a quick intro from Ben Nadel on using the list attribute of CFQUERYPARAM, which might improve performance.

 

https://www.bennadel.com/blog/425-coldfusion-cfqueryparam-list-attribute-is-sweeet.htm

 

So anyway, you've solved the timeout problem. Great! But it's going to be VERRRRRY SLOOOOW. And you may encounter serious performance problems even though everything is working. How can you make it perform adequately? Unfortunately, you're not going to have a lot of options here since the databases are on separate servers. If they were on the same server, you could use linked tables from one database to the other, and use a single query to find the matching records ... and it would probably be pretty fast? Is it possible for you to move them onto the same server, even temporarily? For example, we have a client that needs to query a remote database as well as a local one for a specific application. To make that performant, we run a single query during off-peak hours that basically copies the remote database, which has hundreds of thousands of records, into the local database. We don't do that through CF, but through something else that doesn't have to respond to a user's request to a web server.

 

Another option to improve performance is to use CFTHREAD to handle the part of this that doesn't need to respond to a user. That would be less optimal than just moving the tableB records to the tableA database server while you're using them. CFTHREAD basically lets you use a separate thread pool, and those threads don't directly service users so they don't need to finish immediately. Here's some information about CFTHREAD.

 

https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-tags/tags-t/cfthread.html

https://helpx.adobe.com/coldfusion/developing-applications/developing-cfml-applications/using-coldfu...

 

I hope all this helps!

 

Dave Watts, Eidolon LLC

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Jun 21, 2023 Jun 21, 2023

Copy link to clipboard

Copied

Thank you @Dave Watts 

 

One hundred thousand is same as one lakh. I will check your other points as well  

 

These huge records are populated into table A from a txt file by another team.

From the user interface, some action is done and these records supposed to process following that.

 

Is it possible to compare like A difference B(A - B) by putting table A records on one array and other server data in another array? 

 

 

 

 

 

 

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Jun 21, 2023 Jun 21, 2023

Copy link to clipboard

Copied

  1. Apart from that,  will using event gateways help?

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jun 21, 2023 Jun 21, 2023

Copy link to clipboard

Copied

I think putting the records in arrays will be just as slow. I don't think event gateways will help. Sorry!

 

Dave Watts, Eidolon LLC 

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Jun 24, 2023 Jun 24, 2023

Copy link to clipboard

Copied

@RzAf , there is a simple solution, one you would apply to any large job. Just split the job up into small batches.

 

Let's say the table's primary-key is myTblKey. Then, there are some 100 000 primary-key values, say, myTblKey1. myTblKey2, ..., myTblKey100000. As SQL Server restricts you to less than 2100 parameters, you could spilt the job into 50 batches. Each batch would then process some 2000 records.

 

To maintain performance, you could run the batches in series. That is, one after the other:

the job that processes the batch ( myTblKey1 to myTblKey2000 ), followed by the job that processes the batch ( myTblKey2001 to myTblKey4000 ), followed by the job that processes the batch ( myTblKey4001 to myTblKey6000 ), and so on.

 

One way to do so is to run each batch as a scheduled task. 

 

 

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Explorer ,
Sep 11, 2023 Sep 11, 2023

Copy link to clipboard

Copied

LATEST

Finally used linked server to fetch records from different server and it seems working fine now

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation