• Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
    Dedicated community for Japanese speakers
  • 한국 커뮤니티
    Dedicated community for Korean speakers
Exit
0

Processing large batch - best practice

Contributor ,
Apr 14, 2020 Apr 14, 2020

Copy link to clipboard

Copied

Once a month Quickbooks generates a SQL table of approximately 1000 transactions to process on Stripe. I'm looking for suggestions on how to best process this many transactions instead of cfloop. My concerns with cfloop are what if it crashes where do I restart and timeouts. Once they're processed the record is deleted.  Here are my thoughts:

 

ProcessPayment CFM page:

Select Top1 from ToBeProcessed

<cfif if ToBeProcessed.recordcount GT 1>

<!--- Process Stripe Transaction --->

Delete from ToBeProcessed transaction x

<cflocation processpayment.cfm>

</cfif>

 

This would loop over the table one transaction at a time, process it, delete it, then start again.

Thoughts? Is there a better way?

 

Gary

 

Views

1.3K

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Apr 15, 2020 Apr 15, 2020

Copy link to clipboard

Copied

Gary, I'd think we'd need more info to make a really effectice recommendation,. For instance, do you mean that you would originally have looped over this qb data and used sql (in cfquery or cfstoredproc, etc) to then insert/update/delete records in some DB of your own? If so, then I understand that your concern is what happens if you don't finish all the processing.

 

But then there are two solutions for that:

  • this is where the cftransaction tag can shine. You could wrap the entire loop in that tag, and that would cause CF to tell the DB to regard all the sql processing within that outer cftransaction tag to be treated as a single unit, such that if any failed, they would all fail, and the DB should "roll back" (undo) the processing that was done (within that outer cftransaction tag).
    • You should test things to make sure your db DOES do that, using some sample data and introducing an error, before relying upon it. Also, there can be some edge cases where doing too many actions within that outer tag could potentially hold up other procesing (in the db, if the db server decides it needs to introduce lock contention. But really for most people, the benefits of using cftransaction outweigh such outlier risks.
  • Otherwise, you could address your need by creating some sort of tracking of how many records you have processed, so that you could know (if it failed) that you could start over at that point.
    • You could track that count in any of many ways, including a properties file, a db table, a log, or even a shared scope (like session or application or server, though those would be lost of CF were to restart somehow in the midst of processing).

 

Finally, as for your concern over "timeouts" (from looping over too many operations), I asume you mean the CF page request timeout (which can be set either at the server/admin level, the application level, or the page level). You can override it at the page level, by using cfsetting requestttimeout="some seconds", on this page doing the processing.

 

Let us know if that helps, or if for some reason it does not.


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Apr 15, 2020 Apr 15, 2020

Copy link to clipboard

Copied

To add to what Charlie has suggested (and the cftransaction tag was made specifically for situations like this), you can also get a querylist of the IDs and use that to delete the records from the toBeProcessed table, after the processing is done, within the cftransaction tag.

 

V/r,

 

^ _ ^

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Participant ,
Apr 15, 2020 Apr 15, 2020

Copy link to clipboard

Copied

I just coded a very similar scenario and used a loop and wrapped the processing of each itteration in cftransaction.

 

Something like

 

loop over ToBeProcessed

    transaction start

    try {

      Process Stripe transaction

      Delete ToBeProcessed

      transaction commit

    } catch {

        transaction rollback

        rethrow

    }

end loop

 

This commits each transaction as it completes and if it fails inside the try block the rollback will undo all parts of the one transaction so the ToBeProcessed table is left with only uncompleted transactions so can you just run the script again, after fixing what caused the failure.

 

 

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Apr 15, 2020 Apr 15, 2020

Copy link to clipboard

Copied

There are a couple of things I'd recommend. My recommendations might be a little different from Charlie's, though that doesn't mean his are wrong! First, I'd try to figure out what are the problematic things that might happen if your operation fails in mid-process, and think about how you can address those things. Second, I'd try to figure out what are the problematic things that might happen if your operation doesn't fail, and likewise think about them too.

 

So, CFTRANSACTION will let you treat the whole batch as one single operation. That's good, in that if part fails, the whole thing will fail and you can rerun it from scratch and not worry about losing track of where you are. But there are some problems you might run into with CFTRANSACTION. One, it will effectively lock all access to the tables in question - after all, that's literally what it's supposed to do. If you expect to have other interactions with those tables, they'll be blocked until the CFTRANSACTION is done, or vice-versa. If your CFTRANSACTION will take a long time, those other operations may simply time out. We don't know enough about your data (like Charlie said) to really address this part, but it would be unfortunate if some of your sales failed because you were busy with this operation. Another approach would be to have some field you can use as a flag, and maybe use a single query to act on one row at a time, based on the value of the flag. So you could have one query that fetches all of the rows with the flag field not yet set, then loop over those, then have a nested query within the loop that does whatever is needed to each individual row. You may or may not need CFTRANSACTION in that case, as a single query is effectively a transaction all by itself. Again, we'd need to know more about what's going on.

 

But another problem you might have is that this whole operation simply takes a long time. Unfortunately, iterating over a bunch of records in CF is just going to be a lot slower than doing the same directly from SQL. So, you could try to move the whole operation into a single stored procedure, which would have its own loops, etc. This requires a little more SQL knowledge than a lot of developers have (or even need most of the time) but it could definitely improve overall performance.

 

One other thing to note: I see you have a CFLOCATION inside your code. This will break out of your CFIF or CFLOOP, and of course it'll take you to another page. You really don't want to do this, as a rule, as it's going to make processing a bunch of records more complicated than it needs to be.

 

Finally, this kind of batch processing is very well suited to CFTHREAD. Instead of waiting for the operation to complete before your code continues, you can use CFTHREAD to launch a second thread in the background that does the work, then your code can continue doing something - even if that something is as trivial as just letting the user know that things are happening in the background. Of course, if you try to capture the completion of that thread, you'll have to write some more complicated code to see when it's finished, but many times you don't even need to do that.

 

Dave Watts, Eidolon LLC

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Apr 15, 2020 Apr 15, 2020

Copy link to clipboard

Copied

Great additional points, Dave and John. As for John's, I forgot to mention that cftransaction has indeed evolved over the years to offer that attribute (he shows it in script, but it works in tags also). That's a useful extension of this solution to consider.

 

And Dave of course offers many useful additional points to consider. And I did have in mind to suggest also that sometimes such processing might be better done NOT in CF.

 

That said, I would contend with Dave's assertion that using cftransaction "will effectively lock all access to the tables in question". As I noted in my first answer, there COULD be such contention , but I don't think it's right to say that it WILL. And I wouldn't say "that's literally what it's supposed to do".

 

Instead, I think all one could say was that it could be expected to lock the RECORDS being updated, which is at least a lot less impactful than if it were to lock the TABLE. (It is true that there are cases where a DB server will escalate row locks to a table lock, but I'd not expect that if these operations Gary is doing are only affecting one record, one after another in the loop.)

 

But even about record locking, we should note that different databases address that with newer and newer implementations which reduce the potential for grave impact. (Consider for example the sql server's "snapshot" isolation option.) 


/Charlie (troubleshooter, carehart.org)

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Participant ,
Apr 15, 2020 Apr 15, 2020

Copy link to clipboard

Copied

In the example I provided I use the transaction around the code inside the loop so I lock the minimum number of records per transaction, so there is minimal chance of contention. In another scenario I had the process was adding 10k entries accross multiple tables so as these were new entries I wrapped the transacion around every 10 or so updates. It really depends on your individual requirements.

 

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Contributor ,
Apr 16, 2020 Apr 16, 2020

Copy link to clipboard

Copied

Thanks for the great recommendations. I'll read up on CFTRANSACTION and will play around with that. Perfect timing since I just took an online SQL class and discovered rollbacks etc. I just didn't know how to use it in CF.

 

To address some of Charlie's questions... I'm using msSQL 2016. Yes, when I mention the page timing out I am referring to the page timing out from looping over too many transactions. I know I can set the requesttimeout on the page my concern is x seconds may be good today but as the number of transactions grows over months that would need to be adjusted. Is there any danger in setting this to some absurd number of seconds?

 

I have a process that runs at the end of the day the scans the day's invoices from Quickbooks for invoices flagged autopay and creates a new table with all the data needed to run the transactions merged into a single table. Next, I loop over the new table and send the friendly "hey your card is going to be charged $x tomorrow." At the end of the next day is the process in questions here. I need to loop over this table and actually charge the customers card and I want to be as careful as possible not to have it crash and charge the card twice because it needs to start over or fail and not finish charging all the cards. Since this table is only being used to process the autopay from the previous day being locked is not an issue. That's why I thought it would be good to delete the row after it completes. It's never needed after the card is charged.

 

Thanks!

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Participant ,
Apr 16, 2020 Apr 16, 2020

Copy link to clipboard

Copied

Now you have shared more information on what the transaction is it may not be as simple as first thsought. If part of your transaction means reaching out to a thirrd party payment system for charging the card then that part of the transaction will not be rolled back if something fails. From experience of similar systems I would not delete the ToBeProcessed but update it with progress of the transaction and mark complete at the very end. This allows you to have the information as an audit and troubleshooting if something goes wrong in the middle of a transaction, you will know exactly how far the transaction got so you know if the card was charged and the transaction failed after that.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Contributor ,
Apr 16, 2020 Apr 16, 2020

Copy link to clipboard

Copied

I was thinking that. Tag it as pending when building the table. Then change it to processing right before calling out to Stripe. Then update it to Approved/Declined after processing and moving on to the next transaction. 

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Apr 19, 2020 Apr 19, 2020

Copy link to clipboard

Copied

Hi Gary,

 

I have 2 questions and a remark, to begin with.

Question: Are the 1000 transactions independent or are they nested (as sub-transactions) within one overall transaction? 

Question: You mentioned timeout. So are you using cflocation to enable you to set a requestTimeout at page level?

Remark: If not, then the cflocation strategy will do the same thing as cfloop, but less efficiently.

The end tag </cfloop> passes command back to the start tag <cfloop> within the same page context. Whereas <cflocation> will run the Application file, then everything preceding the loop, before passing command to <cfloop>.

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Contributor ,
Apr 19, 2020 Apr 19, 2020

Copy link to clipboard

Copied

LATEST

A bit longwinded reply....

 

Each transaction is independent and requires a call to Stripe's API to process. I was originally thinking about the cflocation strategy although less efficient as a way to load one transaction, process it, delete it from the table then restart the whole process. I've changed that to use cfloop and when we build the table to be processed we're setting a pending flag in SQL then on the page that processes the transactions inside of the cfloop I update a flag in SQL table to processing, call the Stripe API, the update a complete flag. This adds two queries inside the loop but now we know what's pending, what's processed and what transaction did it fail on if something goes wrong. I also removed the code from the processing page that created the customer invoice PDF and emailed it to be it's own page the checks the completed flag. This greatly reduced processing time on the Stripe API page. I've used FusionReactor to time 50 transactions to get a low, high, and average time to set RequestTimeout to an appropriate number with plenty of padding.

 

My biggest concern is making sure we never call the Stripe API twice on the same transaction since that would charge the customer twice. 

 

Besides timeouts, we have a very random problem where in the middle of processing code the page just reloads and starts over. Originally we tied that problem to just certain releases of Firefox but that wasn't it. Charlie and I have dug into it and haven't solved it 100%. I am now running FusionReactor to trace this problem better. It just adds another layer of making me concerned that I've done everything possible to make sure the customer isn't double charged.

 

Gary

Votes

Translate

Translate

Report

Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Resources
Documentation