I can't figure out what's causing these random errors. They happen 20-40 time/day. The Application server is running CF2018 ENT, and the SQL server is MS SQL 2017. Nothing shows up in the SQL logs.
- Error Executing Database Query. [Macromedia][SQLServer JDBC Driver]Your connection was terminated.
- Error Executing Database Query. [Macromedia][SQLServer JDBC Driver]Connection reset by peer: socket write error
Perhaps caused by problems with indexes in SQL Server. If so, then it would help if you rebuilt the indexes.
The indexes are fine and the SQL response time is good overall.
Shannon, you will perhaps have found the many, many instances of people on the web having reported this error, and all kinds of ideas floated about (many from a long time ago). I'm not finding any of the top several that show a good answer (of why it happens, or how to fix it).
I will say that in my daily troubleshooting the past 14 years, I have not seen it, so there would seem something rather unique.
But I will point out that one made a suggestion that is at least worth considering: did you know that for many years, there has been the option in the CF Admin, in the "advance settings" for a datasource, to specify a "validation query"? Some never use it, some always use it. It's purpose is not well-documented, but bottom line it has helped with some problems. For SQL Server (and many databases), it can be as simple as "select 1".
And yes, it's natural to fear "that's got to add a lot of overhead". No, it does not. It's used when the connection is opened, which is not "every query". And even if it were on "every request" that does some queries, well you're going to open the connection anyway, which will need to talk to the db. Having it do this one trivial query will not add burden. And if it solves your problem, that may be what matters. (With the right tools, you can also confirm that it adds negligble overhead, if someone presses that point.)
One last quick question: are you just finding these errors in the CF logs, or are real users getting the error? I ask because if it only in the logs, note that the errors could be happening only with some unique requests, which may be as much about the clients making the request, or the nature of the SQL generated for their request, etc.
Let us know how it goes.
Thanks for the update, and interesting to hear what you found (that validation query "helped" in stopping the error, but added a delay in some cases). Is it as first reported, only a very small subset of request in the day?
Either way, yep, you could pursue the tweaks the BKBK (and today Dave) have proposed. Sometimes, if you turn the right knobs enough times you'll get a happy conclusion. Personally, I'd prefer to get diagnostic insight into the problem, and really solve it (because in my experience, CF and/or its config is not always the real cause of the problem).
Indeed, in this case, it seems it could as well be a config issue in SQL Server--or an indication of some hangup in/on the SQL Server machine which is only occasional, and may have some cause that is totally unrelated to CF. Finding that takes work, of course. And most important, it takes knowing what's going on in SQL Server at the moment of those connection hangups you see. There are various tools to help with that, of course (some free, some paid).
Finally, I will also add that if you may wonder a bit more about what it means that the validation query did "sometimes slow down", the best explanation I've found is not in the docs but in a post from Steven Erat, another classic CF troubleshooter (who worked then for Adobe). See his post, and especially the next to last paragraph that shows what you likely were experiencing regarding the slowness, but which again seems to support my contention above.
Do the errors simply stop and application just works again or do you perform an action to rectify issue?
Could issue be a network problem - how about run "ping -t" to see if traffic terminates when JDBC fails?
The errors are randomly generated throughout the day. After the error occurs, subsequent request start working without intervention. Our networking team is monitoring the traffic for packet loss and other possible issues.
Hi Shannon, it just might be that datasource connections are being dropped. Legitimately because they have remained inactive for too long. You can easily test this hypothesis.
Open the ColdFusion Administrator and go to the Datasources page. For each datasource, change the configuration as follows:
Here's what I'd suggest.
1. Try disabling JDBC pooling entirely. You can do this in the CF Admin by unchecking "Maintain Connections". This will keep you from having any JDBC pooled connections at all, and will almost certainly fix your immediate problem. On the downside - and there's always a downside right? - your overall database interactivity will be slower, perhaps much slower. But this will fix you in production.
2. Play around with the "limit connections" and "timeout" settings. I've had very good results by limiting database connections to a relatively small pool, say 20 or 40 (note that without more detailed metrics I can't tell you whether those numbers will be the right ones for you). This will kill older connections when you exceed this number, and create new ones in their place. Some databases (mostly Oracle in my experience but occasionally SQL Server) will benefit from a smaller pool. Note that the timeout on your local side may cause problems with the corresponding timeout on the other side, and this'll cause your pooled connection to fail. So, you might reduce the timeout and set a lower connection amount. Or you might increase the timeout and still set a lower connection amount, or not even change the timeout at all. Note that if there's no value in the "limit connections" field, CF will keep creating new connections without killing old ones.
Dave Watts, Eidolon LLC