Too many open socket connections causing ColdFusion to crash?
I’m currently working on an e-commerce site which sends and receives information to/from the client’s order management system via XML over a TCP/IP socket. It uses a very old Java-based custom tag called CFX_JSOCKET (which appears to have been written in 2002) to open the socket, send the data, and get the response. The code that calls the custom tag and sends/receives data from the OMS pre-dates my working on the site, but it’s always worked, so I haven’t paid it much attention.
Back in the summer of 2009 we started experiencing issues with ColdFusion (v.7 on Windows 2003 at the time) locking up more and more frequently, until it ultimately became a daily issue. After extensive research we narrowed the issue down to the communication between the web server and our client’s order management server. It seemed the issue with ColdFusion hanging was either related to there being too many connections open, or to these connections hanging and resulting in dead threads. This was an educated guess based on a blog post I’d seen online, not actual monitoring of either CF or the TCP/IP connections. As soon as we dialed back the timeout on the CFX_JSOCKET tag from 20 seconds to 10, the issue disappeared, so we left it at that and moved on.
Fast forward to this January. The site is hosted at a new location, on a 64-bit Windows 2008 box running ColdFusion 9. Over the years traffic on the site has continued to grow. The nature of the client’s business means that August and January are their busiest times of the year (back to school for college kids), and in January ColdFusion once again started locking up on an almost-daily basis.
One significant difference is that the address cleansing software that previously ran on the box and was used to verify shipping addresses is not available for 64-bit, so when we moved to the new server last summer, that task was moved to the client’s order management software and handled via XML like all other interaction with that system. However, while most XML calls to that server (order input, inventory check, etc) take under a second to complete, the address cleansing call regularly takes over 5 seconds to return data, and frequently times out.
Once we eliminated the address cleansing call from the checkout process, ColdFusion stopped locking up regularly. So it appears that once again it’s the communication between the web server and the order management server that’s causing problems. We currently have that address cleansing call disabled on the web site in order to keep ColdFusion from crashing, but that’s not a long-term solution.
We don’t have, nor can I find online, the source code for the CFX_JSOCKET custom tag, so I decided I’d write some CF code utilizing the Java methods to open the socket, send the data, get the response, and close the connection. My test code is working fine (under no load). However, in trying to troubleshoot an issue I had with it, I started monitoring the TCP/IP connections using TCPView. And I noticed that all the connections to the order management server, whether opened via the custom tag or my new code, remain open in either a TIME_WAIT or FIN_WAIT2 status for well over 2 minutes, even though I know for a fact that my new code is closing the connection from the web server side.
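For illustration, here is a minimal Java sketch of those four steps (open, send, read, close). It is not the actual test code, which isn't shown in this post. The class name SocketXmlClient, the method sendXml, the UTF-8 encoding, and the assumption that the server signals the end of its response by closing its side of the connection are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketXmlClient {

    // Hypothetical helper: sends one XML request and returns everything the
    // server writes back before it closes its side of the connection.
    // Assumption: UTF-8 on the wire and end-of-response signalled by EOF.
    public static String sendXml(String host, int port, String xml,
                                 int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            // Bound the time spent waiting for the connection to be accepted.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Bound each blocking read; a stalled server then raises
            // SocketTimeoutException instead of tying up the thread.
            socket.setSoTimeout(readTimeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(xml.getBytes("UTF-8"));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            return response.toString("UTF-8");
        } finally {
            // Runs on success, timeout, or any other error, so the socket
            // is always released on the web server side.
            socket.close();
        }
    }
}
```

From ColdFusion this could be called with something like SocketXmlClient.sendXml(host, port, xml, 5000, 10000) once the class is on the classpath. The timeout values there are placeholders, not recommendations.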
They do all close eventually, but I’m wondering: 1. why they’re remaining open that long; 2. whether that’s normal; and 3. whether all these connections remaining open could be what’s causing ColdFusion to choke.
Does this sound plausible? If so, does anyone have any suggestions/recommendations about how to fix it? My research seems to indicate this might be a matter of the order management system not closing the connection on its end, but I’m in way over my head, and before I go to the client and tell them it’s their OMS causing the issue, I need to feel a little more confident that I’m on the right track.
Any help or advice would be very greatly appreciated. And thanks for taking the time to read through my long-winded explanation of the problem.
Set-up details:
ColdFusion Version: 9,0,0,251028 Standard
Operating System: Windows Server 2008
Java Version: 1.6.0_14
Java VM Name: Java HotSpot(TM) 64-Bit Server VM
Java VM Version: 14.0-b16
Thanks,
Laurie
