Hi there,
I am running a ColdFusion server on a VPS at HostMySite.com, and lately we have been having strange problems with it. Every few minutes the server stops responding, even after we restart the services (IIS and ColdFusion) or reboot the whole system.
The HMS guys investigated the problem and discovered that every connection to the server opens multiple sockets for a single IP address (that is, for every single visitor).
Here is the full message from the HMS technician. Do you have any previous experience with this?
I've been doing some advanced monitoring and troubleshooting of your VPS over the last 24 hours.
It is important to understand that the issue you're actually experiencing is related to TCP sockets. Every connection to your server opens a socket and sometimes multiple sockets for an individual IP (visitor).
I opened the site http://www.viaromania.eu/ and instantly there were 7 connections established from our IP address.
C:\Documents and Settings\hmsadmin>netstat -ano | find "209.41.163.23"
TCP 76.12.37.79:80 209.41.163.23:9563 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:21164 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:26819 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:36833 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:37624 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:39566 ESTABLISHED 4
TCP 76.12.37.79:3389 209.41.163.23:2577 ESTABLISHED 141388
After just browsing around a few pages on the site you can see how my connections are expanding.
C:\Documents and Settings\hmsadmin>netstat -ano | find "209.41.163.23"
TCP 76.12.37.79:80 209.41.163.23:2852 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:2900 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:11014 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:11178 TIME_WAIT 0
TCP 76.12.37.79:80 209.41.163.23:14107 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:14248 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:17177 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:17606 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:17930 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:23460 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:24594 TIME_WAIT 0
TCP 76.12.37.79:80 209.41.163.23:25191 TIME_WAIT 0
TCP 76.12.37.79:80 209.41.163.23:25507 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:32301 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:33591 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:37338 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:38404 TIME_WAIT 0
TCP 76.12.37.79:80 209.41.163.23:45140 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:49734 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:53755 ESTABLISHED 4
TCP 76.12.37.79:80 209.41.163.23:55735 TIME_WAIT 0
TCP 76.12.37.79:3389 209.41.163.23:2577 ESTABLISHED 141388
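Reading these listings by eye gets hard once there are hundreds of rows. A minimal sketch, assuming the five-column `netstat -ano` row format shown above, that tallies port-80 connections per remote IP and TCP state (the function name is hypothetical, not part of any HMS tooling):

```python
from collections import Counter


def tally_connections(netstat_text, local_port=80):
    """Count TCP connections to local_port, grouped by (remote IP, state).

    Assumes rows shaped like the netstat -ano output above:
    TCP <local addr:port> <remote addr:port> <state> <pid>
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, UDP rows and anything that is not a 5-field TCP row
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _, local, remote, state, _pid = fields
        # rpartition splits on the last colon: "76.12.37.79:80" -> port "80"
        if local.rpartition(":")[2] != str(local_port):
            continue
        remote_ip = remote.rpartition(":")[0]
        counts[(remote_ip, state)] += 1
    return counts
```

Feeding it the second listing would separate the ESTABLISHED rows from the TIME_WAIT ones and ignore the Remote Desktop session on port 3389.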
Over the last 2 days there have been 205 coldfusion-out logs, and they are all full of the same error:
java.net.SocketException: Software caused connection abort: socket write error
Normally when we see this we'll make a few registry adjustments that allow for more socket connections and a shorter time to live on existing socket connections. However in your case all of the registry adjustments have already been set.
MaxUserPort 65534
TcpNumConnections 200 connections
TcpTimedWaitDelay 30 seconds
I adjusted TcpNumConnections to 500; see if this alleviates the issue. Note that allowing 500 TCP connections is not necessarily a good idea, as this amount of traffic could theoretically bring down your server.
I created a scheduled task that executes every 60 seconds in which it counts the connections on port 80 and writes it to the file netstat.txt on the desktop.
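The technician's actual task is not shown, so the following is only a hypothetical sketch of what such a once-a-minute logger could look like: it runs `netstat -ano` (Windows assumed), counts the non-LISTENING TCP rows on local port 80 and appends a line in the same `2:21 PM 1367` format used in the log below. The file name `netstat.txt` comes from the message; the function names are illustrative.

```python
import subprocess
from datetime import datetime


def count_port_connections(netstat_text, port=80):
    """Count TCP rows whose local address ends in :<port>, excluding the LISTENING socket."""
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected row shape: TCP <local:port> <remote:port> <state> <pid>
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        if fields[1].rpartition(":")[2] == str(port) and fields[3] != "LISTENING":
            count += 1
    return count


def format_log_line(when, count):
    """Format one entry like the log below, e.g. '2:21 PM 1367'."""
    return f"{when.strftime('%I:%M %p').lstrip('0')} {count}"


def log_once(log_path="netstat.txt", port=80):
    """Run netstat once and append a timestamped connection count to log_path."""
    output = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    ).stdout
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_log_line(datetime.now(), count_port_connections(output, port)) + "\n")
```

A Windows scheduled task would then invoke `log_once()` every 60 seconds. TIME_WAIT rows are deliberately counted, since they still hold a socket.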
After logging for the last 24 hours, it has gone over 500 TCP connections 19 times, all between 2:21 PM and 2:40 PM:
2:21 PM 1367
2:22 PM 1423
2:24 PM 1684
2:25 PM 1466
2:26 PM 1867
2:27 PM 1250
2:28 PM 854
2:29 PM 796
2:30 PM 799
2:31 PM 794
2:32 PM 816
2:33 PM 730
2:34 PM 662
2:35 PM 524
2:36 PM 531
2:37 PM 539
2:38 PM 551
2:39 PM 551
2:40 PM 522
So this is pretty good news: it means that over the last 24 hours your site had only 19 minutes of issues caused by TCP connections.
Please post if you know why so many sockets are opened for every single IP and whether this is normal behaviour.
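To pull the "minutes over the limit" figure out of a longer log automatically, a small sketch (assuming each line has the `H:MM AM/PM count` shape used above; the function name is hypothetical) could filter the entries above the 500-connection threshold:

```python
def minutes_over(log_lines, threshold=500):
    """Return (time, count) pairs from 'H:MM AM/PM count' lines where count > threshold."""
    over = []
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip blank or malformed lines
        count = int(parts[2])
        if count > threshold:
            over.append((f"{parts[0]} {parts[1]}", count))
    return over
```

Run over the 19 entries listed above, it returns all 19, and `max(result, key=lambda r: r[1])` picks out the 2:26 PM peak of 1867.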
Greetings,
Adrian.
I am trying to look at some things on your site, but that is kind of hard as it is down. But from the 3 requests to the site that did work, it appears you are routing pretty much everything through a /cazare.cfm template: not just HTML, but images, JS and CSS as well. That is quite likely a big part of what is causing the problem. ColdFusion really isn't very efficient at serving static content in the first place, and on top of that it is really bad at working together with the caching mechanisms in the browser. If possible, move your static content somewhere it does not get processed by CF. If you can't, make sure to add Expires headers and the like to your assets so the browser does some minimal caching and does not request every asset on every page request.
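The idea of serving static assets outside CF, with caching headers attached, can be illustrated with a toy Python server. This is only a sketch of the concept, not a replacement for IIS: the extension list, the `static` directory, the port and the 31-day lifetime are all assumptions chosen for illustration.

```python
import os
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical policy: which extensions count as static assets, and for how long to cache them
STATIC_EXTENSIONS = {".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico"}
MAX_AGE_SECONDS = 31 * 24 * 3600  # 31 days


def cache_control_for(path):
    """Return a Cache-Control value for a request path, or None for non-assets."""
    ext = os.path.splitext(path.split("?", 1)[0])[1].lower()
    if ext in STATIC_EXTENSIONS:
        return f"public, max-age={MAX_AGE_SECONDS}"
    return None


class CachingStaticHandler(SimpleHTTPRequestHandler):
    """Serves files from a directory and adds Cache-Control to successful asset responses."""

    def send_response(self, code, message=None):
        self._status = code  # remember the status so error pages are not cached
        super().send_response(code, message)

    def end_headers(self):
        value = cache_control_for(self.path)
        if value is not None and getattr(self, "_status", None) in (200, 304):
            self.send_header("Cache-Control", value)
        super().end_headers()


def serve(directory="static", port=8000):
    """Serve the (hypothetical) asset directory until interrupted."""
    handler = partial(CachingStaticHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

The design point is that the `.cfm` pages never touch these files: the web server answers asset requests itself, and the browser reuses them for the max-age period instead of requesting them again on every page.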
Hi Jochem, and thank you for your answer.
We are using "cazare.cfm" for all the pages listing hotels and guest houses in a specific location. For instance:
http://www.viaromania.eu/cazare.cfm/Bucuresti/2-Cazare_hoteluri_pensiuni_Bucuresti.html - accommodation in Bucharest
http://www.viaromania.eu/cazare.cfm/Brasov/1-Cazare_hoteluri_pensiuni_Brasov.html - accommodation in Brasov
And so on...
I don't know if this is bad or not, but our code uses the <cfinclude> tag heavily, so we can keep files easy to debug and avoid big .CFM files. I don't remember reading anywhere that <cfinclude> can cause delays in page loading or hurt server performance... maybe you can tell me whether this is a bad thing or not.
After reading your post I tried to use cached CSS files: instead of the "general.CSS" file included in the header, I am now using http://www.viaromania.eu/includes/css/general.CFM, and this file's content looks like this:
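<!--- Cache for one day: Expires for HTTP/1.0 clients, Cache-Control for HTTP/1.1 --->
<cfset dtExpires = DateAdd("d", 1, Now()) />
<cfset strExpires = GetHTTPTimeString( dtExpires ) />
<cfheader name="Expires" value="#strExpires#" />
<cfheader name="Cache-Control" value="public, max-age=86400" />
<cfcontent type="text/css" />
<!--- No <cfoutput> needed for static CSS; inside <cfoutput>, hex colours such as #fff would have to be escaped as ##fff --->
... css content here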
I tried to make a similar change to the "common.JS" file, but so far with no luck. If you know of any tutorial about caching .JS files, please send me the link.

Anyhow, I think our problem is somehow related to the session variables. I noticed that for every single visitor we have, CF creates 4 session variables: CFID, CFTOKEN + 2 others (I don't remember their names now). So for 1,000 visitors you have a minimum of 4,000 session variables created. Then I did this: I enabled the "Use J2EE session variables" option in the CF Administrator and got rid of the CFID and CFTOKEN sessions. I am now using SESSIONID to identify my visitors. So basically, instead of having 4,000 sessions, I now have only half as many.
After caching the CSS and enabling J2EE session variables, the server started to work better. I don't know if this is just a happy coincidence or whether those steps were necessary, but the server is OK now.
Please let me know what you think and what else I can do to improve server performance. Any idea how to cache .JS files?
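JavaScript files can be cached the same way as the CSS above: only the content type changes, and the Expires/Cache-Control pair is identical. As a language-neutral sketch (Python here; the function name and the `text/javascript` type are illustrative assumptions), the header values a wrapper for common.JS would need to send could be built like this:

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, content_type, now=None):
    """Build response headers that let browsers cache an asset for max_age_seconds.

    Expires is the HTTP/1.0 mechanism (an absolute GMT date, as in the CFML above);
    Cache-Control: max-age is the HTTP/1.1 one and wins when both are present.
    """
    if now is None:
        now = time.time()
    return {
        "Content-Type": content_type,
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
```

For example, `cache_headers(86400, "text/javascript")` yields a one-day lifetime, matching the CSS wrapper.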
Adrian.
We are still having problems...
Too many HTTP requests are opened for each visitor, and we have over 2,000 TCP connections open simultaneously. Please check the attached screenshots.
Does anyone know if this is normal? Notice how many TCP connections are opened for a single external IP.
Also, check what the CF monitor looks like.
MacLaeod wrote on 1/27/2010 11:01 AM:
Too many HTTP requests are opened for each visitor, and we have over 2,000 TCP connections open simultaneously. Please check the attached screenshots.
Does anyone know if this is normal? Notice how many TCP connections are opened for a single external IP.
That number of connections for an external IP does not really mean
anything if you don't know how many people are behind it. It could be
one person, or a proxy for 3000.
I have been looking at your site a bit using Chrome's Speed Tracer and
the two things that stand out most on an HTTP level (and hence on a TCP
level) are:
- you have no caching set up for your /images/* folder. See
http://www.mnot.net/cache_docs/ for a decent explanation. I would
recommend both a max-age and a public setting. For an IIS manual see
http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/0fc16fe7-be45-4033-a5aa-d7fda3c993ff.mspx?mfr=true
- you are disabling HTTP caching of some of your CSS by changing the URL
every time. /livezilla/templates/style.css?cache=0.6557160455307057 is
not cacheable if you change the number on every request.
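If a version parameter is needed at all, it should change only when the file changes. A common alternative to a random number is a token derived from the file's contents. The sketch below assumes you control how the stylesheet URL is generated (the LiveZilla URL comes from a third-party script, so that may not be possible); the function name is hypothetical.

```python
import hashlib


def versioned_url(path, content):
    """Append a version token derived from the file's bytes.

    The URL stays the same until the file changes, so browsers can cache it,
    unlike a random ?cache=<number> that is new on every request.
    """
    digest = hashlib.sha256(content).hexdigest()[:10]
    return f"{path}?v={digest}"
```

Calling it twice with the same stylesheet bytes produces the same URL, so the cached copy is reused; editing the CSS produces a new URL, which forces exactly one fresh download.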
Try out Chrome's Speed Tracer yourself and make sure you fix everything
it flags as critical.
Thanks Jochem for your time and answers. Meanwhile, I read about caching in IIS and set up a 31-day expiration time for the IMAGES / CSS / JS folders. The website seems to be faster now; I hope this will fix some of the performance issues we are facing.
We have been experiencing problems again since we moved the MySQL database to the same machine. I'm not sure what the cause is; I am trying to review and optimize every single query on the pages. Meanwhile, since Google now gives more weight to page speed, we have dropped in the rankings for a few pages. For instance, "Agentie de turism" ("travel agency") is now on the 4th results page or so...