Copy link to clipboard
Copied
I am reviewing the ColdFusion Web Connector settings in workers.properties
to hopefully address a sporadic response time issue.
I've been advised to inspect the output from the metrics.log
file (CF Admin > Debugging & Logging > Debug Output Settings > Enable Metric Logging) and use this to inform the adjustments to the settings max_reuse_connections
, connection_pool_size
and connection_pool_timeout
.
My question is: How do I interpret the metrics.log
output to help with the choice of setting values? Is there any documentation that can guide me?
Examples from over a 120 hour period:
95% of entries -
"Information","scheduler-2","06/16/14","08:09:04",,"Max threads: 150 Current thread count: 4 Current thread busy: 0 Max processing time: 83425 Request count: 9072 Error count: 72 Bytes received: 1649 Bytes sent: 22768583 Free memory: 124252584 Total memory: 1055326208 Active Sessions: 1396"
Occurred once -
"Information","scheduler-2","06/13/14","14:20:22",,"Max threads: 150 Current thread count: 10 Current thread busy: 5 Max processing time: 2338 Request count: 21 Error count: 4 Bytes received: 155 Bytes sent: 139798 Free memory: 114920208 Total memory: 1053097984 Active Sessions: 6899"
Environment:
- 3 x Windows 2008 R2 (hardware load balanced)
- ColdFusion 10 (update 12)
- Apache 2.2.21
Copy link to clipboard
Copied
In case others have an interest in the results of this thread but it doesn't get answered here, this question was initially asked yesterday over on Stackoverflow. Wish I had an answer for you @Richard.
-Carl V.
Copy link to clipboard
Copied
@Carl - Yes, I raised it there first hoping for a response. When I got none I thought I'd spread the coverage by repeating it here 🙂
Copy link to clipboard
Copied
@Carl V - Yes, of course you're right, I'm forgetting my net-etiquette 🙂
I'll surely feedback here what I find.
Copy link to clipboard
Copied
No problem. Maybe just mention the cross-posting next time so people on different forums don't work at cross-purposes. Good luck! And by the way, I'm interested in seeing if anyone responds as you've piqued my curiosity.
-Carl V.
Copy link to clipboard
Copied
Hi Richard,
I used to like metrics logging CF9 and before however since CF10 have not found it overly useful. Note to self go check metric logging in CF11. Well then what can you do to know what is going on? Use other tools.
I generally think performance gains can be found by making connector adjustments in CF10 and CF11. There was a recent blog entry that relates and tho covering CF11 / IIS in topic I think subject matter also applies to CF10 and Apache.
http://blogs.coldfusion.com/post.cfm/coldfusion-11-iis-connector-tuning
Keep in mind my daily environment is Windows IIS where as you use Apache. I have a done a small amount of testing on Apache and think worker.properties and server.xml changes are similarly important.
I discussed my thoughts on connector adjustments here and also demonstrate use of tools to know what is going on.
http://experts.adobeconnect.com/p8l51p4s9m4/
HTH, Carl M.
Copy link to clipboard
Copied
@Carl M - If you see the last half dozen comments on that ColdFusion blog you'll see me having a "conversation" with Anit Kumar Panda of Adobe. Anit has been very supportive and has offered to call me to discuss my issue. I'll feedback the results here.
I have watched your Connect session several times. Most helpful, thanks. It gave me the insight, via JConsole, to see the results of changes to connection_pool_size
/ maxConnections
and connection_pool_timeout
/ keepAliveTimeout
. I'm guessing that I won't see any entry in JConsole for max_reuse_connections
as my understanding is that this setting is part of Adobe's customisation of mod_jk
?
My issues still stands. I now have lots of information (metrics.log and JConsole output) but no knowledge of how to use it to inform worker.properties
/ server.xml
settings.
I'll capture a day's worth of currentThreadCount
/ currentThreadBusy
chart data and post it back here with my worker.properties
/ server.xml
settings later today to aid the debate.
Copy link to clipboard
Copied
As promised, here are some screen grabs from a good period of activity...
The worker.properties
and server.xml
files for the instance are unchanged from the default install (max_reuse_connections=250
only).
My interpretation of this is that thread count reached a maximum of around 50 which is well within the maximum allowed of 200.
The result I can't explain is that the thread busy count seems to be dropping as soon as the thread is no longer in use. The Tomcat Connector docs (http://tomcat.apache.org/connectors-doc/reference/workers.html) state that the default value is indefinite (zero). I've not added a value for connection_pool_timeout
so I'm guess the Adobe mod_jk
is adding the -1
for keepAliveTimeout
which I'm also guessing mean "drop as soon as unused"?
I hope this helps inform someone who might be able to make some sense out of all this! 🙂
Richard
Copy link to clipboard
Copied
Hi Richard,
See how your currentThreadCount drops to 10. With that much load I expect you could do better by having a greater value for connection_pool_minsize in workers.properties and minSpareThreads in server.xml which when not specified is 10 by default.
Interesting observation on keepAliveTimeout, as you say drops the thread straight away. Not sure how -1 gets in there since as you point out tomcat documents say otherwise. Perhaps something special about the Adobe CF modified tomcat.
Specifying a value for connection_pool_minsize, minSpareThreads and connection_pool_timeout, connectionTimeout could be a good idea tho you say the application usage has been fine over this period. Rather than making a change yet perhaps it would be better to know what those values or charts look like when the system is performing badly.
Pardon me I did not go back to the CF blog on CF11 IIS connector matter and read the comments.
Something new that I had a thought on recently was can slowness creep in over months with the unmanaged nature of CF10/11 webserver connector log file? In Windows IIS environment isapi_redirect.log found in CF10,11\config\wsconfig\n
can become quite large. Unless some process is invoked to say manually update connector via running WSCONFIG tool which will recreate log file one would need to delete the contents of log yourself. Curious to know how large is your mod_jk.log file?
Regards, Carl M.
Copy link to clipboard
Copied
Hi Carl,
Thanks for your time with this.
Your observations and suggestions on connection_pool_minsize, minSpareThreads and connection_pool_timeout, connectionTimeout raises a question I've had in my mind for some time.
Why release the threads at all? Why not wack them up high and forget about it?
My justification for this thought is that, assumably, when a request has finished with the thread it's marked as unused and can be adopted by a new request that needs a thread. Why drop it as, in time, if it's going to be opened again by the process when it has run out of free threads. Or have I misunderstood the technology?
Your thoughts on that would be appreciated.
One thing I should mention, which I should have noted at the beginning, is that this instance is supporting about 30 sites with one site being dominant. If that has any bearing?
You've read my mind about the size of the mod_jk.log file. I had noticed earlier in the week that it had reached > 200MB since Nov '13. I'd considered that it was only a write only file and that the connector had no interest in reading the content and therefore wasn't a performance issue. That said I'm trialling log rotation in our pre-production environments as a way to make the files easier to inspect and host on our logging drive. This is what I've added to our Apache config...
# mod_jk logging level [trace/debug/error/warn/info] and log file location with rotation
JkLogLevel info
JkLogFile "|/path/to/rotatelogs.exe -f /path/to/mod_jk.%Y-%m-%d.log 86400"
...which overwrites the entry in the /path/to/Apache/conf/mod_jk.conf file...
# Where to put jk logs
JkLogFile "\path\to\coldfusion10\config\wsconfig\1\mod_jk.log"
Anyway, as you say, now that we have the connection to JConsole established we're better placed to review the threads when we have another slow down. I'll post back here when I've got some slow site data.
Copy link to clipboard
Copied
Hi Richard,
The way I see it, no point to release thread only to have to make a new thread again since tomcat can use a thread that has been marked unused. Make enough threads to begin with to cope with the load.
Your thread graph shows 50 maximum used. I do not see any reason not to set minimum setting to 50 so all the threads are present ready to take work.
Connector log size is only a recent idea I had. I have not done any load testing or performance monitoring to compare timings of similar workload for large connector log versa maintained or small connector log. Neither have I done any searching in tomcat documents or forums to get a sense if log size could be an issue. Just a bit too busy right now to get the opportunity. Certainly I would be interested to know if your maintained connector log that you have implemented helps with performance.
Since you have Jconsole running hopefully it is showing Java memory working and garbage collecting in good tolerances.
For interested readers the slide deck presentation I did is still available
www.tassweb.com.au\cf10tomcat.pdf
Regards, Carl M.
Copy link to clipboard
Copied
I know the peak topped out at 50 the other day but why stop there?
Why not 100, why not 150, why not match the connection_pool_size value? What are the consequences of a large number? These are the questions I'm struggling to find documented answers to.
We've decide to add the config to rotation the mod_jk.log file now and then wait until we get our next slowdown event and then review the JConsole thread count at that time and then consider making changes at that time.
Can you point to a resource or define that does "good tolerances" look like?
Regards,
Richard
Copy link to clipboard
Copied
So we had an event this morning and this time we were ready!
Around 10am I pushed the Apache change to rotate the mod_jk log files, as described earlier, to production on all three boxes and restarted Apache.
About 20 minutes later all the servers experienced a slowdown. A quick look at the threads didn't seem to indicate any issues...
We left it for a while, sometimes it rights itself but not on this occasion.
Around 10:28 we restarted Apache again on all three servers. In the past this has helped resolve any slowdown issue.
No joy 😞
More discussion and examination of log files. Nothing looked out of order.
So around 11:15 we started ColdFusion on all three servers. The penultimate sanction.
Still no improvement.
At 11:30 we decided to take the ultimate option and restart all three boxes.
After while, when all the servers had settled down response times returned to normal and the thread chart now looks like this...
Conclusions...
???????????????
You'd think that restarting Apache would solve any thread issue there or with mod_jk, not that we were seeing anything but no.
Restarting ColdFusion would have cleared any long running requests or memory issues but we weren't seeing anything to believe that was the problem.
Only restarting all the actual servers cleared the problem. We have experienced that before as the only solution to clear an increasing slowdown event.
I'll lob this record of events out there in the hope that some enlightened person will be able to say, "Ah, yes, I know what that is, you need to........"
In desperation and frustration,
Richard
Copy link to clipboard
Copied
Hi Richard,
Wow that does sound painful, restart server only solution. I wanted to discuss more about connector logs and threads but while some gains might be made there it does not seem that those are the major issue.
Where is the database in all this perhaps on one of the servers you restart? Does Windows task manager or resource monitor show a busy CPU or full RAM issue? Lastly any useful log details recorded in eventvwr.msc?
Sorry more questions than answers, Carl M.
Copy link to clipboard
Copied
@Carl - Yes, not ideal but like I said, our issues are normally resolved by an Apache restart, occasionally a ColdFusion restart and on the rare occasion a server restart.
The database is on it's own cluster, not on the web/ColdFusion servers.
Task Manager never seems to show any undue stress in these situations.
Can't see anything untoward in Event Viewer but I'll try and get our server guys to give them the once over.
Having recently read Russ Michael's insightful blog post http://www.michaels.me.uk/post.cfm/why-coldfusion-railo-are-not-suited-to-shared-hosting we're thinking about moving our primary domain onto it's own instance and leaving these secondary domains on a "shared hosting" instance.
That way we might be able isolate any coding issues that might be the root cause for the problem.
Richard
Copy link to clipboard
Copied
Hey Richard,
I have seen restarting individual components by themselves not help.
There are 3 components involved here
web server (Apache in your case)
CF (tomcat)
and java
try alo killing java and tomcat processes from task manager if on windows, or whatever the equivalent is on windows. This also makes restarting cf quicker.
Copy link to clipboard
Copied
Hi Russ,
Humm, like I said, for whatever problem we have, more often restarting Apache works for us, occasionally we have to add a ColdFusion restart to get things working again and rarely we have to go as far as restarting the server itself.
I'm guessing you mean the ColdFusion service when you refer to killing "tomcat" but I can't see a service listed that looks like Java on it's own. Can you advise what I should be looking for?
Richard
Copy link to clipboard
Copied
Richard,
Look for "java.exe" and "jetty.exe" in the processes tab, and look at the user name to make sure it's the same user account you configured the ColdFusion services to run under. You probably don't need to kill the "jetty.exe" one, just the "java.exe".
-Carl V.
Copy link to clipboard
Copied
@Carl V - Yes, of course you're right. I was "having a moment" there and was looking in the list of "Services" rather than "Task Manager" for "processes". D'oh!
@Russ - Yes, we're on Windows. We're enjoying the flexibility and features of Apache but there has been talk about considering IIS. First we have to prove that Apache is the issue before considering the effort involved in changing.
This is outside of my experience but I assume than BonCode is IIS only? Is there an alternative to the Adobe mod_jk?
Richard
Copy link to clipboard
Copied
Richard,
Hmmmmm. I was wrong about the process names.
Java.exe and jetty.exe are tied to the "ColdFusion 10 Jetty Service". I can't remember if that is just for the remote start/stop of ColdFusion or if it powers the SOLR search stuff.
Coldfusionsvc.exe and coldfusion.exe are tied to the "ColdFusion 10 Application Server". ColdFusionsvc.exe is the service controller, while coldfusion.exe is the actual JVM process for ColdFusion (as verified the fact that the memory footprint of this seems to match my JVM settings for ColdFusion).
-Carl V.
Copy link to clipboard
Copied
I haven;t read the whole thread, so sorry if I am repeating stuff.
But if restarting Apache solves it most of the time, this does sound more like a connector issue.
As you mention SERVICES, I presume you are on windows, in which case, have you tried using IIS instead ? I know most Apache fans balk at this idea as they have a deep seated belief that IIS sucks and that Apache is much better, but this has not been the case for a long time.
We don;t use CF10/11 on our production systems as I have seen too many issues with them and no-one has asked. But I run CF10 on cfmldeveloper.com (hundreds of sites) using the same Boncode connector that I use for Railo, I use this as it solves a number of issues with the native connector, and as yet have not had any issues with that,
If you want to give that a try, here are some instructions.
http://www.michaels.me.uk/post.cfm/running-railo-and-coldfusion-10-side-by-side-with-boncode