October 19, 2011
Question

Diagnosing CF Server Hangs


We are running the latest CF 9 server on JVM 1.6.0_26, on a Win2003 server with an i7 processor and 8GB of RAM.

Here is the JRun config:

java.args=-server  -Xms4096m -Xmx4096m -Dsun.io.useCanonCaches=false -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseParallelGC -Dsun.rmi.dgc.client.gcInterval=600000 -Dsun.rmi.dgc.server.gcInterval=600000 -Dcoldfusion.sessioncookie.httponly=true -XX:NewRatio=3 -Xbatch -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dcoldfusion.classPath={application.home}/../lib/updates,{application.home}/../lib,{application.home}/../gateway/lib/,{application.home}/../wwwroot/WEB-INF/flex/jars,{application.home}/../wwwroot/WEB-INF/cfform/jars
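As a rough reading of the sizing flags above, the sketch below works out how the heap is split between generations. It assumes the usual HotSpot meaning of -XX:NewRatio (old:young = N:1) and that PermGen is allocated on top of the -Xmx heap. The function name `heap_layout_mb` is made up for illustration, and the real sizes may differ somewhat because the parallel collector can resize generations adaptively.

```python
def heap_layout_mb(xmx_mb, new_ratio, max_perm_mb):
    """Approximate generation sizes in MB for a HotSpot heap.

    Assumption: NewRatio=N means old:young = N:1, so the young
    generation gets 1/(N+1) of the heap. PermGen sits outside -Xmx.
    """
    young = xmx_mb / (new_ratio + 1)
    old = xmx_mb - young
    return {
        "young_mb": young,
        "old_mb": old,
        "perm_mb": max_perm_mb,
        "heap_plus_perm_mb": xmx_mb + max_perm_mb,
    }


# Values taken from the java.args line above:
# -Xmx4096m, -XX:NewRatio=3, -XX:MaxPermSize=512m
layout = heap_layout_mb(xmx_mb=4096, new_ratio=3, max_perm_mb=512)
for name, size in layout.items():
    print(name, size)
```

Under these assumptions the old generation is about three quarters of the heap, so a heap that stays 85% or more full after a full GC would mean the old generation is nearly full of live objects.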

For the past few weeks, every couple of days the CF server grinds to a halt.

Using SeeFusion we can monitor the requests and see them just starting to stack up.

We are typically alerted to the brewing problem when our application starts sending notices that SESSION variables are undefined. The interesting part is that the line where the error occurs typically comes after the variable has already been checked for existence.

For instance:

<cfif NOT IsDefined("SESSION.User")>

     <cflocation url="somewhere">

</cfif>

Hi <cfoutput>#SESSION.User.GetUsername()#</cfoutput>

This reports the error USER IS UNDEFINED IN SESSION on the output line, AFTER the variable has been checked for existence. To me that means that somewhere in the middle of processing the thread, memory is getting screwed up.

Anyway, after starting to see random errors like this, we log into SeeFusion and see that memory usage is running at about 85% and simple page requests are stacking up.

I can force a full GC cleanup in milliseconds but it doesn't do anything for memory usage.

The page response times begin to climb.

At first we thought it might be some long-running page or report on our site, but looking at the actively running requests we see nothing intensive that could be causing the issue. Looking at Task Manager, the processes on the server are all running at 0% except for JRun, which is hovering around 15% to 18%.

The problem isn't in the database either. Our MySQL database shows no long-running queries, hung processes, or crashed tables that the application could be stalling over.

All of this leads up to the site slowing to a crawl and then becoming completely unresponsive while JRun chugs along at 15% and memory never maxes out. This never causes any errors in the logs, i.e. no heap memory errors or connection timeouts.

It just crawls along. I've never let it sit in this state for more than 5 or 10 minutes, so I don't know if it would eventually come back.

The only way so far to bring it back is to restart the CF server, at which point everything returns to normal.

In other situations like this I've seen JRun peg out at 100%, or memory pegged at 100% with an eventual heap error, or the database locked up and causing the app problems. But none of that happens here.

I'm truly stuck as to how to continue to diagnose and fix this problem.

Any help would be awesome. Thanks.


    3 replies

    Known Participant
    November 24, 2011

    Do you by any chance use the CFImage tag? I've been having this problem for several years with CF8 and just set up a brand new 64-bit server with CF9 and within an hour after starting it up, it is also locking up. I am a heavy user of CFImage.

    WebPexDevAuthor
    Inspiring
    November 29, 2011

    No we don't use the cfimage tag.

    I went through Carl's webcast on tuning the JVM and put a number of the tweaks and settings to use after profiling live traffic for a few weeks.

    While there was some improvement, the problem persists. Memory slowly creeps upwards and then hovers at 90%+ usage even during off-peak hours with very little traffic. At that point it is a matter of time before the right traffic spike kicks the server into unresponsive mode.

    It seems that either something we are doing is being held in memory and never released or there is a problem with the garbage collector.

    Forcing manual garbage collection never seems to reclaim any memory.

    In the short term we have just set up cron jobs to restart the CF server every night during off-peak hours, which frees up memory again for the next day.

    Not ideal in any sense but working in the short term.

    Legend
    November 29, 2011

    Hi,

    The total memory reaches maximum and a forced major GC from SeeFusion does not release total memory. It would be interesting to know how much of the memory is occupied by objects versus how much is free (that is, allocated to Java / CF but not holding objects).
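To make the distinction concrete, here is a small sketch that separates memory occupied by objects from memory merely allocated to the JVM. The function name and the sample figures are hypothetical, not readings from this server; real values would come from a monitoring tool.

```python
def heap_breakdown(used_mb, committed_mb, max_mb):
    """Split heap figures into 'holding objects' versus 'allocated but free'.

    used_mb:      memory currently occupied by objects
    committed_mb: memory the JVM has reserved from the OS for the heap
    max_mb:       the -Xmx ceiling
    """
    return {
        "used_pct_of_max": round(100.0 * used_mb / max_mb, 1),
        "committed_pct_of_max": round(100.0 * committed_mb / max_mb, 1),
        "free_in_committed_mb": committed_mb - used_mb,
    }


# Hypothetical figures for a 4096 MB heap.
sample = heap_breakdown(used_mb=3500, committed_mb=4096, max_mb=4096)
print(sample)
```

When -Xms equals -Xmx, as in the config quoted in the question, the committed figure should sit at or near the maximum from startup, so the used figure after a full GC is the one worth watching.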

    The JVM log details would indicate this, though they can be hard to understand. You have the JDK installed, so you may well benefit from some JConsole or jvisualvm analysis; both tools are found in \Java\jdk1.6.0_26\bin. To do that, add these to the JVM args:

    -Dcom.sun.management.jmxremote.port=N (N=port number)

    -Dcom.sun.management.jmxremote.ssl=false

    -Dcom.sun.management.jmxremote.authenticate=false

    EG:

    JVM args= ...-XX:+UseParallelGC -Dcom.sun.management.jmxremote.port=8705 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dsun.rmi.dgc.client.gcInterval=600000 etc. Back up JVM.CONFIG first, then make the modifications without introducing carriage returns or line feeds. Use the example here as a reference.
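One way to avoid stray line breaks when editing the java.args entry is to build the new line programmatically. The sketch below is illustrative only: `add_jvm_flags` is a made-up helper, and it assumes arguments are separated by whitespace and that no argument value contains a space (true of the config quoted in the question, but not of every install). Port 8705 is the example port used above.

```python
JMX_FLAGS = [
    "-Dcom.sun.management.jmxremote.port=8705",
    "-Dcom.sun.management.jmxremote.ssl=false",
    "-Dcom.sun.management.jmxremote.authenticate=false",
]


def add_jvm_flags(java_args_line, flags):
    """Return the java.args line with any missing flags appended.

    The result is always a single line with single spaces between
    arguments. Assumes no argument value contains whitespace.
    """
    key, sep, value = java_args_line.strip().partition("=")
    if key != "java.args" or not sep:
        raise ValueError("expected a line starting with 'java.args='")
    args = value.split()
    for flag in flags:
        name = flag.split("=")[0]
        # Skip flags whose name is already present, whatever its value.
        if not any(arg.split("=")[0] == name for arg in args):
            args.append(flag)
    return "java.args=" + " ".join(args)


original = "java.args=-server  -Xms4096m -Xmx4096m -XX:+UseParallelGC"
updated = add_jvm_flags(original, JMX_FLAGS)
print(updated)
```

Running the helper twice gives the same line, so it is safe to reapply after a failed edit.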

    Of interest will be the JConsole Memory tab and the jvisualvm heap and PermGen charts; paste them to the thread. Caveat: while the jmxremote JVM args are present you will not be able to stop the CF application service from SERVICES.MSC; you will need to kill the Jrun.exe task. I worry the cron restart you mentioned might fail.

    JConsole and jvisualvm also have a full GC button, which might be worth a try; however, I expect the SeeFusion GC performs the same task.

    You're worried the garbage collector is not working, and indeed you can change the GC algorithm entirely; however, before offering such a suggestion I would like to see some of the JVM log or jmxremote details from before and when the memory reaches 90%.

    One other thing: are Win03 and the JDK 64-bit?

    HTH, Carl.

    Legend
    October 24, 2011

    Hi,

    Other than what has been mentioned I think three other things may help:
    -JVM log details
    -client variables
    -other logs

    While JVM logging will not resolve the matter, it may show whether one of the Java generations is having a problem, and you can then apply adjustments to cope with the particular load.

    JVM args= ...-XX:+UseParallelGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -verbose:gc -Xloggc:cfjvmGC.log -Dsun.rmi.dgc.client.gcInterval=600000 etc. Back up JVM.CONFIG first, then make the modifications without introducing carriage returns or line feeds. Use the example here as a reference.

    This creates the log file at ColdFusion9\runtime\bin\cfjvmGC.log (or in Jrun4\bin\ for a multiserver install).

    Read the log file or use a graphical tool to help eg GCViewer tool from:
    http://www.tagtraum.com/gcviewer.html
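For a quick check without a graphical tool, a short script can pull the old-generation occupancy after each full GC out of the log. The sketch below assumes the parallel collector's -XX:+PrintGCDetails layout, with each full GC event on a single line and the old generation labelled PSOldGen or ParOldGen. The sample lines are invented for illustration and are not taken from this server; real logs (especially with -XX:+PrintHeapAtGC) may need a different pattern.

```python
import re

# Matches e.g. "3600.250: [Full GC ... [PSOldGen: 3000000K->2950000K(3145728K)] ..."
FULL_GC = re.compile(
    r"(?P<ts>\d+\.\d+): \[Full GC.*?"
    r"\[(?:PSOldGen|ParOldGen): (?P<before>\d+)K->(?P<after>\d+)K\((?P<cap>\d+)K\)\]"
)


def old_gen_after_full_gc(lines):
    """Return (timestamp_seconds, percent_of_old_gen_still_used) per full GC."""
    results = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            after_kb = int(match.group("after"))
            capacity_kb = int(match.group("cap"))
            pct = round(100.0 * after_kb / capacity_kb, 1)
            results.append((float(match.group("ts")), pct))
    return results


# Hypothetical log lines in the assumed format.
SAMPLE_LOG = [
    "10.500: [GC [PSYoungGen: 786432K->65536K(917504K)] 2883584K->2228224K(4063232K), 0.0850000 secs]",
    "3600.250: [Full GC [PSYoungGen: 65536K->0K(917504K)] [PSOldGen: 3000000K->2950000K(3145728K)] 3065536K->2950000K(4063232K) [PSPermGen: 200000K->200000K(524288K)], 4.2000000 secs]",
    "7200.750: [Full GC [PSYoungGen: 32768K->0K(917504K)] [PSOldGen: 3100000K->3090000K(3145728K)] 3132768K->3090000K(4063232K) [PSPermGen: 200000K->200000K(524288K)], 4.9000000 secs]",
]

occupancy = old_gen_after_full_gc(SAMPLE_LOG)
for timestamp, pct in occupancy:
    print(timestamp, pct)
```

If the percentage left after each full GC keeps climbing towards 100, that points to objects being retained rather than to a collector that is not running.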


    Adobe recommends storing client variables in a database; see:

    http://help.adobe.com/en_US/ColdFusion/9.0/Admin/WSc3ff6d0ea77859461172e0811cbf3638e6-7ffc.html#WSE012D66A-E6D8-4dab-BAEC-35856D8EB780

    One of Charlie's blog references suggests checking the runtime JRun logs; this is often worthwhile, so I would like to emphasize it by repeating it. Read the log files coldfusion-event and coldfusion-out in \ColdFusion9\runtime\logs (or Jrun4\logs\). Examine these for possible hang causes indicated by java.lang.OutOfMemoryError or java.lang.StackOverflowError messages.
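A scan like the one below can save reading the logs by eye. It is a minimal sketch: `find_jvm_errors` is a made-up helper that works on any iterable of lines (an open file object would do), and the sample lines are invented rather than copied from a real ColdFusion log.

```python
ERROR_MARKERS = (
    "java.lang.OutOfMemoryError",
    "java.lang.StackOverflowError",
)


def find_jvm_errors(lines):
    """Return (line_number, marker, text) for each line naming a JVM error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker in ERROR_MARKERS:
            if marker in line:
                hits.append((number, marker, line.strip()))
    return hits


# Hypothetical log content for illustration only.
SAMPLE_LINES = [
    "10/24 02:15:01 info Request completed",
    "10/24 02:15:07 error java.lang.OutOfMemoryError: Java heap space",
    "10/24 02:15:09 error java.lang.StackOverflowError",
    "10/24 02:15:12 info Request completed",
]

errors = find_jvm_errors(SAMPLE_LINES)
for number, marker, text in errors:
    print(number, marker)
```

An empty result from both log files would fit what the original poster reports: no heap or stack errors at the time of the hang.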

    HTH, Carl.

    WebPexDevAuthor
    Inspiring
    October 24, 2011

    I'll try enabling logging as you have suggested.

    I believe, as you have suggested, the problem may be in one of the Java generations. As I mentioned, the server logs don't show any memory or stack errors when the machine goes unresponsive. Using SeeFusion I've watched the memory climb on the machine from a constant 80% up to 98% while still running and stay there indefinitely. Interestingly, SeeFusion shows that active requests, queries, and pages per second all remain fairly constant and relatively modest. Using SeeFusion to force a GC recovers no memory. Our application makes heavy use of CFCs and OO structures. It makes me think that something (or many things) somewhere is being retained in memory and not being properly released.

    Any thoughts about how to confirm or deny this?

    P.S. We don't use client variables.

    Legend
    October 24, 2011

    I did a talk last year at CFMeetup - a great resource hosted care of Charlie - on CF JVM and logging related matters. I think you may benefit from reviewing the session:

    http://experts.adobeconnect.com/p55663036/

    Hope that helps again, Carl.

    Charlie Arehart
    Community Expert
    October 20, 2011

    You can find out what the requests are “stuck doing” using the stack trace feature which is available in SeeFusion (or FusionReactor, or the CF Server Monitor), and which can tell you the line of code that a CF page is running at any moment. Since you’ve ruled out many of the other common things (outofmemory errors in the logs, etc.), this seems your best next bet. See what all the hung requests are doing (particularly as a given request remains hung and repeated stack traces show the same line of code.)

    I discuss this more in some resources:

    http://www.carehart.org/blog/client/index.cfm/2009/6/24/easier_thread_dumps

    http://www.carehart.org/blog/client/index.cfm/2010/10/15/Lies_damned_lies_and_CF_timeouts

    (see the section “The underlying solution: stack tracing”)

    http://carehart.org/presentations/#stack
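The idea of comparing repeated stack traces can be sketched as follows. This is an illustration under assumptions, not a feature of SeeFusion or any other monitor: each snapshot is modelled as a plain dict mapping a thread name to the top frame seen at that moment, and the thread names and template paths are invented.

```python
def stuck_threads(snapshots, min_repeats=3):
    """Find threads whose top stack frame is unchanged across recent snapshots.

    snapshots: list of dicts (thread name -> top frame string), oldest first.
    Returns a dict of thread name -> frame for threads that show the same
    frame in each of the last `min_repeats` snapshots.
    """
    if len(snapshots) < min_repeats:
        return {}
    recent = snapshots[-min_repeats:]
    stuck = {}
    for thread, frame in recent[-1].items():
        if all(snap.get(thread) == frame for snap in recent):
            stuck[thread] = frame
    return stuck


# Hypothetical snapshots taken a few seconds apart.
SNAPSHOTS = [
    {"jrpp-12": "report.cfm:42", "jrpp-15": "index.cfm:10"},
    {"jrpp-12": "report.cfm:42", "jrpp-15": "login.cfm:7"},
    {"jrpp-12": "report.cfm:42", "jrpp-15": "index.cfm:10"},
]

suspects = stuck_threads(SNAPSHOTS)
print(suspects)
```

A thread that sits on the same line across several snapshots is the one to look at first; a thread that merely revisits a line between other work is not flagged.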

    Hope that helps. Of course, you can also enlist the help of someone who does such CF server troubleshooting for a living. I do (http://www.carehart.org/consulting/), as do others, which I list as a category in my CF411 site: http://www.cf411.com/cfconsult. Hope that’s helpful.

    /charlie

    /Charlie (troubleshooter, carehart. org)
    December 10, 2012

    Shameless plugs

    Charlie Arehart
    Community Expert
    December 11, 2012

    Akersha, since your “shameless plugs” comment is indicated as being a reply to my one comment in this thread (from Oct 2011), are you referring to the fact that I mention there are folks who can help with such troubleshooting, including myself? You really regard that as “shameless”? When I offer a link to several other companies who do it also? Would you have preferred I remained silent on it, and left readers to dig all over the net to find who might be able to help?

    Not all questions can be easily answered in back and forths on forums or mailing lists. Some people would rather get more immediate help. Giving them resources to consider is not shameless.

    /charlie

    /Charlie (troubleshooter, carehart. org)