CF8 keeps crashing

Report · Oct 09, 2007

I have had 2 servers running CF8 for a few days, and while everything was OK on my test box, the production servers keep crashing.
The 2 servers serve the same site behind a hardware load balancer. The database is on a SQL Server 2005 cluster, and the files are stored on a NAS cluster.
The servers are Dell 1850s with a single 3.8 GHz Xeon / 4 GB of RAM / 73 GB SCSI disks (RAID 1).
The site receives about 10,000 to 15,000 visitors each day, which represents around 50,000 page views per day.

I tried several parameters:
JVM: min/max heap size from 512 MB to 1400 MB / MaxPermSize up to 256 MB.
Maximum number of simultaneous Template requests: from 10 to 30.
Maximum number of running JRun threads: from 30 to 100.
...
but none of it seems to have any noticeable effect. I tried profiling my code with the ColdFusion monitor to find a specific bottleneck or something in my code that causes the failure... but didn't find anything noticeable.
Everything was working fine on CF7 (I upgraded), and it crashes with CF8.

Some of the errors are attached.

Any suggestions?

Report · Oct 09, 2007

Generally speaking "java.lang.OutOfMemoryError: unable to create new native thread" means you're asking code to do something to the stack that causes one of the memory pools to run out of memory.

I've not tinkered with CF 8's onboard diags yet. But you can turn on verbose GC and that'll help you identify which memory segment in the stack is running dry. You can then try tweaking the parameters for that memory segment in your java.args line.

Turn on verbose GC with:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

Output from this will be in the related server.out log file.

I'd also recommend running the latest iteration of your version of Java. Issues like this often get better (or sometimes worse) between iterations.

Make sure you turn off verbose GC when you're done.

Report · Oct 10, 2007

I'm trying the verbose output... (but as it needs a restart, I waited for the next crash... which has just happened)

I was there when the crash occurred, and the error was the "The Metrics service is not available. This exception is usually caused by service startup failure. Check your server configuration" one. This is the error reported in the web browser when you try to load a cfm page, but there is nothing in the exception log.

It's difficult to use the monitoring to watch the status of the server when the crash occurs, because it stops when CF crashes and resets all the data when CF restarts...
The only thing I can tell is that I had some alerts about JVM memory consumption in the hours before it crashed (I set an alert for when the JVM reaches 900 MB with a 1024 MB max heap size)... but also the "alert recovered" message a few minutes after the initial alert (the JVM quickly goes back under 900 MB after the overrun).

Report · Oct 11, 2007

The server crashed again. I checked the exception log... and there is absolutely nothing:
the crash occurred around 6:45 pm
the last error is a missing template:
"Error","jrpp-102","10/10/07","18:41:44",,"File not found: /espace.cfm The specific sequence of files included or processed is: \\10.1.17.243\microapp\ma_externe\revendeur\revendeur-ma\espace.cfm'' "
coldfusion.runtime.TemplateNotFoundException: File not found: /espace.cfm
at coldfusion.filter.PathFilter.invoke(PathFilter.java:89)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.CfmServlet.service(CfmServlet.java:175)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:284)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

The server was left in its crashed state for about an hour to check different things, but the only other message in the log is a "The Metrics service is not available..." one, logged when the server was restarted. So I think these "metrics" errors just occur when cfm pages are requested while the CF server is starting and is not yet ready to serve pages.

Report · Oct 12, 2007

One thing I noticed in your first message was this:

tried several parameters :
JVM : min/max heapsize from 512mb to 1400 mb / maxPermSize until 256 mb.
Maximum number of simultaneous Template requests : from 10 to 30.
Maximum number of running JRun threads : from 30 to 100.

Try a heapsize of 1024/384

You really need to be careful with just dialing around the max template requests and JRun threads. More is not automagically better. You need to stick "close" to the Adobe recommended 6-8 threads per CPU. This is a YMMV setting based on what your applications are doing.

If you were able to get verbose gc turned on, do you see anything in your server-out.log file (or something similarly named)? That's going to tell you where the stack is running out of memory and armed with that information you can configure your java.args accordingly.

Report · Oct 15, 2007

"Try a heapsize of 1024/384" : what do you exactly mean ?
Minimum JVM Heap Size = 384 & Maximum JVM Heap Size = 1024 ?

I set "Maximum number of simultaneous Template requests" to 5 for the last 3 days, but it does not solve the problem. The server crashed yesterday, for example.

Here is an excerpt from cfroot\runtime\logs\coldfusion-out.log from the moment when it crashed

Report · Oct 15, 2007

It used to be that the metric service error was an error that was safe to ignore. But I don't know if that is the case with CF8.

Your permgen value seems kinda low (default). Consider giving this option a try:

-XX:MaxPermSize=256m

It will increase the size from the default 64MB to 256MB. You'd add it to the java.args section. Back up your old settings because if you configure that incorrectly, CF will fail to start.

By heap size I mean: -Xmx1024m -Xms384m as max and min values for your entire heap size.
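
Since a mistyped java.args line can stop CF from starting, it can help to sanity-check the line before restarting. What follows is only a minimal sketch in Python, not anything shipped with ColdFusion: the function names are made up for illustration, and it only looks at the heap and PermGen options mentioned in this thread, flagging two easy mistakes (a doubled leading dash, or -Xms above -Xmx).

```python
import re
import shlex

# Multipliers for the k/m/g suffixes used on JVM size options.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '1024m' or '384M' into bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if not match:
        raise ValueError(f"not a JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def summarize_java_args(line):
    """Return (sizes, warnings) for a java.args-style option string.

    Hypothetical helper: only the options discussed in this thread are read.
    """
    sizes, warnings = {}, []
    for token in shlex.split(line):
        if token.startswith("--"):
            warnings.append(f"double dash, probably a typo: {token}")
        elif token.startswith("-Xmx"):
            sizes["max_heap"] = parse_size(token[4:])
        elif token.startswith("-Xms"):
            sizes["min_heap"] = parse_size(token[4:])
        elif token.startswith("-XX:MaxPermSize="):
            sizes["max_perm"] = parse_size(token.split("=", 1)[1])
        elif token.startswith("-XX:PermSize="):
            sizes["min_perm"] = parse_size(token.split("=", 1)[1])
    # An initial heap larger than the maximum heap is a configuration error.
    if sizes.get("min_heap", 0) > sizes.get("max_heap", float("inf")):
        warnings.append("-Xms is larger than -Xmx")
    return sizes, warnings


sizes, warnings = summarize_java_args("-server -Xmx1024m -Xms384m -XX:MaxPermSize=256m")
print({name: value // 1024 ** 2 for name, value in sizes.items()}, warnings)
```

The sizes are printed in MB so they can be compared directly with the values typed into the CF Admin fields.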

Is this error something you can fix?

File not found: /espace.cfm

Report · Oct 15, 2007

Thanks for your suggestions.

As for "-XX:MaxPermSize=256m"... that has already been in my current config for many days.
The complete "JVM Arguments" field in CF Admin :
-server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=256m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Dthis.computer=frontal1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC

Is there something in the error log that makes you think that it is set to 64MB ?

I'll try 384 min/1024 max when the next crash occurs.

The file-not-found error comes from a badly referenced link in Google (don't know how it happened). There is now a redirection page, but it seems insufficient. I will add a real file to avoid this error (who knows, maybe all these problems come from lots of 404s :-).

Report · Oct 15, 2007

quote:

Originally posted by: obouillaud
I will add a real file to avoid this error (who knows, maybe all these problems come from lots of 404s :-).

I doubt they do. But it cleans up the logs...

Here's where I'm seeing the 64MB size:

PSPermGen total 63104K, used 63082K [0x037d0000, 0x07570000, 0x137d0000)
object space 63104K, 99% used [0x037d0000,0x0756a870,0x07570000)

Report · Oct 15, 2007

So it seems that there is a problem with the "-XX:MaxPermSize=256m" argument. Should it be "-XX:+MaxPermSize=256m" (with the + sign)? Maybe not, because all the docs on the net use "-XX:MaxPermSize=...".

The other thing to consider is that this parameter is the MAX PermSize, not the initial PermSize. Maybe 64 MB is sufficient and Java does not need to increase the amount of memory for the PermSize...

I tried another setting (MaxPermSize=192m) and restarted one server (the less busy one...)
In the coldfusion-out.log file I can now see:
PSPermGen total 30336K, used 30246K [0x03860000, 0x05600000, 0x0f860000)
object space 30336K, 99% used [0x03860000,0x055e9bd8,0x05600000)

Which seems to confirm that "PSPermGen" uses a "variable" size... 30 MB when the server starts...

5 minutes later, here is the value:
PSPermGen total 41856K, used 41756K [0x03860000, 0x06140000, 0x0f860000)
object space 41856K, 99% used [0x03860000,0x06127138,0x06140000)
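
One way to check whether -XX:MaxPermSize took effect is to read the three addresses printed in brackets on the PSPermGen line. The sketch below is Python and rests on an assumption: that the bracket means [start, end of committed memory, end of reserved memory). The "total" figures in the samples posted in this thread are consistent with that reading, but treat it as a working hypothesis rather than documented behaviour. The function name is made up for illustration.

```python
import re

# Matches lines such as:
#   PSPermGen total 63104K, used 63082K [0x037d0000, 0x07570000, 0x137d0000)
PERMGEN_LINE = re.compile(
    r"PSPermGen\s+total\s+(\d+)K,\s+used\s+(\d+)K\s+"
    r"\[(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+),\s*(0x[0-9a-fA-F]+)\)"
)


def permgen_sizes(line):
    """Return committed, used and reserved PermGen sizes in KB, or None."""
    match = PERMGEN_LINE.search(line)
    if match is None:
        return None
    total_kb, used_kb = int(match.group(1)), int(match.group(2))
    low, _committed_end, reserved_end = (int(match.group(i), 16) for i in (3, 4, 5))
    return {
        "committed_kb": total_kb,
        "used_kb": used_kb,
        # Assumption: the last address marks the end of the reserved range,
        # i.e. the ceiling that MaxPermSize sets.
        "reserved_kb": (reserved_end - low) // 1024,
    }


sample = "PSPermGen total 63104K, used 63082K [0x037d0000, 0x07570000, 0x137d0000)"
print(permgen_sizes(sample))
```

Under that assumption, the 63104K sample has a reserved range of 262144 KB (256 MB), and the lines from the restarted server have 196608 KB (192 MB). That would suggest the MaxPermSize setting was applied in both cases and that PermGen simply grows on demand, so a "99% used" figure on its own does not show the ceiling has been hit.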

I know it may vary from user to user, but can you post your complete JVM settings? For example, do you have the "-XX:+UseParallelGC" parameter? Does it come from the upgrades (CFMX 6.0 -> 6.1 -> 7.0 -> 8.0), as we have been using CF since CF 4.5, or is it the default setting?

I'll try "-XX:+UseConcMarkSweepGC" as it seems to be a good garbage collector for my use...

Report · Oct 16, 2007

This is my java.args line:

java.args=-server -Xmx1024m -Xms384m -Dsun.io.useCanonCaches=false -Xbootclasspath/a:"{application.home}/servers/cfusion1/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/webchartsJava2D.jar" -XX:MaxPermSize=384m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -DJINTEGRA_NATIVE_MODE -DJINTEGRA_PREFETCH_ENUMS -XX:NewSize=192m -XX:PermSize=32m

It has been refined over the years and has served us very very well in all of our environments.

The commands could have changed in Java 6. Might want to double verify that they are valid for Java 6. The above is for Java 1.4.2 since we've not made the jump to 8 yet.

I would suggest that 64MB isn't enough, since you're running out. And by the looks of your logs, and the spaces being at 99%, that would be the case.

Report · Oct 17, 2007

I'm trying "-XX:+UseConcMarkSweepGC" and "-XX:NewSize=192m -XX:PermSize=32m"
we'll see what happens...

Report · Oct 17, 2007

It doesn't seem to have brought any improvement.

Last night's errors follow.
- new heap size problems... but they did not crash the server.
- I also had massive "corrupt table null" errors last night (I already had some a few days ago, but this time I have a lot of them). I think it's "cache" related; there is another thread about this on this forum. Adobe said they solved this bug, but there is still no hotfix !!!

Report · Oct 17, 2007

New crash (on the other server) :

Report · Apr 11, 2008

I would love to hear the resolution, if one was found, as well...

My company is having similar issues too, "coldfusion java.lang.OutOfMemoryError ... Out of swap space?". We have 4 load-balanced CF8 Standard Edition servers. After days (sometimes longer) of CF running from a clean start, we are getting these errors. It's seemingly random.

We are also seeing, "coldfusion.server.ServiceFactory$ServiceNotAvailableException: The Metrics service is not available."

TIA

Report · Jul 25, 2008

Just wanted to throw in here that for at least one of the errors referenced above, I had a client who saw it go away with an upgrade to 8.01 (http://www.adobe.com/support/coldfusion/downloads_updates.html#cf8) and/or its 8.01 Cumulative Hotfix 1 (http://kb.adobe.com/selfservice/viewContent.do?externalId=kb403622). I can't recall which it was, but both are worth trying if you're still stuck.

Besides 8.01 upgrading the JVM a bit (from 1.6.0_01 to 1.6.0_04), the CHF mentioned some fixes related to image handling, and the specific error (like above) that I saw fixed was the "out of swap" error that specifically references "jbyte in C:\BUILD_AREA\jdk6_01\hotspot\src\share\vm\prims\jni.cpp" in the out log message (as does one of the above). Note that the error clearly shows this was running on the 6.01 JVM.

The PID output shared above was also the same as what my client was seeing, in that the first line of the stack trace referenced com.sun.medialib.codec.jpeg. That suggested that things had to do with image manipulation. I can't recall now if the code was using CFIMAGE or cfx_imagecr.

Anyway, sorry I can't be more definitive. Some of the details are hazy for me now in retrospect. And I know that some of the "out of swap" space errors others are getting are not related to or resolved by this 8.01 upgrade.

/Charlie (troubleshooter, carehart.org)

Report · Oct 17, 2007

obouillaud wrote:
> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> # java.lang.OutOfMemoryError: requested 26664960 bytes for jbyte in
> C:\BUILD_AREA\jdk6_01\hotspot\src\share\vm\prims\jni.cpp. Out of swap space?
> #
> # Internal Error (414C4C4F434154494F4E0E494E4C494E450E4850500017), pid=2528,
> tid=4572

This is a bit of a long shot, but can you install JDK 1.6 on your server
and run the jmap.exe utility to keep track of where all the memory goes?
Just dump it a few times shortly after you started CF, set it on a
scheduled task and then compare with the last dump before a CF crash.
You could also try to set the -XX:+HeapDumpOnOutOfMemoryError JVM option
and then analyse with the jhat tool (but I have not been very successful
with that).
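
To compare two dumps as suggested above, something like the following Python sketch could rank the classes that grew the most between an early dump and a late one. It assumes each dump is the saved text of `jmap -histo <pid>`, with rows of the form "rank: instances bytes class-name"; the function names are made up for illustration.

```python
import re

# Assumed row layout of `jmap -histo` output, e.g. "   1:   53458   8290712  [C"
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text):
    """Map class name -> bytes used, from the text of one histogram dump."""
    usage = {}
    for line in text.splitlines():
        match = HISTO_ROW.match(line)
        if match:
            usage[match.group(3)] = int(match.group(2))
    return usage


def biggest_growth(before_text, after_text, top=10):
    """Return (class name, byte growth) pairs, largest growth first."""
    before = parse_histogram(before_text)
    after = parse_histogram(after_text)
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    # Classes that shrank or stayed flat are not interesting here.
    return [(name, delta) for name, delta in ranked[:top] if delta > 0]
```

Header, separator and total lines do not match the row pattern and are skipped, so the saved files can be passed in unedited.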

Jochem

--
Jochem van Dieten
Adobe Community Expert for ColdFusion

Report · Dec 04, 2007

Hi, we are experiencing the same problem described in this thread. 6 front-end servers running CF sit on an infrastructure and settings similar to those described in the first post, with a NAS and a SQL Server 2005 cluster, and they keep crashing with the "no more threads available...." error. Maybe something is fixed in the latest cumulative hotfix, or maybe an Adobe engineer has found a solution for your problem. Could you please let me know about this?
Thanks in advance

Report · Dec 04, 2007

you've mentioned looking @ the server monitor - did you do memory profiling at all? ensuring that GC is running as it should - looking for high # threads/ high memory usage per thread/query/etc

a few front end servers that we use have just been dedicated to CF8 - the servers use about 16GB ram among 10 instances - I do (rarely) see the 'java.lang.OutOfMemoryError' and 'permgen null' errors, but this is only when the particular instance is under extreme load and trying to handle a higher number of threads than it can - and GC 'locks' up. If traffic to the particular instance is dropped - it usually cleans up within a couple of minutes. Otherwise it requires a restart of the service.

another idea, what ver of the JVM did you run with CF7? since you say it ran fine on CF7, have you tried rolling to a different version of the jre? prior to CF8, we used JDK 1.4.2_12 - 14 due to issues we ran into running newer JREs.

Below are my 'usual' java arguments for no particular reason -

java.args=-Xmx1024m -server -Xms1024m -Dsun.io.useCanonCaches=false -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxPermSize=256m -XX:PermSize=64m -XX:NewSize=48m

hope you figure it out soon! I really enjoy using some of the newer features of CF8

Report · Mar 25, 2008

Did this problem ever get resolved? I am having similar issues...

If so, what was it?

- Aex

Report · Jan 07, 2009

Er - anyone find a fix? We are trying new/larger maxpermsize, reducing the work cfimage has to do etc - even bringing up a new server. We had to rollback features because our servers became unstable.

Is there a more stable image library out there? Would be shocked if Adobe didn't have something stable for ColdFusion, but there it is.

We have the latest patches.

Report · Jan 16, 2009

Hi crania,

I encountered the same problem, and we have increased memory, changed servers... without success.

The server crashes 4-5 times a day... So we implemented a program to restart JRun when a long response time occurs...
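
For anyone wanting to try the same workaround, here is a rough Python sketch of such a watchdog. It is not the program described above: the thresholds (10 seconds, 3 bad probes in a row, 30 seconds between probes) and the `restart` callable are placeholders you would have to supply, and the probe URL should be a cheap cfm page on the server being watched.

```python
import http.client
import time
import urllib.request


def response_time(url, timeout):
    """Seconds taken to fetch url, or None if it failed or timed out."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            reply.read()
    except (OSError, http.client.HTTPException):
        return None
    return time.monotonic() - started


def should_restart(samples, slow_after, strikes):
    """True when the last `strikes` samples were all failures or too slow."""
    recent = samples[-strikes:]
    if len(recent) < strikes:
        return False
    return all(t is None or t > slow_after for t in recent)


def watch(url, restart, slow_after=10.0, strikes=3, pause=30.0):
    """Poll url forever; call restart() after `strikes` bad probes in a row."""
    samples = []
    while True:
        samples = (samples + [response_time(url, timeout=slow_after)])[-strikes:]
        if should_restart(samples, slow_after, strikes):
            restart()  # hypothetical callable that restarts the JRun service
            samples = []
        time.sleep(pause)
```

Requiring several bad probes in a row is a design choice to avoid restarting on a single slow request; it only hides the crashes, so the log evidence should still be collected before each restart.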

I made two modifications two days ago and the server hasn't crashed since...

- Decreased the timeout for session variables in the ColdFusion Administrator and in the cfapplication tag (I reduced it from 8 hours to 10 minutes).

- Increased Maximum JVM Heap Size (MB) to 1024 and changed MaxPermSize to 512m:

-server -Dsun.io.useCanonCaches=false -XX:MaxPermSize=512m -XX:+UseParallelGC -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib

Regards,
Benoit