Recently upgraded our production servers to CF16u5, and we are now seeing high memory usage that leads to crashes.
We have 4 instances in a cluster with sticky sessions and replication. They run on Solaris SPARC with Apache 2.4.29 and mod_jk 1.2.41.
We upgraded from CF11u10, where they ran fine, and used the CF settings migration tool. Immediately after upgrading, the instances would start up but would not serve any pages, including the CF Admin page. I found a file, "port.properties", containing a Shutdown_port value. The file was being consumed (it disappears) during startup, and I had to recreate it multiple times while the instance was starting before it would serve pages.
Now that I have gotten them to actually work, FusionReactor shows high heap memory usage. GCs are happening regularly, but at around two and a half days the memory gets maxed out and the instances start showing as disconnected, on and off, in FusionReactor. The CF Admin page still serves, but it is extremely slow, and I am not getting any OutOfMemory errors in the logs. The only resolution is to restart the instance, which again requires recreating the port.properties file multiple times until it is consumed and the instance starts.
This cycle repeats at just under 3 days of uptime. If I don't restart the instances manually, they crash. svcadm restarts them, but they won't serve pages because the port.properties file has disappeared.
To make this even stranger, around the second time this happened, one of the instances restarted and has been up ever since: no high memory usage, no crashes, and it is serving pages. I have cross-referenced all settings in server.xml, web.xml, and the CF Admin between that working instance and the other 3, and I can't find any differences.
There isn't much on the web about this port.properties file. The value in it also appears in server.xml.
Does anyone have an idea what this could be, or had a similar experience?
Kevin, you should be able to solve this. I can almost guarantee this is not some leak in CF2016. It's just a matter of finding what's really going on. I do this with folks daily (here and elsewhere). There's a lot to consider. If it were easy, you wouldn't be asking here, right? 🙂
1) First, since you say you migrated settings from CF11 to CF2016, can you answer simply whether you confirmed that your heap settings in CF2016 are the same as they were in CF11 (in the CF Admin, or in the jvm.config file)? I know you said the settings are the same among the 4 CF2016 instances, with one "not having a problem", but there may be some other reason that one is "not having a problem".
If you no longer have your CF11 settings, what is your heap size in CF2016?
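The number to compare here is the max heap (-Xmx) each JVM actually ended up with. As an illustration of where that value surfaces at the JVM level, here is a minimal Java sketch (the class name HeapSettings is hypothetical) that reports the max heap and any -Xms/-Xmx arguments of the JVM it runs in. It does not read CF's configuration; for CF itself you would still compare the heap values in each instance's jvm.config or in the CF Admin.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class HeapSettings {
    // Maximum heap this JVM will attempt to use, in megabytes
    // (normally governed by the -Xmx argument).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // The -Xms/-Xmx arguments actually passed to this JVM, if any.
    static List<String> heapArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                .filter(a -> a.startsWith("-Xms") || a.startsWith("-Xmx"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("Heap args: " + heapArgs());
    }
}
```

If no -Xmx was passed, the JVM picks a default max heap based on the machine, which is one way two "identical" instances can end up with different limits.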
2) And your title says "crashing in FusionReactor", but it's not crashing "in FR". FR simply runs within the CF instance. It's the CF instance that is crashing or hanging.
3) Now, you say you are experiencing "high memory usage". Can you clarify? Do you mean that the memory graph in the top right of the FR Metrics > Web Metrics page shows heap used at a high percentage of your max? That does not necessarily mean there is a problem. If you click the "garbage collection" button (just below the CPU graph on that page), does the memory drop? If so, the "high" memory was just the JVM deferring collection, and it would have done that same GC itself at some point.
And I know you say you see GCs happening, but not all GCs are the same. You may have frequent "minor" GCs, but clicking that button requests a "major" GC. BTW, if memory doesn't go down at all, it could be that your CF JVM args include -XX:+DisableExplicitGC, in which case only the JVM can decide when to do GCs.
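To see the minor/major distinction and whether an explicit GC request is honored, a small standalone Java sketch like the following may help. It inspects only the JVM it runs in, so treat it as an illustration of the standard JVM management APIs rather than a view into your CF instance. The class name ExplicitGcCheck is made up, and the idea that FR's button does the equivalent of System.gc() is an assumption, not something confirmed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcCheck {
    // True if this JVM was started with -XX:+DisableExplicitGC; in that case
    // System.gc() calls are ignored and only the JVM decides when to collect.
    static boolean explicitGcDisabled() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments()
                .contains("-XX:+DisableExplicitGC");
    }

    // Heap currently in use, in megabytes.
    static long usedHeapMb() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed()
                / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Each collector reports its own counts: typically one handles the frequent
        // young-generation ("minor") GCs and another the old-generation ("major") ones.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }

        long before = usedHeapMb();
        System.gc(); // only a request; ignored when explicit GC is disabled
        long after = usedHeapMb();
        System.out.println("Explicit GC disabled: " + explicitGcDisabled());
        System.out.println("Used heap before/after System.gc() (MB): " + before + " -> " + after);
    }
}
```

If the old-generation collector's count keeps climbing while used heap after each major GC also keeps climbing, that points to objects really being retained, which is different from the "lazy JVM" case.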
4) But let's move on to what may really be amiss. Do you have anything reporting "OutOfMemory", on screen or in a CF log? Look at the logs from just before your last "crash". I put that in quotes because CF may have been up and running but no longer processing new requests, often because too many requests were running and hung; in that case you need to find out why those requests hung. The FR logs (always on, and kept for 30 days by default) and the FR Crash Protection (CP) alert emails (if you enabled them) can really help show what was happening before the crash.
Also, in case there is some connection between the issue and your FR version, can you confirm which version you have (using the FusionReactor > About menu option in the top left corner of the FR for your CF instance, not the FRAM instance on port 8087)?
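If you'd rather not read through every log by hand, a rough Java sketch like this one can search a log directory for "OutOfMemory" entries. The directory is whatever you pass in (for example, an instance's logs folder). The class name OomLogScan and the assumption that the relevant files end in .log are illustrative choices, not anything CF or FR defines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OomLogScan {
    // Returns "file:lineNumber: text" for every line in a *.log file under dir
    // that mentions "outofmemory" (case-insensitive), e.g. java.lang.OutOfMemoryError.
    static List<String> scan(Path dir) throws IOException {
        List<Path> logs;
        try (Stream<Path> walk = Files.walk(dir)) {
            logs = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".log"))
                       .sorted()
                       .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path log : logs) {
            // ISO-8859-1 maps every byte, so unusual characters cannot abort the read.
            List<String> lines = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).toLowerCase(Locale.ROOT).contains("outofmemory")) {
                    hits.add(log + ":" + (i + 1) + ": " + lines.get(i));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Directory to search: the first argument, or the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        scan(dir).forEach(System.out::println);
    }
}
```

An empty result doesn't rule out memory trouble (as noted above, the instance may simply stop processing requests), but a hit with a timestamp just before a "crash" narrows things down quickly.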
5) I realize all of this may be too much to take in (and discuss, and elaborate on) in the forums here. I hope I gave you some starting points, and if you have quick answers perhaps we (me and others here) can get you sorted. I will point out that I did a whole FR webinar on "post crash troubleshooting", one of many at http://www.fusion-reactor.com/webinars.
And of course, if this is pressing and you don't want to wait for back and forth, there are folks (like myself) who do this sort of troubleshooting via consulting. I list such folks in a category of my CF411.com site, specifically http://www.cf411.com/cftrouble. I will add that I can help with this interactively (safely, remotely, via screen-sharing, without you giving me direct access) and with a satisfaction guarantee (you won't pay for time you don't find valuable), and often seemingly sticky challenges like this can be solved in less than an hour (with you also learning better how to use the available diagnostic resources, to solve problems like this and others that may happen later.)
But again, I'm happy to try to help here too, and others will surely join in.