Every couple of weeks the ColdFusion Launcher Application maxes out the CPU on the server. The site then begins to time out. Restarting the ColdFusion server services is required to resolve the issue. This is happening on two separate web servers. Both are running CF 2016 and Windows Server 2012 R2.
You have some bad / expensive code somewhere.
Something in the code is causing the CPU to climb until the server grinds to a halt. You will need to use something like the Server Monitor or FusionReactor to see what is going on when this happens and which requests are running.
Once you can see what page is causing the issue, you can then look at the code to see why this is happening.
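If no monitoring tool is at hand, the JVM's own thread CPU counters give a rough first look at which threads are burning CPU. The following is only a sketch: the class name TopCpuThreads and its top() helper are made up for illustration, and as written it inspects the JVM it runs in, so pointing it at CF would mean obtaining the same ThreadMXBean over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ColdFusion: lists the threads that have
// consumed the most CPU time so far in the JVM behind the given ThreadMXBean.
public class TopCpuThreads {

    public static List<String> top(ThreadMXBean threads, int n) {
        List<String> out = new ArrayList<>();
        if (!threads.isThreadCpuTimeSupported()) {
            return out; // this JVM cannot report per-thread CPU time
        }
        List<long[]> rows = new ArrayList<>();
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if disabled or thread is gone
            if (cpuNanos >= 0) {
                rows.add(new long[] {id, cpuNanos});
            }
        }
        // Sort descending by accumulated CPU time.
        rows.sort((a, b) -> Long.compare(b[1], a[1]));
        int limit = Math.max(0, Math.min(n, rows.size()));
        for (long[] row : rows.subList(0, limit)) {
            ThreadInfo info = threads.getThreadInfo(row[0]);
            if (info != null) { // thread may have exited since we listed it
                out.add(info.getThreadName() + ": " + (row[1] / 1_000_000) + " ms CPU");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        top(ManagementFactory.getThreadMXBean(), 5).forEach(System.out::println);
    }
}
```

The thread names it prints can then be matched against a thread dump to see which request or page each busy thread was serving.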
+1 what haxbh said
Andrew, you could also monitor the CF JVM and Tomcat to confirm that they are not the source of the problem that is maxing out the CPU. You can use free tools like JMC (Java Mission Control), which is part of the Oracle JDK, to check on the CF JVM and Tomcat when CF has JMX (Java Management Extensions) enabled.
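For anyone who prefers a script over the JMC GUI, here is a rough sketch of reading thread counts from a JVM over JMX. It assumes remote JMX has already been enabled on the CF JVM (for example through the standard com.sun.management.jmxremote.port system property); the class name CfJmxProbe and the host and port you pass in are placeholders, not values CF ships with.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical probe: connects to a JVM's JMX agent and prints thread counts.
public class CfJmxProbe {

    // Builds the usual RMI-based JMX service URL for a host and port.
    public static JMXServiceURL urlFor(String host, int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java CfJmxProbe <host> <jmx-port>");
            return;
        }
        JMXServiceURL url = urlFor(args[0], Integer.parseInt(args[1]));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Proxy for the remote JVM's thread bean.
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            System.out.println("Live threads: " + threads.getThreadCount());
            System.out.println("Peak threads: " + threads.getPeakThreadCount());
        }
    }
}
```

This sketch does no authentication or SSL, so it only works against an agent configured without them, which is something to avoid on a production box reachable from outside.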
I have gotten in touch with support and have provided thread dump and heap dump to them. I am waiting for them to get back to me.
Support requested that I apply the latest CF updates and rerun the site connectors. Then they wanted a fresh thread and heap dump to analyze. Unfortunately this was on a production system, so I have to go through the process of applying the updates to dev and test and then running through QA before I can apply them to production. This process takes about a week, so I had to go ahead and restart the services on the Prod system to get rid of the issue.
Now I have to wait for the CPU to max out again before running the thread and heap dumps. This usually only happens every couple of weeks.
We are experiencing the same issue. After about 6 days under production load we see abnormally high CPU, requests take longer, and overall performance degrades. Restarting CF immediately fixes the issue. CPU profiles return to normal under the same load.
We are also running CF 2016 update 3 and Windows Server 2012 R2. It is a cluster of 6 nodes, and the nodes begin to spike within a few hours of each other, as they all get restarted at the same time and take the same distributed amount of traffic.
This occurred in production, and I was unable to take a heap dump at the time because the site was failing.
Have you had any new updates on the issue?
Unfortunately our issue seems more sporadic and can take up to three weeks before it happens. I am waiting now for the next occurrence to happen so I can send thread and heap dumps to support.
It could be anything; you need to have full-time monitoring installed. We use FusionReactor, so consider that. It's free for 2 weeks and then you can go month by month after that.
I had something similar happen, and it was that the jvm.config default of -XX:MaxMetaspaceSize=192m was not large enough. I increased mine to 512m and that helped.
Also play around with using -XX:+UseConcMarkSweepGC (instead of -XX:+UseParallelGC) -XX:+CMSParallelRemarkEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark in your jvm.config.
Here's some of the args I use:
java.args=-server -Xms5g -Xmx10g -XX:ReservedCodeCacheSize=128m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark
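Before raising MaxMetaspaceSize it can help to see how close the pool actually is to its limit. Below is a small sketch using the standard memory pool beans; the class name MetaspaceCheck is invented for this post, the pool name "Metaspace" is what HotSpot uses on Java 8 and later, and as written it reports on the JVM it runs in, so checking CF means reading the same beans over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports Metaspace usage against its configured maximum.
public class MetaspaceCheck {

    private static long mb(long bytes) {
        return bytes / (1024 * 1024);
    }

    // Formats one pool's usage; maxBytes is -1 when no maximum is defined.
    public static String describe(String pool, long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return pool + ": " + mb(usedBytes) + " MB used, no max set";
        }
        return pool + ": " + mb(usedBytes) + " MB used of " + mb(maxBytes)
                + " MB (" + (100 * usedBytes / maxBytes) + "%)";
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) { // null if the pool is no longer valid
                    System.out.println(
                        describe(pool.getName(), usage.getUsed(), usage.getMax()));
                }
            }
        }
    }
}
```

A pool sitting near 100% of its max for long stretches would be consistent with the 192m default being too small, though that alone does not prove it is the cause of the CPU spike.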
The CPU spiked again over the weekend. I was able to get a snapshot from within Server Monitor and also a heap dump on the server. I have sent both to Adobe CF support. I will add updates as I get them.
We're having the same issue: ACF 2016 running on Windows 2012. We will try some of the suggestions provided by Neo Rye, but one instance is kept strictly for scheduled tasks, and as it is a new dev box there are no tasks on it yet. Still, it will jump to 100% CPU within a couple of weeks of a reboot.
CF support reviewed our heap and thread dumps. We don't have any memory leaks. They did find that the worker thread for monitoring services is getting blocked frequently, so I changed the server monitoring IP in the jetty.xml from 0.0.0.0 to 127.0.0.1.
They also had me increase the -Xmx value in the jvm.config from 1 GB to 2 GB and change the garbage collection setting from Parallel to G1.
So far everything is working since the change but only time will tell.
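For anyone wondering what the 0.0.0.0 to 127.0.0.1 change does in practice: a listener bound to 0.0.0.0 accepts connections on every network interface, while one bound to 127.0.0.1 only accepts connections from the machine itself. The sketch below shows that difference with plain Java sockets. It is a generic illustration with an invented class name (BindDemo), not the monitoring service's actual code or the contents of jetty.xml.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Generic illustration of wildcard vs loopback binding.
public class BindDemo {

    // Opens a listening socket on the given address, on any free port.
    public static ServerSocket bind(String host) {
        try {
            ServerSocket socket = new ServerSocket();
            socket.bind(new InetSocketAddress(InetAddress.getByName(host), 0));
            return socket;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket all = bind("0.0.0.0"); ServerSocket local = bind("127.0.0.1")) {
            // Wildcard address: reachable from other machines on the network.
            System.out.println("0.0.0.0 is wildcard: "
                + all.getInetAddress().isAnyLocalAddress());
            // Loopback address: reachable only from this machine.
            System.out.println("127.0.0.1 is loopback: "
                + local.getInetAddress().isLoopbackAddress());
        }
    }
}
```

With the loopback binding, anything connecting from another host never reaches the listener at all.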
I have this same issue on my two ColdFusion 2016 servers, but I will add the following. Our security guys run a scan of my servers twice weekly. The jump in CPU utilization that I see on my servers is directly related to these scans. I asked them to stop doing them for a few weeks and the issue stopped. Then as soon as they restarted the scans the issue returned. One of the scans runs every Sunday. Before the scan the CPU was running in the 3-7% range. It being a Sunday, with our users off for the weekend, nobody was using any of the websites/applications. Following completion of the scan the CPU utilization jumped to 88% and is holding in that area.
I am 100% convinced that the scan which attempts to access various ColdFusion scripts and functions is not releasing them so they remain in some sort of active state and thus do not release the CPU.
Note: To get by our issue until a fix is determined we have a reboot of the server scheduled for each Sunday evening.
JFCaroll, I realize this is 2 years late in coming, but your issue (of this cpu spike being tied to the security scans) is precisely what's solved by the suggestion Andrew got from Adobe, where he "changed the server monitoring IP in the jetty.xml from 0.0.0.0 to 127.0.0.1."
He wrote that a year before you (in 2017), and said "time will tell", but note he never updated us. Many others have indeed found a resolution to that SPECIFIC cause of high CPU in CF. In fact, it was discussed in a bug report opened in 2016 (https://tracker.adobe.com/#/view/CF-4141711), which has more details for any who may come across and want to consider this.
Also, you'll see comments from me there indicating that this problem was addressed by Adobe with a fix in CF2018 update 2, released in Feb 2019. So it SHOULD be less likely to hassle people in the past year. But I add this in case anyone comes across this post in 2020 and beyond looking for solutions to CF CPU issues.
All that said, there can be MANY other reasons for high CPU in CF, and some of them (and tips for diagnosing them) have been shared in previous comments here. There are just too many possible explanations and solutions to lay out all the possibilities here.
I will say that if someone hits such a problem and doesn't want to spin their wheels digging around the internet or sending dumps back and forth, I have been able to solve 90% of the CF CPU problems presented to me the past 13 years. More about my rates, approach, and satisfaction guarantee at carehart.org/consulting. (I know some hate "sales pitches". I provide hundreds of answers per month here, and only occasionally mention my services, and then only when it seems most expedient for certain problems.)
To add to what Charlie said ... my company also had the same issue. One of the 4 nodes in the cluster kept spiking high CPU and becoming unresponsive. At the time we were using CF9. After some investigation, what turned out to be the culprit on that "bad" server node was that somebody had turned on server monitoring and left it on in PROD. Once we switched monitoring off, the problem disappeared.
With CF2018, server monitoring is separated from the actual CF servers. It should be installed on its own dedicated server to prevent the extra burden that monitoring places on the CPU. So I would say, check your server configuration and see if you have a problem with server monitoring first before entertaining other causes.
Good point, and that (someone having the CFSM set to do "memory tracking") was indeed one of the common causes of memory problems, which could lead then to CPU problems. As you say, it's removed from CF2018, but those on earlier versions should still consider it (I always would, when assessing things).
Then again, what one CAN do is find out things like whether memory (heap use, or metaspace) IS high, and if so you will usually see lots of garbage collections happening (easier to see with different tools), and again THAT will lead to excessive CPU. One would normally see that if the CFSM "memory tracking" was on for too long.
Or if there is some OTHER "use of CPU", we would want to find what is causing THAT high CPU, and again there are different tools to help with that.
Like I said, just so many possible examples to consider. And while sometimes some one specific issue will be "the answer" for some folks, often some OTHER problem may be the cause, and that's where diagnosing the problem is generally far more productive than hoping to find that one "magic bean". 🙂 But that's not to knock your sharing what worked for you. Perhaps others will be compelled to add in what they have seen.
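On the point about lots of garbage collections driving CPU, one rough way to check is to compare total GC time against JVM uptime using the standard GC beans. This is only a sketch with an invented class name (GcOverhead). It reports on the JVM it runs in, so for CF the same beans would be read over JMX, and the figure is cumulative since startup, so two samples taken a few minutes apart say more about a current spike than a single reading does.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates what share of JVM uptime went to GC.
public class GcOverhead {

    // Percentage of uptime spent in garbage collection.
    public static double percent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long totalGcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = Math.max(0, gc.getCollectionTime()); // -1 means undefined
            totalGcMillis += millis;
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                + " collections, " + millis + " ms");
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead since start: %.2f%%%n",
            percent(totalGcMillis, uptime));
    }
}
```

A GC share that keeps climbing while request load stays flat points toward memory pressure rather than some other use of CPU.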