Each time we restart the instance, even without any user traffic at all, CPU will eventually jump to 24-25% and stay there.
In some cases this will double, which then makes the site using the instance slow and unresponsive.
"Thread-27" #95 daemon prio=5 os_prio=0 cpu=2798078.13ms
State: RUNNABLE
at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:323)
This thread has consumed about 2.8 million milliseconds (roughly 47 minutes) of CPU time in IOCP polling.
Any help greatly appreciated.
Thanks
Forrest
It's reasonable to conclude you've found a bug, but instead there may be an environmental contributor that's unique to you. (FWIW, I've not heard of this but perhaps Adobe or someone else has.)
And that said, you've done a great job identifying some key diagnostics, but I'd press for more.
In addition to pmt tracking of requests, see its tracking of cfthread threads (created by cfthread within cfml). Sometimes those can be running amok, often unnoticed.
Along the same lines, have you checked the pmt's tracking of jdbc activity? One reason I ask is that such cfthread threads could be doing queries, which entail network I/O.
And while you saw NO requests running at the time you looked, you may still want to confirm whether there was a spike in request activity, anytime between your cf restart and now.
Further, while you may not see requests running amok, it may not be their NUMBER/FREQUENCY but their NATURE. Sadly the pmt doesn't log every request (like FusionReactor does), but your web server logs would, of course. It can just be like looking for a needle in a haystack, since the web server logs track ALL requests regardless of type. (Note that cf can be configured to log each cf request, which at least limits the log to only those.)
As for your thread dump, a few more thoughts. First, let's clarify that the "Thread-27" using the cpu is NOT related to the cfthread threads I asked about above. Those would be "cfthread-nn", with the nn being a number.
Further, a thread dump is a point in time resource. That tracking of total time (since cf came up) is rarely valuable, though it may well be useful in this one case. What would be MORE useful is to see thread use over time, and more specifically what Java objects are being used in threads over time. That's generically referred to as "profiling".
And while the pmt can be told to profile a request or a cfthread thread, it can also be told to profile ALL threads of ALL types. That can be useful, beyond what you found. This is something FusionReactor can do very easily, and JVM tools can do it as well (though for many folks, setting those up for a prod system is often too challenging).
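To illustrate what "thread use over time" means in practice, here is a minimal Java sketch that takes two snapshots of per-thread CPU time with the JDK's standard `ThreadMXBean` and lists the threads that burned CPU in between. The class name `ThreadCpuSampler`, its methods, and the 5-second window are hypothetical choices for illustration; this is not how the pmt or FusionReactor do it. It also assumes the code runs inside the same JVM as ColdFusion (run standalone, it only sees its own threads).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ColdFusion or any monitoring product.
final class ThreadCpuSampler {

    /** CPU time in nanoseconds per live thread id; threads reporting -1 are skipped. */
    static Map<Long, Long> snapshot(ThreadMXBean mx) {
        Map<Long, Long> cpuById = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id); // -1 if disabled or the thread has died
            if (nanos >= 0) {
                cpuById.put(id, nanos);
            }
        }
        return cpuById;
    }

    /** "name#id" mapped to CPU nanoseconds used between two snapshots, busiest first. */
    static Map<String, Long> delta(ThreadMXBean mx, Map<Long, Long> before, Map<Long, Long> after) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> e : after.entrySet()) {
            long used = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            ThreadInfo info = mx.getThreadInfo(e.getKey()); // null if the thread is gone
            if (info != null && used > 0) {
                rows.add(Map.entry(info.getThreadName() + "#" + e.getKey(), used));
            }
        }
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        Map<String, Long> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            ordered.put(row.getKey(), row.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Per-thread CPU time is not supported on this JVM.");
            return;
        }
        Map<Long, Long> before = snapshot(mx);
        Thread.sleep(5_000); // sampling window (an arbitrary choice)
        delta(mx, before, snapshot(mx)).forEach((thread, nanos) ->
                System.out.printf("%-45s %.1f ms%n", thread, nanos / 1e6));
    }
}
```

Repeating the sample a few times during a spike would show whether the same thread keeps accumulating CPU, which a single thread dump cannot tell you.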
Anyway, let us know if you learn more, or if you have thoughts on what I've offered or that others may.
I think that, at this point, the single most important question/suggestion is: Have you upgraded ColdFusion 2023 to the latest update level, namely, Update 16?
This morning it is using over 60% with no active site users at all almost immediately after a restart of the instance. I have dealt with sites with heavy traffic and slow db queries etc, but with zero traffic what would cause the coldfusion process to continually use 60% CPU? Normally at least for me if CPU locks at a certain level there is an issue with heap or CF running out of memory. That is not the case here. Something is running and I can't seem to find what it is. Some days it will run all day no cpu spike. Others it will spike at 24% then later it doubles almost exactly. So I suspect there is one particular action or script that triggers it but I sure can't find it.
I have checked cfthread and jdbc activity during spikes and PMT again shows nothing out of the ordinary.
No, we have not updated CF 2023. The last time we tried using the CF Admin panel it messed up all or most of the installed packages and we had to restore. Afraid to risk it since this is a production server.
What is your present ColdFusion update level? It is likely that the issue you're facing will go away when you update ColdFusion.
There are ways to mitigate the risk of updating ColdFusion 2023 on the production server:
So you just have to install Update 16 on the test environment -- and test it. Depending on how urgently you want to resolve this issue and on how much risk you're willing to take, you may:
I hate to admit it but we are on Update 8. Like I said the last attempt failed completely even when following instructions, instance would not restart, and even after restore and restart some packages were broken like pmtagent and pdf packages. IF I can find specific step by step instructions on safely updating the core and all packages I would consider another attempt, but only if I knew I could put things back the way they were. I found that uninstallers usually work but only if the install actually finished, lol. Any advice appreciated as I know we need to update. I do have a test server here, and of course the update here worked flawlessly.
I hate to admit it but we are on Update 8.
By @forrest_3294
You really MUST update ColdFusion 2023. It is a risk worth taking. After all, the current issue is just as worrisome as any you might get from updating. But then, updating has a tremendous upside.
As I said earlier, it is even likely that upgrading to the latest update will resolve the current issue. If it doesn't, it will still put the issue in greater perspective.
Please do a thread dump when the issue occurs, and share it with us. The following message
"Thread-27" #95 daemon prio=5 os_prio=0 cpu=2798078.13ms
State: RUNNABLE
at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:323)
shows that there is a persistent live thread (an asynchronous Input/Output task) between ColdFusion and Windows. The only thing I can think of at the moment is: database drivers.
Review whether any of the database drivers you're using requires an update. Is any of the drivers making a persistent connection?
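When a full thread dump is available, it helps to rank every thread by the `cpu=` value in its header line rather than reading the dump by eye. The following is a rough Java sketch for that, assuming header lines in the same format as the "Thread-27" line quoted above. The class name `ThreadDumpCpuRanker` and the regular expression are illustrative assumptions, not part of any ColdFusion or JDK tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: ranks threads in a saved thread dump by accumulated CPU time.
final class ThreadDumpCpuRanker {

    // Matches header lines such as:
    //   "Thread-27" #95 daemon prio=5 os_prio=0 cpu=2798078.13ms
    private static final Pattern HEADER =
            Pattern.compile("^\"(.+?)\"\\s.*?\\bcpu=([0-9]+(?:\\.[0-9]+)?)ms");

    /** Returns (thread name, CPU milliseconds) pairs, highest CPU first. */
    static List<Map.Entry<String, Double>> rank(List<String> dumpLines) {
        List<Map.Entry<String, Double>> rows = new ArrayList<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                rows.add(Map.entry(m.group(1), Double.parseDouble(m.group(2))));
            }
        }
        rows.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a thread dump saved as plain text
        for (Map.Entry<String, Double> row : rank(Files.readAllLines(Path.of(args[0])))) {
            System.out.printf("%-45s %12.2f ms (%.1f min)%n",
                    row.getKey(), row.getValue(), row.getValue() / 60_000.0);
        }
    }
}
```

Keep in mind that `cpu=` is cumulative since the JVM started, so comparing two dumps taken a minute or so apart shows which threads are busy right now.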
Also, in Process Explorer, if I pause the thread using the most CPU cycles, ColdFusion pages no longer load on the site until I resume it.
The file cf_threads_1.txt cannot be displayed.
For some reason it does not work, passes virus scan but won't let us view it here. Tried zip but not allowed
DSN is SQL server. Same settings it has always had. It is set to maintain connections.
DSN is SQL server. Same settings it has always had. It is set to maintain connections.
By @forrest_3294
That is unlikely to be the cause, as it does not engage with the Windows Operating System. Whatever is causing the issue does engage with Windows. In any case, that's my diagnosis.
@forrest_3294 , I have been looking further into this issue. It is now clear to me that databases are not involved. I have a new idea.
You will be relieved to know that the idea does not involve updating ColdFusion 2023 (However, updating is still vital!).
The new idea is to check whether the cause is the Java version on which ColdFusion runs. My reasoning is as follows.
The thread you’re seeing:
at sun.nio.ch.Iocp.getQueuedCompletionStatus(Native Method)
at sun.nio.ch.Iocp$EventHandlerTask.run(Iocp.java:323)
is inside the Java Virtual Machine (JVM)’s native NIO layer, and not in ColdFusion’s Java code. That means it’s the JVM that’s invoking the Windows API GetQueuedCompletionStatus() in a loop.
So, my new suggestion is:
I'm not familiar with Akamai. Throughout your application, how do you make HTTP calls: with new http() or cfhttp()?
Old codebase so mostly uses cfhttp tags. Do you think cfhttp calls may not be closing or timing out? We do call a lot of external apis. Thanks
@forrest_3294 , Did you see my last suggestion?
I repeat:
Hence, you will likely solve the problem by installing Java SE 17.0.16, and running ColdFusion 2023 on it. This solution is less risky than updating ColdFusion, which you're hesitant to do. One added bonus is that you need to update the Java version anyway (for obvious security reasons).
Should the Java update give any problems, which is highly unlikely, then it's as simple as ABC to revert to the original state. All you then have to do is revert to the original java.home setting in /[CF_INSTANCE]/bin/jvm.config.
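Before and after changing java.home, it can be worth confirming which Java runtime is actually in use. Below is a small Java sketch that prints the runtime's home and version and compares it against a required version, using the JDK's standard `Runtime.Version` API. The class name `JavaRuntimeCheck` is hypothetical, and the sketch assumes you launch it with the `java` binary from the same directory that java.home in jvm.config points to; otherwise it reports on a different JVM.

```java
// Hypothetical helper: reports which Java runtime is executing this code.
final class JavaRuntimeCheck {

    /** True if the given runtime version is at least the required version string. */
    static boolean isAtLeast(Runtime.Version actual, String required) {
        return actual.compareTo(Runtime.Version.parse(required)) >= 0;
    }

    public static void main(String[] args) {
        // Defaults to the version suggested in this thread; pass another as the first argument.
        String required = args.length > 0 ? args[0] : "17.0.16";
        Runtime.Version running = Runtime.version();
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("java.version = " + running);
        System.out.println("at least " + required + "? " + isAtLeast(running, required));
    }
}
```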
Old codebase so mostly uses cfhttp tags. Do you think cfhttp calls may not be closing or timing out?
By @forrest_3294
Answer: Yes, that is a possibility.
ColdFusion’s cfhttp tag uses Java components that ultimately depend on Java NIO's non-blocking or pooled connections. Therefore, if your application:
then you can easily end up with high network I/O load. As IOCP threads are the ones doing the work, their CPU time will then accumulate.
Should that be the case, the recommended solutions would be (in order of relevance):
10 JDK-8291638 core-libs/java.net Keep-Alive timeout of 0 should close connection immediately
11 JDK-8291637 core-libs/java.net HttpClient default keep alive timeout not followed if server sends invalid value
https://bugs.openjdk.org/browse/JDK-8291638
https://bugs.openjdk.org/browse/JDK-8291637
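As a general illustration of guarding outbound HTTP calls against never timing out, here is a minimal Java sketch that sets both a connect timeout and an overall request timeout. It uses the JDK's own `java.net.http` client purely as an example; it is not claimed to be what cfhttp uses internally. In CFML the corresponding safeguard would be the `timeout` attribute of cfhttp. The class name `BoundedHttpCall`, the URL, and the 5 and 10 second values are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: every outbound call gets explicit time limits.
final class BoundedHttpCall {

    /** Client that gives up if a connection cannot be established in time. */
    static HttpClient newClient(Duration connectTimeout) {
        return HttpClient.newBuilder()
                .connectTimeout(connectTimeout)
                .build();
    }

    /** GET request that fails with HttpTimeoutException if no response arrives in time. */
    static HttpRequest newRequest(String url, Duration requestTimeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(requestTimeout)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = newClient(Duration.ofSeconds(5));
        HttpRequest request = newRequest("https://example.com/", Duration.ofSeconds(10));
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("status = " + response.statusCode());
    }
}
```

Without explicit limits like these, a slow or unresponsive external API can hold a connection open indefinitely, which is the scenario raised in the question above.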