Before I start, here is some basic info about our production environment:
OS: CentOS Linux 7
CF: ColdFusion 2018 (2018.0.10.320417)
Apache httpd: 2.4.6
We installed ColdFusion 2018 recently and noticed that FusionReactor shows a lot of TIMED_WAITING threads called `pool-<number>-thread-1`, which keep accumulating and don't seem to be closed.
Since this looks worrying to me, I'm here to ask whether anyone knows if this is normal behaviour or if we need to take action before our server crashes like this one.
I took a look at some thread dumps of these `pool-*-thread-1` threads, but they don't give me a clue where they come from or what they were supposed to be used for.
I don't know much about Java threads yet and thus cannot debug this right away. Does anyone here have advice on where to start?
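One possible starting point, offered as a minimal sketch rather than a diagnosis: the name `pool-<number>-thread-1` matches the pattern that the JDK's `Executors.defaultThreadFactory()` gives to its threads, so counting live threads by name family and state can show which family is growing and how fast. The class name `PoolThreadCensus` and its methods are hypothetical, and the code only sees threads of the JVM it runs in, so it would have to be executed inside the ColdFusion JVM (how best to do that is left open here).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical helper: groups the live threads of the current JVM by name family and state. */
class PoolThreadCensus {

    // Name pattern used by Executors.defaultThreadFactory(): pool-N-thread-M
    private static final Pattern DEFAULT_POOL_NAME = Pattern.compile("pool-\\d+-thread-\\d+");

    /** Replaces every run of digits with '#', so "pool-17-thread-1" becomes "pool-#-thread-#". */
    static String family(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    /** True if the name looks like one produced by the JDK's default thread factory. */
    static boolean isDefaultPoolThread(String threadName) {
        return DEFAULT_POOL_NAME.matcher(threadName).matches();
    }

    /** Counts live threads per "family [STATE]" key, sorted by key. */
    static Map<String, Integer> census() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = family(t.getName()) + " [" + t.getState() + "]";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One line per thread family and state, e.g. a count next to "pool-#-thread-# [TIMED_WAITING]"
        census().forEach((key, n) -> System.out.println(n + "\t" + key));
    }
}
```

Running the census at intervals and comparing the counts would show whether only the `pool-#-thread-#` family is growing. It does not reveal which code created those threads; a thread dump only shows where a thread is currently waiting.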
Btw: at first I thought these threads were the ones configured in `workers.properties` and `server.xml`, but those are set to a max of 250, whereas there are thousands of these pool threads. Anyway, here is our workers.properties for reference:
worker.cfusion.connection_pool_timeout=60
worker.cfusion.connection_pool_size=250
worker.cfusion.max_reuse_connections=250
and the connector tag
<Connector connectionTimeout="60000" maxThreads="250" ...
Jabo, this is indeed a problem that has hit others, as the thread you linked to shows, and as I have also experienced with folks I've helped directly (in my consulting).
And I just offered a comment on that other thread that I will offer now here (since some may not bother going there, and you yourself may not have had any reason to know I added the comment there since writing the above):
"...just last week I heard of a bug fix that seems may POSSIBLY be related to this, at least to a blossoming of threads in CF2018. Unfortunately, the bug report is not public, but one CAN request the bug fix from Adobe. I have asked them to clarify if it's about "pool" threads specifically, but have not heard back since last week.
If you or others want to either give the fix a try, or ask Adobe for more info, the hotfix file name you'd request is hf201800-4207395.jar. (If they would want to share the file or a link to it here, I will leave that to them.)
With hotfixes like this, you would drop it into the lib/updates folder under cfusion (or whatever folder name represents any other CF instance you are running), and do NOT remove any others already there, then restart CF. It should pull that new fix in, adding it to any other update already there. And then, if a later update incorporates this fix, that update will itself remove this hf jar from the folder for you.
If anyone tries it, I hope you will report back on how things go. I have shared the same news with the clients who experienced it, and I hope to remember to report back if I hear news from them.
Or if you or anyone else may have since tried and found resolution some other way, please do let us know."
Same goes for you, Jabo, or anyone else seeing this here. I would LOVE to see this resolved for everyone concerned.
Hi Charlie, first of all thank you so much for your offer to help us investigate this issue. It is so frustrating to restart ColdFusion every two to three hours, when the system becomes completely unresponsive. We only have one instance running, but thank you for your hint to double-check the installation of the hotfix. To be honest, I didn't compare the filenames; it seems they sent me the wrong patch (hf201800-4207069.jar, not hf201800-4207395.jar). We are still waiting to get the correct file. Provided that the fix comes soon, I will post the results tomorrow morning.
Happy to help, Dirk, and thanks for the kind regards.
About that other hotfix, that's another issue of course ( https://tracker.adobe.com/#/view/CF-4207069 ), and it's been indicated as helping folks with issues of getcomponentmetadata taking a long time, which some are saying was crippling Mura apps on CF2018 (so I am waiting also to hear from folks there about whether it did help that).
But it's definitely not the same problem as this, about threads (especially pool threads) growing at ever increasing rates, fixed by hf201800-4207395.jar. Again, sadly the Adobe bug report for this threads issue was deemed a private one and so we in the public can't see it to know more. But we can hope it will be incorporated into any next update to CF2018, like perhaps update 11.
Am looking forward to hearing what you may find.
It looks like the hotfix solved the issue. We installed it a day ago and everything looks good so far.
I asked the support team about WHAT caused the surge of threads as it obviously does not affect a lot of installations. Unfortunately I just got a very short and useless answer:
"The issue was that the cf threads were growing exponentially and with this patch we have fixed that."
Great to hear, and thanks for sharing. Sad to hear that we can't get more out of the Adobe folks on this. We're asking a simple, reasonable question.
Anyway, at least we can infer the answer to what I had asked (and hoped), as to whether this fix affected those with blossoming POOL threads. That was not clear, but since that's what you experienced and this fixes that, now we know.
I would still very much like to know the CAUSE of this, both to know why it affects so few, and especially if it may affect folks on cf2016 (or earlier), for whom a hotfix may not be offered.
I just got this answer from Adobe support about WHAT caused the issue:
The issue is was due to cflogin tag. Cflogin tag was creating additional threads exponentially. We have resolved it in the patch shared.
This makes sense: during the night nothing happened, and at exactly 8 am a surge of threads appeared when people started working.
Thanks for the update, Dirk. I can confirm (for the sake of readers) that I had another client in the past who had a similar "blossoming" of threads--in their case these very "pool" threads I mentioned in my first comment, and in their case too we narrowed things down to cflogin processing.
They ended up just removing it, to solve the problem (didn't wait for any fix from Adobe, nor did they report it that I know of). But it's "good" to hear that this issue is now fixed by that hotfix. Glad to have shared it, and looking forward to Adobe incorporating it into a new general update soon.
We have a similar issue to Dirk, but we do not use CFLOGIN. Our pool threads rise slowly over time. We have had to increase the number of threads in our connector to a very large number to allow the server to stay online, which is far from ideal.
We applied the Adobe hotfix to see if it helped, but alas it did not. On this blog - https://blog.tier1app.com/2014/09/30/memory-leak-in-java-executor/ - there is a statement that the Java/Tomcat Executor could be buggy, and that "parked" threads are never properly released. This article is from 2014.
Can someone tell me (Adobe?) if that issue has been resolved? Was it a bug? The article says to use the shutdown() method in the Executor, as this will release the parked threads. Does anyone know how we can call that method on our Executor so we can try? We use the default Executor - the internal thread pool (https://docs.oracle.com/javase/10/docs/api/java/util/concurrent/ThreadPoolExecutor.html). We have no <Executor> entry in server.xml configured.
Advice appreciated as always.
That 2014 post doesn't identify a tomcat bug. It identifies a Java coding example that led to a blossoming of pool- threads.
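To illustrate the general kind of pattern involved, here is a simplified sketch. It is not the code from that article, and nothing here is known to be what happens inside ColdFusion. It assumes an executor is created per task with the JDK's default thread factory and is never shut down; the class and method names are hypothetical. Each such executor leaves one idle `pool-N-thread-1` worker behind, while the variant that calls `shutdown()` does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of an executor-per-task pattern with and without shutdown(). */
class ExecutorLeakDemo {

    /** Number of live threads named like the default thread factory's pool-N-thread-M. */
    static long livePoolThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().matches("pool-\\d+-thread-\\d+"))
                .count();
    }

    /**
     * Leaky pattern: a new pool per task, never shut down, so its worker stays alive and idle.
     * The executor is returned ONLY so this demo can clean up; leaky code would drop it here.
     */
    static ExecutorService runWithoutShutdown(Runnable task) {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        CompletableFuture.runAsync(task, executor).join(); // wait for the task to finish
        return executor;
    }

    /** Same work, but the pool is shut down afterwards so its worker thread can exit. */
    static void runAndShutDown(Runnable task) {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        try {
            CompletableFuture.runAsync(task, executor).join();
        } finally {
            executor.shutdown();
            try {
                executor.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        }
    }

    public static void main(String[] args) {
        List<ExecutorService> leaked = new ArrayList<>();
        long before = livePoolThreads();
        for (int i = 0; i < 5; i++) {
            leaked.add(runWithoutShutdown(() -> { }));
        }
        System.out.println("idle pool threads left by the leaky pattern: "
                + (livePoolThreads() - before));
        for (int i = 0; i < 5; i++) {
            runAndShutDown(() -> { }); // these workers exit once their pool is shut down
        }
        // Clean up so the JVM can exit; the leaked workers are non-daemon threads.
        leaked.forEach(ExecutorService::shutdown);
    }
}
```

Note that `shutdown()` only helps when the code that created the executor still holds a reference to it, which is why this is a fix for whoever owns that code rather than something to apply from outside.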
As for any hope that we as cf users can fix this based on the Java "shutdown" code offered, I find that highly unlikely.
These cases of pool thread leaks seem entirely caused by something inside of cf (or a library it leverages). That could be tomcat, sure, but the cflogin issue discussed earlier had nothing to do with tomcat.
Your issue could be anything, really. I doubt it's caused by any code on your end, but rather by some code that only Adobe can help with (as a bug fix, I'd expect) or perhaps by some config in cf that we MAY be able to change (once we know what the problem is).
The challenge of course is determining the cause.
Great finding, @tribule !
The Executor issue, if confirmed, will be relevant to many modules in ColdFusion.
Please create a ColdFusion bug ticket for this.
That shouldn't take time. I think it is sufficient to copy-paste that one paragraph above.
For your process to be able to call shutdown() on an Executor, it has to have been injected into the task. That would entail some severe re-engineering. Certainly not at the level of CFML code, but in the ColdFusion engine.
However, such heavy-lifting might be unnecessary. We might be able to configure Executor behaviour in server.xml:
<Server>
  <Service name="Catalina">
    <!-- The connectors can use a shared executor; you can define one or more named thread pools -->
    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="150" minSpareThreads="4"/>
  </Service>
</Server>
Thanks for the reply.
I already previously added an <Executor> as a test, and it made no difference. It was defined as:
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="150"/>
This has been commented out now at the advice of CF support (Vikram).
Do we need a different Executor defined when we have just one Connector pool? I thought Executors were specifically for maintaining multiple connection pools?
The connectors we have now are:
<Connector connectionTimeout="20000" port="8500" protocol="HTTP/1.1" redirectPort="8451"/>
<Connector connectionTimeout="60000" maxThreads="5000" minSpareThreads="20" port="8018" packetSize="65535" protocol="AJP/1.3" redirectPort="8451" secret="E1C11B81-1234-4C03-9877-841072C7A0FC" tomcatAuthentication="false"/>
The minSpareThreads was added at CF support's advice, but it did not help with the rising threads either.
> I already previously added an <Executor> as a test, and it made no difference. It was defined as:
> <Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="150"/>
> This has been commented out now at the advice of CF support (Vikram).
I would heed Vikram's advice. As an Adobe ColdFusion engineer, he is better acquainted with the underlying Java libraries than we are. In fact, I have never had to manually configure this Executor.
> Do we need a different Executor defined when we have just one Connector pool?
My gut feeling is that we don't need to define a different Executor, even though the <Executor> element is commented out in server.xml. When ColdFusion starts, it initializes an executor by default. 🙂
In ColdFusion 2021 you can see evidence of this in the server.log:
"Information","ajp-nio-127.0.0.1-8014-exec-3","03/05/21","13:54:13","","Initializing executor pool with core pool size of 25 max pool size of 50 and keep alive time of 2,000"
> I thought Executors were specifically for maintaining multiple connection pools?
I think ColdFusion uses - and reuses - the threads in the pool for various tasks besides maintaining connections.
If you load jConsole and look at the "MBeans" tab, you can see all of the thread pool settings and methods (via ThreadPool -> UtilityExecutor -> Operations) and even call the various methods - screenshot below. I can call the shutdown() method on our thread pool to test what it does.
I allowed threads to build up for 24 hours. There were threads that were very old and seemed to be doing nothing important (a guess on my part, agreed). I called the shutdown() method and not a single thread was released/removed. It seems our threads are "orphaned", such that no interrupt can reach them or purge them. I will try again in a couple of days when there are even more really old "timed wait" pool threads to test on.
Thought this might be interesting anyway - jConsole is a great little tool, and it's bundled with the JDK.
Many commentators on this subject have said that timed wait threads are nothing to worry about, but surely threads should not rise indefinitely without some form of automatic housekeeping? Would people agree that a week old pool thread should have been purged?
People would agree with your last statement, yes. The pileup of threads is a problem to be solved.
And sure, jconsole and seeing jmx metrics and executing their methods can be compelling. But you guys are still focused on tomcat and ajp and http connectors. I think you're barking up the wrong tree. Again, the threads you have piling up (and as discussed by others in this thread going back to 2020) are pool-threads. Not ajp-exec-nio- (or bio-) nor http- threads.
You are calling that shutdown method on the ajp connector/executor you found, right? I would argue that's why it made no difference. Do you find an executor for anything related to any other kind of pool? Sadly, you may not readily find what jmx metrics go with what thread types.
But here's something to consider: readers may note that tribule's screenshot shows he's using FusionReactor. Note that FR offers a ui for looking at jmx metrics (in all but the FR Standard version), found under the Metrics menu.
And one cool feature is that you can have it expand all the many types of metrics and their trees. Since it's a web ui, you can then use your browser find feature to look for any other executors to seek their shutdown methods.
I'll just add also that it's not clear what may come of randomly shutting down pools, so do proceed with caution.
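For anyone who would rather search programmatically than through a UI, the same idea can be sketched with the standard JMX API. This is only an illustration: the class name `MBeanOperationFinder` is hypothetical, and the `main` method inspects the JVM it runs in, so reaching the ColdFusion JVM would additionally need a remote JMX connection (not shown, and dependent on how JMX is set up on the server). It only lists operations and never invokes them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.management.JMException;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: lists MBeans exposing an operation whose name contains a fragment. */
class MBeanOperationFinder {

    /** Returns entries of the form "objectName -> operationName"; nothing is invoked. */
    static List<String> find(MBeanServerConnection server, String fragment) {
        List<String> hits = new ArrayList<>();
        String needle = fragment.toLowerCase(Locale.ROOT);
        try {
            // null, null = every registered MBean
            for (ObjectName name : server.queryNames(null, null)) {
                try {
                    for (MBeanOperationInfo op : server.getMBeanInfo(name).getOperations()) {
                        if (op.getName().toLowerCase(Locale.ROOT).contains(needle)) {
                            hits.add(name + " -> " + op.getName());
                        }
                    }
                } catch (JMException e) {
                    // The MBean disappeared or cannot be introspected; skip it.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // only relevant for remote connections
        }
        return hits;
    }

    public static void main(String[] args) {
        // Lists every operation with "shutdown" in its name in THIS JVM's MBean server.
        find(ManagementFactory.getPlatformMBeanServer(), "shutdown")
                .forEach(System.out::println);
    }
}
```

A list like this still does not say which MBean, if any, owns the `pool-` threads, so the caution above about shutting pools down blindly applies here as well.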
Again, it seems the ultimate solution will come from an Adobe fix to the problem. But sure, we can try to find what may help get them there, especially if they are unable to replicate the problem. That seems the case.
Finally, there is yet another feature of FR that may help with solving this. Tribule and I are due to meet Monday to try to use that to see if it may help us find/resolve the problem. We'll report back if it does (and I'd then share more on how it helped, also). For now, it would be too much to write, especially since it's not clear if it will help.
Hi Charlie. I had a good look at all of the tree and that was the only one with methods the same as in the documentation for the Executor, but yes it may be a different thing altogether. I find myself barking up many trees at the moment, but at least it gets me a bit more understanding.
Btw in the MBean tree, the "Threadpool" section for NIO and AJP are separate from UtilityExecutor. Ram was saying that our stacktrace showed we were using "java.util.concurrent.ThreadPoolExecutor" and I was wondering if "java.util" referred to the UtilityExecutor. Also the NIO/AJP sections in the MBean tree have their own shutdownExecutor methods. Guesswork on my part, but I thought it was worth a try at least.
OK (on your first point) and understandable (on the second).
Let's see if we may make different progress tomorrow, to share for folks here.