Having some issues with multiple Ubuntu servers recently upgraded from CF2018 to CF2023, all using jdk-17.0.9: thread counts are growing until they hit their limits, with cfsearch errors:
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:811)
at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)
at org.apache.http.impl.client.HttpClientBuilder.build(HttpClientBuilder.java:1238)
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:337)
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:348)
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:286)
at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:273)
at org.apache.solr.client.solrj.impl.HttpSolrClient.<init>(HttpSolrClient.java:204)
at org.apache.solr.client.solrj.impl.HttpSolrClient$Builder.build(HttpSolrClient.java:968)
at coldfusion.tagext.search.SolrUtils._getSolrClient(SolrUtils.java:1740)
at coldfusion.tagext.search.SolrUtils.getSolrClient(SolrUtils.java:1688)
at coldfusion.tagext.search.SolrUtils.getSolrDocCount(SolrUtils.java:270)
at coldfusion.tagext.search.SearchTag.doSolrSearch(SearchTag.java:378)
at coldfusion.tagext.search.SearchTag.doSearch(SearchTag.java:252)
at coldfusion.tagext.search.SearchTag.doStartTag(SearchTag.java:190)
at coldfusion.runtime.CfJspPage._emptyTcfTag(CfJspPage.java:5088)
at cfsearch2ecfc1011696301$funcPERFORMSEARCH.runFunction
The thread count at which the errors start depends on the DefaultTasksMax setting on the server: where the limit is lower the errors start sooner, and some servers with a higher limit have over 15,000 threads.
Any advice on this is much appreciated.
To help those who may aid in diagnosis, please check the values for the "solr server" as indicated in the cf admin: do they (server and port) point to a solr implementation on the same machine as cf? Or another? And do you know if it is the one implemented by the cf2023 installer? Or by a separately downloaded cf add-ons installer, and if so was that the 2023 version?
You can also check for and provide here the solr version info (and what Java it's using) via a helpful solr admin ui that many never notice: form a url using that solr server name and port, such as http://servername:port, and visit that. It should show a link for solr, and clicking that should show a useful solr admin ui.
I do realize that someone recognizing your problem readily may NOT need this info, but while we await that, it's something you can do that MAY prove helpful.
Thanks for your reply Charlie.
Solr is running on the same machines as CF, and was installed with the CF2023 add-ons installer. The Solr version is 8.11.2 and its Java runtime is 17.0.6+9-LTS-190.
The solr admin is good for showing memory usage but I haven't seen any stats for thread usage unfortunately.
OK on all that. To be clear, I wasn't proposing that the Solr Admin would HELP with the thread usage problem (though there are ways it might). I was merely proposing using it as a means to know a) what does CF point to, b) does that respond as a web site, and c) what did it report for the version info. I asked all that for the sake of others who may help.
Finally, as for finding out what the thread usage (in CF) is about, I would propose that you might find more about that from other tools, whether JVM tools (like jconsole, jcmd, jvisualvm, jmc, etc.) or a tool like FusionReactor (the wonderful CF monitor--which can monitor MORE than just CF, including that addon/Solr service).
But even then, such tools may not readily tell us WHY CF is using the threads, just by viewing them. But by viewing them over history and connecting the rise in threads to CF request processing, one may be able to connect dots. FR would be especially valuable for that. And while I can help one do that, it's not the sort of work to be done via a forum thread like this. It would be best done as an online screen-share session, which I can offer via my consulting (carehart.org/consulting).
But I realize you may feel this is a bug in CF, and as such you may prefer to wait for Adobe to respond and help--whether with a ready answer if they already understand the problem as reported by others, or by arranging for additional diagnostics from you. Again, I asked what I did above to anticipate that somewhat.
I'll say finally that if you get no reply from them here, consider either sending an email to them (cfinstall@adobe.com) or opening a bug report at tracker.adobe.com. Those are sure to get their attention, whereas here they may or may not reply (they often do, but there's no guarantee).
Hope all that's helpful, to you or others who may find this thread.
Thanks again Charlie, all helpful suggestions. Understood. The Solr admin has been useful for keeping an eye on memory usage; seeing plenty of headroom there helped confirm it's a thread count issue. After your query on the Java version I tried changing Solr on some servers to use the same version as CF, 17.0.9+11-LTS-201, but the threads are still growing on those servers at this stage.
On the diagnostic tools, I ran jcmd <pid> PerfCounter.print against the PID of the com.adobe.coldfusion.bootstrap.Bootstrap -start process on a server with a high thread count; these were some of the thread-related values:
java.threads.daemon=16430
java.threads.live=16466
java.threads.livePeak=16467
java.threads.started=41106
sun.gc.tlab.alloc=91293227
sun.gc.tlab.allocThreads=16426
sun.gc.tlab.fills=17473
sun.gc.tlab.gcWaste=5151448
sun.gc.tlab.maxFills=114
sun.gc.tlab.maxGcWaste=145790
sun.gc.tlab.maxRefillWaste=17176
sun.gc.tlab.maxSlowAlloc=62
sun.gc.tlab.refillWaste=142189
sun.gc.tlab.slowAlloc=174
And on the new pid after a CF restart:
java.threads.daemon=37
java.threads.live=61
java.threads.livePeak=62
java.threads.started=172
sun.gc.tlab.alloc=94603928
sun.gc.tlab.allocThreads=31
sun.gc.tlab.fills=83
sun.gc.tlab.gcWaste=43102762
sun.gc.tlab.maxFills=16
sun.gc.tlab.maxGcWaste=5370018
sun.gc.tlab.maxRefillWaste=10858
sun.gc.tlab.maxSlowAlloc=0
sun.gc.tlab.refillWaste=26881
sun.gc.tlab.slowAlloc=0
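For anyone else who'd like to watch these counters without shelling out to jcmd, the same live/daemon/peak/started figures can be read from inside the CF JVM via the standard java.lang.management API. A minimal sketch of a diagnostic .cfm page (just an illustration, nothing CF ships with):
<cfscript>
    // Read the JVM thread counters (the same figures jcmd PerfCounter.print reports)
    threadBean = createObject("java", "java.lang.management.ManagementFactory").getThreadMXBean();
    writeOutput("live=" & threadBean.getThreadCount()
        & " daemon=" & threadBean.getDaemonThreadCount()
        & " peak=" & threadBean.getPeakThreadCount()
        & " started=" & threadBean.getTotalStartedThreadCount());
</cfscript>
Reloading that page over time should show the same growth without needing a terminal.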
I have contacted Adobe for assistance and will post any updates here.
Just an update on this issue: thread dumps show most of the threads are of type "Connection evictor", which also appears to be the type being started in the error above (java.base/java.lang.Thread.start(Thread.java:811) at org.apache.http.impl.client.IdleConnectionEvictor.start(IdleConnectionEvictor.java:96)). Here's an example:
"Connection evictor" #430 daemon prio=5 os_prio=0 cpu=0.73ms elapsed=235.12s tid=0x00007f31d408e600 nid=0xd47 waiting on condition [0x00007f31a0cfe000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(java.base@17.0.9/Native Method)
at org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
at java.lang.Thread.run(java.base@17.0.9/Thread.java:842)
Locked ownable synchronizers:
- None
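For anyone wanting to track that specific thread type without taking full dumps, the evictor threads can be counted by name from a small .cfm page run on the affected server; a rough sketch (illustrative only):
<cfscript>
    // Count live threads named "Connection evictor" - the type accumulating in the dumps
    iter = createObject("java", "java.lang.Thread").getAllStackTraces().keySet().iterator();
    evictorCount = 0;
    while (iter.hasNext()) {
        if (findNoCase("Connection evictor", iter.next().getName())) evictorCount++;
    }
    writeOutput("Connection evictor threads: " & evictorCount);
</cfscript>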
Others have experienced this problem with Solr when upgrading versions:
https://www.mail-archive.com/solr-user@lucene.apache.org/msg142950.html
https://www.mail-archive.com/solr-user@lucene.apache.org/msg149643.html
Suggested in those threads is changing how the Solr clients are created, to prevent the creation of multiple clients.
While those other resources may seem promising, beware that they could also be rabbit holes (not as clearly related as they may seem). Until you/Adobe get to the bottom of things, I have different questions:
Thanks for your reply Charlie, answers to your questions:
1. The Solr Host Name on the Solr Server page in the CF admin is localhost; 127.0.0.1 also works, and the port matches the Solr UI I've been using.
2. All cfsearch calls are successful until the thread limit is hit and the errors start.
3. I've just installed FusionReactor on the staging server I've been testing on. It also shows that the TIMED_WAITING threads are the ones that accumulate; however, the stack traces for the threads running cfsearch all show the thread state WAITING:
"ajp-nio-127.0.0.1-8122-exec-2" Id=194 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4ada38b4
java.lang.Thread.State: WAITING
at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@4ada38b4
at java.base@17.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506)
at java.base@17.0.9/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3465)
at java.base@17.0.9/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3436)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1623)
at java.base@17.0.9/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:141)
at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:33)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1114)
at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.base@17.0.9/java.lang.Thread.run(Thread.java:842)
The ramps in this screenshot are when a test cfsearch is being hit once per second.
Hi Charlie & Happy New Year! Another update on this, I logged it as a bug after my last comment here and it was verified this week: https://tracker.adobe.com/#/view/CF-4220122
Thanks again for your advice above.
Hi Chris, We are having the exact same issue. Have Adobe got anywhere with it?
Many thanks
Ollie
Hi Ollie, I've only been going by the details on the bug which haven't changed for about a month since it was verified and given major priority. I had a support ticket open until shortly after that when Adobe asked to archive it, saying their dev team was working on it.
Hi Chris,
Thanks for the update. Hopefully they will get it sorted soon. Constantly having to watch open threads isn't ideal.
Many thanks
Ollie
Is this issue perhaps caused by Solr using an incorrect Garbage Collection setting? I reported a related JVM issue earlier today: https://tracker.adobe.com/#/view/CF-4221322
Suggestions:
(1) Change the GC setting in C:\ColdFusion2023\cfusion\jetty\jetty.lax from -XX:-UseParallelGC to -XX:+UseParallelGC.
(2) You will see that the maximum heap size (Xmx) in C:\ColdFusion2023\cfusion\jetty\jetty.lax is set to a default of -Xmx512m. So, I would configure the maximum heap size for ColdFusion 2023 to at least 4 times that. For example, to -Xmx2048m or higher.
Thanks for the suggestions BKBK. I've just tested those settings in the Linux equivalent of jetty.lax, cfusion/jetty/cfjetty, but unfortunately the threads keep growing.
When I opened a ticket with Adobe in December they suggested trying -XX:+UseG1GC and -Xmx4096m but the issue persisted with those settings too.
Hi @-Chris ,
Thanks for the update. What are the following values?
- Operating System RAM;
- Xms and Xmx settings for ColdFusion;
Hi @BKBK, we've tested multiple servers with this issue with the following combinations:
8GB RAM with -Xms256m -Xmx1024m in jvm.config
16GB RAM with -Xms256m -Xmx1024m in jvm.config
16GB RAM with -Xms8192m -Xmx10240m in jvm.config
Hi Chris, thanks for the update. The suggestion I gave consists of two complementary parts: heap size in jetty.lax and ColdFusion's heap size. You have indeed discussed each.
However, my point is about the combination of both. In particular, I suggested you should make sure the heap size for ColdFusion is at least 4 times that in jetty.lax. So I would suggest a test of the following combined scenario:
Remember to restart all the services after making the changes.
Hi BKBK, thanks for those details, I updated the values in the cfjetty and jvm.config files as suggested and rebooted the server before loading a cfsearch test page 100 times. The system threads increased from 265 to 470 during those page loads, adding approximately 2 threads per cfsearch, and stayed at 470 after the tests stopped.
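In case it helps anyone repeat that measurement, here's a hypothetical single-page version of the kind of test involved (the collection name "testcol" is just a placeholder for whatever empty test collection you've created); reloading it repeatedly should show roughly the two-threads-per-search growth described above:
<cfscript>
    // JVM thread count before this request's cfsearch
    tmx = createObject("java", "java.lang.management.ManagementFactory").getThreadMXBean();
    before = tmx.getThreadCount();
</cfscript>
<cfsearch name="results" collection="testcol" criteria="*">
<cfoutput>JVM threads before: #before#, after: #tmx.getThreadCount()#</cfoutput>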
Hi Chris, thanks for your explanation. Also, hats off for your comprehensive tests and reports, which will be of help to many fellow developers in future.
Chris (and Ollie), I still think there's value in looking into the solr server rather than cf. Cf's the CLIENT, and while that MAY be where the problem is, it could instead be in the server.
You had said you were using the version implemented by the addons service. (Is that the case for you, Ollie?) And was that indeed the cf2023 addons installer? It doesn't have to be, though it's generally wise.
More important, as for monitoring solr itself, I'd proposed in early Dec here in this thread that you consider implementing fusionreactor ON solr (that addons service) itself, or using jvm tools to monitor it. You replied about having implemented it in cf instead. (And since you have cf and solr on the same machine, note that the licensing of fr is per server, not per app.)
I'm pressing this because we can't know if the problem is indeed on the cf end or the solr end: the hanging evictor threads in cf may be caused by something hanging up or acting differently on the solr side.
Finally, as I offered in that reply, if you (or Ollie or anyone experiencing this issue) might want help diagnosing it, I'm available to help via remote screenshare on a consulting basis.
I realize you've opened a ticket with Adobe, and as I said, if they might solve things with you/for you, great. If not, perhaps I can help. You won't pay for time you don't find to be valuable. Recall too that Adobe said in the ticket that they couldn't recreate the problem. This is why I wonder if there may be something on your end that they couldn't or didn't replicate.
Thanks @Charlie Arehart, solr was installed with the CF2023 add-ons installer. From memory I also tested using a CF2023 installer that had solr included, with the same result.
I'll look into doing some Solr monitoring when time permits, but have been hoping for an Adobe bugfix for this. Although they hadn't replicated it in their first reply to the bug, after I provided some more info the bug status changed to 'to fix' with reason code BugVerified. The status has since changed to 'to test' with reason code NeedMoreInfo. I haven't had any further updates from Adobe since they asked to archive my ticket and advised they were working on it.
I'll also just mention how easy it is to replicate: with a fresh install of CF2023 and Solr, add a Solr collection from the CF admin and create a test page with cfsearch; populating the collection is not required. I've seen the same result on Ubuntu 18, 20 and 22, monitoring the thread growth at a terminal with htop.
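For reference, a sketch of how small that test page can be (assuming a collection named "testcol", empty is fine, has already been created in the CF admin):
<!--- Minimal cfsearch repro page; the collection can be empty --->
<cfsearch name="results" collection="testcol" criteria="*">
<cfoutput>#results.recordCount# result(s)</cfoutput>
Hitting it repeatedly while watching htop (or the counters above) should show the thread growth.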
Chris, I appreciate how you feel this is easily replicated, with the implication being that I try it myself. But both yesterday when I replied and now, I'm seeing this on my phone so cannot. And as much as I might hope to try on a computer, the day got away from me yesterday and may again today.
FWIW, bkbk has not confirmed replicating it, but it's not clear if he's tried. And while we might infer from the ticket that Adobe has, the status has since changed, so that's also not clear. So I'm just saying that while you'd think anyone/everyone should be hitting it, there may well be something more unique about your situation than currently meets the eye. (I do appreciate you could assert that everyone IS hitting it and just doesn't know it.)
Let's hope we learn more, whether from more investigation by Adobe, by any of us, or from you if you opt to explore using FusionReactor to watch things from the solr side (it can give you insights into requests, threads, resources, and much more, and has a 14-day free trial).
Hi Charlie, the replication steps were intended for anyone who might like to try them or check their servers for this issue; apologies, I didn't mention that above. There have been a few others running into it here and on the bug; perhaps there aren't many yet using CF2023 on Ubuntu with enough cfsearch hits between CF restarts to notice it. The bug status has changed back to 'To Fix' with reason code BugVerified since my last reply.
Yes, I understood that your offer of the replication steps was not for me but for everyone. 🙂
Yes, I agree that the number of people running CF2023 on Ubuntu using cfsearch (who would know how to even notice threads piling up) is a tiny number. (Heck, the number of people using CF2023 is smaller than all other versions combined, as it takes time for people to move en masse. And the number running CF on Linux at all is clearly a subset of all CF users--and it's not clear which is larger: CF running on Windows, Linux, Mac, or Solaris, though I have a guess. And the number of people using CF's solr integration at all on all CF versions is also a tiny percent. Add all those together, and yes, you end up with the "tiny number" I referred to.)
Finally, yes, the fact that it's now marked bugverified and tofix is excellent news. That certainly implies Adobe has recreated it, and hopefully soon they'll offer a fix (even before an update includes it). This has happened before--with CF having increasing threads, unrelated to cfsearch--and the fix immediately resolved the problem.
I do appreciate how frustrating the situation has been for those experiencing/noticing it. I never meant in any of my replies to belittle or depreciate the problem: my focus has always been to propose ways to confirm more diagnostics. That's what I do all day each day, with paying customers or in the community.
But again, often I've been seeing these responses in this thread when I awoke and was looking at my phone, so could not try even to recreate the problem. Today is the FIRST day I am typing a reply in this thread on my computer, and I would have tried finally now to recreate the problem...but since it seems Adobe now has, I'll move on to other matters. 🙂
Thanks to everyone who has contributed here. And if anyone learns that Adobe DOES offer a fix (before the next update), please do share the news, as some may well find that here. And I'd recommend you do that as a new "reply" rather than as a response to another reply, since the threaded nature of this forum means that some will never see some responses to some replies. But a top-level reply stands out differently. (Sadly, new replies don't appear at the top, so people do need to wade all the way down to the bottom to see what may be the "latest" reply, unless they notice the option for "jump to latest reply".)
We are now over 6 months on, and Adobe have not released a fix?