
Possible Memory Leak - ColdFusion 2023 + Java 17.0.12

New Here, Oct 08, 2024

Ever since we upgraded from ColdFusion 2021 to ColdFusion 2023, we have been dealing with out-of-memory issues. ColdFusion will run fine for roughly 24-30 hours, then we start seeing CPU spikes to 100% every 30 seconds. Garbage collection can't free up enough memory, so ColdFusion eventually crashes and we have to restart the server.

 

Things we have tried that don't seem to help:

 

- Downgrading to Java 17.0.11

- Tweaking the min and max heap sizes

- Tweaking the caching settings

- Changing the garbage collector algorithm to G1GC

- Tweaking our websites to cache queries for a shorter period of time (1 hour down to 15 minutes down to 5 minutes)

 

Here are our current settings:

 

Min Heap: 8192 MB

Max Heap: 8192 MB

Garbage Collector: UseParallelGC

Cached Templates: 1000

Cached Queries: 5000

 

We do have FusionReactor installed on all of our servers, but this is like trying to find a needle in a haystack. I really don't know what I should be looking at.

 

Here is the most recent screenshot, from 2 days ago, which shows the eventual demise of one of our servers.

 

web-03.png

 

I am really at my wit's end here. If this isn't a memory leak I don't know what the heck it is. If anyone has any recommendations on what to try next I would appreciate it.

100 Replies
Community Expert, Oct 14, 2024

I compared your java.args setting to mine. The following flags are present in mine, but missing from yours:

 

--add-exports=java.base/sun.util.calendar=ALL-UNNAMED 
--add-exports=java.desktop/sun.awt.image=ALL-UNNAMED 
--add-exports=java.desktop/sun.awt=ALL-UNNAMED 
--add-exports=java.desktop/sun.java2d=ALL-UNNAMED 
--add-opens=java.base/java.security=ALL-UNNAMED 
--add-opens=java.base/java.time=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/sun.security.pkcs=ALL-UNNAMED 
--add-opens=java.base/sun.security.rsa=ALL-UNNAMED 
--add-opens=java.base/sun.security.util=ALL-UNNAMED 
--add-opens=java.base/sun.security.x509=ALL-UNNAMED 
-Dcoldfusion.iframe.allowedprotocols=file,ftp 
-Dcoldfusion.mail.oauth2=true 
-Dcoldfusion.number.allowdotsuffix=true 
-Dcoldfusion.searchimplicitscopes=true 
-Dcoldfusion.xml.allowPathCharacters=true 
-Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true 
-Dfile.encoding=UTF-8 
-Dhttps.protocols=TLSv1.2,TLSv1.3 
-Djava.awt.headless=true 
-Djdk.tls.client.protocols=TLSv1.2,TLSv1.3 
-Djdk.util.zip.disableZip64ExtraFieldValidation=true 
-Dlog4j2.formatMsgNoLookups=true 
-Duser.region=us 
-Duser.timezone=Europe/Amsterdam 

 

The question I wish to share with you and the forum is: Could any of these help with the memory problems?

 

In connection to this, see Adobe's document, "ColdFusion performance issues and troubleshooting".

 

New Here, Oct 14, 2024

BK, Charlie, Scott,

 

I think we figured out what's going on here. It was slow but nonstop bot traffic. I didn't spot it at first because it was so slow and measured. Once we started digging into our reporting, it was obvious that it was a huge part of the problem. We made a few adjustments to the application via the robots.txt file and the Application.cfc file, and I am happy to report that things are looking much better.

 

- Sessions are down 40% from 10,000 to 6,000

- Memory heap has settled WAY down

- Overall RAM usage on the server is also WAY down

 

This is from the ColdFusion 2021 server a few minutes ago. You can see on the heap itself when the changes were made.

 

Screenshot 2024-10-14 175820.png
Screenshot 2024-10-14 180313.png

 

I'm going to continue to monitor everything over the next 24-48 hours just to make sure but I think we got it.

 

If you want I can share some of the details of what we did. Just let me know.

Community Expert, Oct 14, 2024

Congratulations, @davecordes . The FusionReactor displays do indeed look much better.

 

It would be great to share what you did. It will certainly help fellow developers in future.

Explorer, Oct 14, 2024

Excellent! Glad you got it sorted out. Those bots can really waste a lot of resources!

Community Expert, Oct 15, 2024

Further good news: Update 11 of ColdFusion 2023, released today, has upgraded ColdFusion's PostgreSQL driver from version 42.5.1 to version 42.7.3.

New Here, Oct 15, 2024

Thanks for the info BK!

Explorer, Oct 31, 2024

Looking at your session graphs, it seems you have a lot of J2EE sessions. Do you really need them for your applications? Did you try disabling "Use J2EE session variables" in the ColdFusion Administrator?
In this article from Adobe you can find the differences between J2EE sessions and ColdFusion sessions: https://helpx.adobe.com/coldfusion/kb/difference-coldfusion-j2ee-session-management.html

I have the same issue, having updated from CF2018 to CF2023 in recent days, but I realized that my applications don't need J2EE session management. I'm still monitoring the server, but as of today memory management seems much better.

Salvatore Cerruto
Explorer, Nov 05, 2024

Can you share what kind of adjustment you made in the Application.cfc?

Salvatore Cerruto
New Here, Nov 05, 2024

Basically, I just blocked certain bot traffic when they were filtering our ecommerce catalogs. That's what was causing most of the problems on our website. It looks like this.

 

<!--- Block bot traffic from filtering --->
<cfif IsDefined("URL.aid") AND ListLen(URL.aid)>
	<cfif FindNoCase("bingbot/2.0",CGI.HTTP_USER_AGENT)>
		<cfheader statuscode="403">
		<cfreturn false><!--- BLOCKED --->
	</cfif>
</cfif>

Also consider adding the same thing in your robots.txt file.

User-agent: *
Disallow: *aid=*
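
The rule in the CFML snippet above can be restated as a small predicate. The sketch below is in Java purely so the logic can be run and tested outside ColdFusion; it is an illustration, not the deployed code. The class and method names are hypothetical, and it assumes `aid` is a comma-delimited list, which is how `ListLen` treats it by default.

```java
import java.util.Locale;

/** Hypothetical restatement of the CFML rule: block a named bot only on filtered catalog URLs. */
final class BotFilterRule {
    // User-agent fragment taken from the CFML snippet above.
    private static final String BLOCKED_AGENT = "bingbot/2.0";

    /** Mirrors ListLen() with the default delimiter: counts non-empty, comma-delimited elements. */
    static int listLen(String list) {
        if (list == null) {
            return 0;
        }
        int count = 0;
        for (String item : list.split(",")) {
            if (!item.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** True when the request should get a 403: the aid filter is present and the agent matches. */
    static boolean shouldBlock(String aidParam, String userAgent) {
        if (listLen(aidParam) == 0 || userAgent == null) {
            return false;
        }
        // Case-insensitive substring match, like FindNoCase().
        return userAgent.toLowerCase(Locale.ROOT).contains(BLOCKED_AGENT);
    }
}
```

Keeping the check this narrow means ordinary visitors, and the same bot on unfiltered pages, are not affected.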
Explorer, Nov 05, 2024

I have the same issues. Could someone help me understand what is wrong?
I have a server with CF2018 running without problems, but with CF2023 the server cannot stay up for more than 6 hours; then it becomes unresponsive because the heap fills up.

Screenshot_31-10-2024_155337_10.0.2.252.jpeg
This is how it looks when it becomes unresponsive.

And this is the memory situation:

Screenshot_5-11-2024_161826_10.0.2.252.jpeg

The old gen memory grows and the eden space becomes thinner. I also tried ZGC, but the result is the same. Is there a way to make this work?

Salvatore Cerruto
New Here, Nov 05, 2024

It looks like you're getting spikes in JDBC activity when the CPU spikes occur. Could it be a poorly performing query?

Community Expert, Nov 05, 2024

freeman, I think your problem is not what you presume--but I still hope I can offer a couple of things to help.

 

First, the screenshots do not match your assertions: the heap does not show being "full" (top right corner of first graph, top left corner of second). And while the oldgen (second column/third row in second graph) is indeed growing, it's barely half its max.

 

Second, you say these screenshots are what you see when CF "becomes unresponsive". The first screenshot shows it was taken at 16:53:30 CET. As FR was able to show you this screen, it would seem CF was responsive (as this FR on-prem UI runs WITHIN the JVM along with CF).

 

Perhaps as important, the right side of the top left 4 graphs (of that first screenshot) show there is NO activity (no running or incoming requests or queries) in the 15 seconds before this point in time--though the top right heap graph and bottom right CPU graphs show they are being populated right up to the time of the screenshot. 

 

If that inactivity is what you're referring to (perhaps you or customers are making requests which "hang"), I would contend that your problem is not "in CF" nor "in the jvm" (and not related to the heap or the GCs). Instead, it seems CF is no longer being GIVEN requests to run...and for that I would suggest a possible culprit is the web server connector (or the web server, whether IIS or Apache), or perhaps something in front of those.

 

I'll repeat: if FR was RESPONSIVE and these graphs were updating during this time, then CF was "fine".  If instead you may say "oh, no. once I took the screenshot I then found that FR was itself no longer updating", that would be very different.

 

But then I would ask whether you are accessing FR through its builtin web server port (like 8088) or using FR's capability to use a modified url without a port (yourdomain/fusionreactor.cfm/findex.htm) to respond by way of your web server (like IIS or Apache). In the latter case, if the web server or connector stopped letting traffic into CF, it would stop letting that modified URL through as well.

 

FWIW I'll note that your second graph DOES show that the "code heap" graph is approaching its max. That is controlled (indirectly) by a JVM arg called ReservedCodeCacheSize, which defaults to 240m on most modern OS's and JVM versions...and your max of 116 is consistent with that default (which is split, in a way that is not obvious, among the 3 codeheap memory spaces). In any case, hitting that max won't make CF hang: it would just prevent JIT compilation in the JVM--which might have some impact, but not what you're seeing.
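
For anyone who wants to cross-check those code heap numbers outside of FR, here is a minimal sketch using the standard java.lang.management API. The class name is hypothetical, and the name filter is a heuristic: HotSpot typically names these pools "CodeHeap '...'" or "CodeCache", while other JVMs may differ. Note that it reports on the JVM it runs in, so as a standalone program it shows its own pools, not ColdFusion's.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CodeCachePools {
    private static final long MB = 1024L * 1024L;

    /** Returns one line per non-heap memory pool whose name mentions "code". */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Code cache pools are non-heap; skip eden, survivor, old gen and so on.
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue;
            }
            if (!pool.getName().toLowerCase(Locale.ROOT).contains("code")) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined maximum.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / MB) + " MB";
            lines.add(pool.getName() + ": used " + (usage.getUsed() / MB) + " MB, max " + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```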

 

You should take a look as well at your coldfusion-error.log file, to see if it contains any sort of OutOfMemoryError just before this hangup. That WOULD indicate if CF was indeed hitting any such heap or other memory error. My sense is you'll find none.
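
As a rough sketch of that log check (a text search in any editor does the same job), the following filters a log for lines mentioning OutOfMemoryError. The class name is hypothetical, and the log path is passed as an argument because its location depends on the install.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

final class OomLogScan {
    /** Keeps only the lines that mention an OutOfMemoryError. */
    static List<String> outOfMemoryLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("OutOfMemoryError"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to coldfusion-error.log (not hard-coded, since it varies by install).
        // ISO-8859-1 can decode any byte sequence, so stray characters cannot abort the read.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        outOfMemoryLines(lines).forEach(System.out::println);
    }
}
```

An empty result for the period around the hang would support the view that the heap was not the cause.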

 

Finally, if indeed this issue you're having proves to be unrelated to the original post's contention of a "memory leak in cf2023" (which turned out instead to be an impact of excessive automated traffic), you may want to consider creating a new discussion. This one's few dozen replies are now split over 4 "pages" in this forum's UI...and that will make it ever harder for you to get casual readers to see (let alone reply to) your issue. Not a command, just a suggestion.


/Charlie (troubleshooter, carehart. org)
Community Expert, Nov 05, 2024

Hi @xfreeman89x , The first screenshot does show a worrying scenario. The ColdFusion application's memory usage reaches the maximum heap size (Xmx) several times within a period of one minute. I can therefore imagine that the application then became unresponsive.

 

I cannot see anything that suggests that the garbage collector is to blame. More likely, your application is maintaining just too many live objects on the heap.

 

So I would suggest you keep using the default collector, ParallelGC. Your problem shares many similarities with that of @davecordes. So, implement the suggestions given in this thread. They will help you rule out the red herrings and identify the likely root causes. Think, for example, of cached objects, live objects held in scopes, and so on.

Explorer, Nov 06, 2024

Hi BK, hi Charlie,
thank you for your tips! I've been trying different solutions, but without any success so far.
This is the situation with ParallelGC. I don't think it is getting better. Very soon the GC will be so frequent that the server will be unresponsive because of the CPU load. I cannot understand this behaviour. Why does the old gen memory keep growing?

Screenshot_6-11-2024_153029_10.0.2.252.jpeg

Salvatore Cerruto
Community Expert, Nov 06, 2024

Because you have some code that's creating objects that can't be gc'ed. Changing gc algorithms is not the solution.

 

And your move to cf2023 is no more likely the cause for you than it was for Dave, who originally opened this discussion.

 

Perhaps like him you might simply be suffering from unexpected new rates of bot traffic. Your first screenshot shows you getting about 30 cf requests per second. Is that to be expected? Do you still have your cf2018 available, and was it running fr within the past 30 days? If so, then you could view the fr archived metrics to see its depiction of requests per second when traffic was going through that cf2018 instance.

 

You could also look at your daily, weekly, and monthly fr reports (if you configured fr to send you email), which also track the count of requests against cf. Did those counts change over time?

 

If none of those prove helpful, have you read all the discussions in these 4 pages of messages, offering various diagnostic ideas?


/Charlie (troubleshooter, carehart. org)
Explorer, Nov 06, 2024

Hi Charlie,
this is the situation now

Screenshot_6-11-2024_212643_10.0.2.252.jpeg

CPU is rising and the server is slower than before. And this happened after less than a day of uptime. I also made a diff of heap dumps taken before and after a GC, and it seems that the objects in memory are increasing instead of decreasing.

Screenshot_6-11-2024_213250_10.0.2.252.jpeg


I have the same kind of server with ColdFusion 2018, the same load (it is attached to the same load balancer with the same applications) and the same settings, and all works fine. The only difference is that CF2018 uses G1GC, but I tried that in CF2023 as well and the result is still a giant collection of old gen objects.

Screenshot 2024-11-06 at 21-35-56 Memory Overview - FusionReactor - production.cf2018.png

What is happening? Why do those objects live forever in the old gen? How can I find out whether my code is the problem or whether it is a CF issue?
To my eyes this appears to be a memory leak, but I don't know how to reproduce it to make Adobe aware of it.
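
For readers who want to repeat the before/after heap dump comparison, here is a minimal sketch of taking a "live objects only" dump through the HotSpot diagnostic MXBean. It is HotSpot-specific, the class name and default file name are hypothetical, and it dumps the JVM it runs in. To dump the ColdFusion JVM itself, the code would have to run inside that JVM; from outside, `jcmd <pid> GC.heap_dump <file>` is the usual route.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

final class HeapDumpSketch {
    /**
     * Writes an .hprof dump of THIS JVM. With liveOnly=true only reachable objects
     * are dumped, so garbage does not clutter a before/after comparison.
     */
    static Path dump(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // The target file must not already exist; recent JDKs also require the .hprof suffix.
            bean.dumpHeap(target.toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Hypothetical default file name; pass a path as the first argument to override it.
        String name = args.length > 0 ? args[0] : "heap-" + System.currentTimeMillis() + ".hprof";
        System.out.println("Wrote " + dump(Path.of(name), true).toAbsolutePath());
    }
}
```

Two such dumps taken some hours apart can then be compared in a tool such as Eclipse MAT to see which classes keep growing.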

Salvatore Cerruto
Community Expert, Nov 07, 2024
quote

CPU is rising and the server is slower than before. And this happened after less than a day of uptime. I also made a diff of heap dumps taken before and after a GC, and it seems that the objects in memory are increasing instead of decreasing.

...
I have the same kind of server with ColdFusion 2018, the same load (it is attached to the same load balancer with the same applications) and the same settings, and all works fine. The only difference is that CF2018 uses G1GC, but I tried that in CF2023 as well and the result is still a giant collection of old gen objects.

...

What is happening? Why do those objects live forever in the old gen? How can I find out whether my code is the problem or whether it is a CF issue?
To my eyes this appears to be a memory leak, but I don't know how to reproduce it to make Adobe aware of it.


By @xfreeman89x

 

@xfreeman89x , most of your descriptions, remarks and questions actually repeat those of @davecordes . As a forum, we explored his situation in detail, offering many suggestions and solutions.

 

So I would suggest that you follow the discussions right from the beginning. Go through everything, patiently. You will likely find clues, or even solutions, for your own situation.

Community Expert, Nov 07, 2024

@xfreeman89x , do you think your situation is different from @davecordes's? There is of course the alternative of starting a new thread of your own.

Community Expert, Nov 24, 2024

@xfreeman89x , how did things go with your move from ColdFusion 2018 to ColdFusion 2023? If the issues were unresolved, could you please start a new thread?

 

That will help the forum reach a solution faster. Also, the issues you faced are of interest to many fellow developers in the forum. 

Community Expert, Nov 07, 2024

As far as I know, there aren't any true memory leaks in CF because CF is built in Java, which doesn't allow you to introduce memory leaks. And you partially answer your own question by pointing out how many old gen objects you have on the Java heap. Changing GC settings isn't going to do anything about old gen objects, because they're not garbage. So to ultimately solve your problem, you have to figure out how to reduce the number of old gen objects, or accommodate your environment to handle larger numbers of them.

 

To that end, I'm going to echo @BKBK and @Charlie Arehart and probably a bunch of others in asking you to block bots, which can easily create a lot of long-lived Java objects as part of the Session scope. This may fix your problem, but even if it doesn't, it should help you narrow the search.
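
To illustrate the point about old gen objects not being garbage, here is a minimal sketch with hypothetical names; the map merely stands in for a long-lived scope such as a session or a query cache. An object that is still referenced from such a structure cannot be collected, no matter which collector is configured.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class RetainedObjects {
    // Hypothetical stand-in for a long-lived scope (session, application, or query cache).
    static final Map<String, byte[]> LONG_LIVED_SCOPE = new HashMap<>();

    /** Allocates 1 MB, optionally stores it in the long-lived map, and returns a weak handle to it. */
    static WeakReference<byte[]> allocate(String key, boolean retain) {
        byte[] payload = new byte[1024 * 1024];
        if (retain) {
            LONG_LIVED_SCOPE.put(key, payload);
        }
        return new WeakReference<>(payload);
    }

    public static void main(String[] args) {
        WeakReference<byte[]> held = allocate("session-1", true);
        WeakReference<byte[]> dropped = allocate("request-1", false);
        System.gc(); // only a request; the JVM decides whether and when to collect
        System.out.println("held still reachable:    " + (held.get() != null));
        System.out.println("dropped still reachable: " + (dropped.get() != null));
    }
}
```

The retained payload always stays reachable because the map holds a strong reference to it. The unretained one is usually collected after the GC request, though the JVM does not guarantee when.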

 

Dave Watts, Eidolon LLC
Community Expert, Nov 30, 2024

@xfreeman89x , can you let us know how things worked out for you?

 

You'd reported here on Nov 6 that you were having heap problems, similar to Dave Cordes (who started this thread). We pointed to various aspects of what had been shared with Dave, but it's not clear if those helped, or if you found any new info to share. (In the end, he found his problem was really about new, excessive bot traffic--so it was NOT about CF2023 after all.)

 

Of course, it's not clear to us if your problem even remains, now a few weeks on.  🙂 Can you confirm? Even if only to report that the problem remains unresolved for you? And if so, are you just restarting CF every couple of days to "solve" the problem? 

 

One other thing: you've indicated in another forum post that you're running on an AWS AMI.  Indeed, one reason I'm asking about the state of things for you is that I know someone else who is running on an AMI, and they ARE on update 11, yet they are experiencing unexpectedly high heap that can't be GCed (so grows to the heap limit and eventually CF must be restarted). As you're on an AMI as well, maybe there's another thing specific to AMI's that could be investigated (or perhaps someone will identify a new known issue).

 

(Speaking of issues unique to AMI's, and as xfreeman89x already knows, those using them should be aware of another issue: those on an AMI--on AWS OR Azure--will find if they DO apply the latest (Oct 2024) CF update, CF will fail to start properly after that. This is fixed with an Adobe hotfix, as discussed by him in another forum post here. You will need to ask Adobe for that hf202300-4224138.jar, at cfsup@adobe.com . This is all one more reason I wonder if there's yet some other memory issue that may be affecting those on AMIs. BTW, this is not to be confused with a DIFFERENT memory issue, reported in another thread by @sdsinc_pmascari and resolved by a code change about calls TO cloud services--rather than merely running CF *on* an AMI.)

 

Finally, xfreeman89x, if your response to this might be more than just a simple one (especially if your problem remains, or you would ask any questions), I REALLY think it would be best if you would open a new thread here in the forums. This discussion of your problem is buried 4 pages down in the replies to Dave Cordes' original post. Folks may have a hard time seeing it, let alone sorting out the thread references, indentation, etc.  If you DO do that, please offer a link here as a reply, and interested folks can follow along there.

HTH.


/Charlie (troubleshooter, carehart. org)
Explorer, Apr 02, 2025

Hi all,
after about 4 months of emails with Adobe support, I’ve finally gotten to the bottom of the issue we discussed in this thread. In my case there was a memory leak related to Ehcache with SQL Server, which caused many references to remain in memory, like the ones you can see in the image below. The image depicts the "leak suspect report" from Eclipse MAT for a typical heap dump. As you can see, there is a large retained heap related to coldfusion.sql.Executive and net.sf.ehcache.store.chm.SelectableConcurrentHashMap.

image (1).png


Thanks to the patience of the Adobe agent who diligently forwarded various heap dumps and server logs to the engineering team, we were able to obtain a hotfix that resolved the issue, bringing memory usage to a very stable state. Below, you can see the difference between the behavior before and after applying the patch.

Before the patch:


After the patch:

image (3).png

The internal bug that was raised is CF-4224890.

I don't know if the situation is the same as that of @davecordes, and I don't know if he or anyone else runs the same configuration with EHCache + SQL Server, but finally, after 4 months, we’re ready to migrate to CF2023.

Best regards,
Salvatore

Salvatore Cerruto
Community Expert, Apr 02, 2025

Thanks for offering that post-mortem, Salvatore. I'll add that while he got a fix for cf2023, at the time he told me about it I had been helping someone with a similar problem on cf2021, and the Adobe folks created a fix for that as well.

 

So bottom line: anyone facing excessive heap use when using cfquery caching (not even limited to sql server) should reach out to cfsup@adobe.com and ask for the hotfix jar file for that tracker ticket id, indicating your cf version. For now they've asked us to do that vs sharing the fix files publicly. 


/Charlie (troubleshooter, carehart. org)
Community Expert, Apr 02, 2025

@xfreeman89x , thanks for sharing that.

Apparently, the picture "Before the patch" didn't come through. Could you please publish it?

Explorer, Apr 02, 2025

@BKBK sorry! Here is the picture I missed. The one before the hotfix.

image.png

 

Salvatore

Salvatore Cerruto