Possible Memory Leak - ColdFusion 2023 + Java 17.0.12

New Here, Oct 08, 2024

Ever since we upgraded from ColdFusion 2021 to ColdFusion 2023 we have been dealing with out of memory issues. ColdFusion will run fine for roughly 24-30 hours, then we will start seeing CPU spikes to 100% every 30 seconds. Garbage collection can't free up enough memory so ColdFusion eventually crashes and we have to restart the server.

 

Things we have tried that don't seem to help:

 

- Downgrading to 17.0.11

- Tweaking the min and max heap sizes

- Tweaking the caching settings

- Changing the garbage collector algorithm to G1GC

- Tweaking our websites to cache queries for a shorter period of time (1 hour down to 15 minutes down to 5 minutes)

 

Here are our current settings:

 

Min Heap: 8192

Max Heap: 8192

Garbage Collector: UseParallelGC

Cached Templates: 1000

Cached Queries: 5000
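
One way to confirm that the running JVM actually picked up settings like these is to ask it directly from a CFML page. The following is a minimal sketch, assuming the page is allowed to create Java objects (no sandbox restriction); the variable names are illustrative. Note that with some collectors the reported maximum can read a little lower than the configured Max Heap.

```cfml
<cfscript>
// Sketch only: reads heap limits and startup flags from the running JVM.
runtime = createObject("java", "java.lang.Runtime").getRuntime();
mb = 1024 * 1024;

writeOutput("Max heap (MB): " & int(runtime.maxMemory() / mb) & "<br>");
writeOutput("Committed heap (MB): " & int(runtime.totalMemory() / mb) & "<br>");
writeOutput("Free within committed (MB): " & int(runtime.freeMemory() / mb) & "<br>");

// List the arguments the JVM was started with (these come from java.args in jvm.config).
mgmt = createObject("java", "java.lang.management.ManagementFactory");
jvmArgs = mgmt.getRuntimeMXBean().getInputArguments().iterator();
while (jvmArgs.hasNext()) {
    writeOutput(encodeForHTML(jvmArgs.next()) & "<br>");
}
</cfscript>
```

If a flag such as the garbage collector choice does not appear in that list, the edit to jvm.config probably did not take effect, or the instance was not restarted afterwards.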

 

We do have FusionReactor installed on all of our servers, but this is like trying to find a needle in a haystack. I really don't know what I should be looking at.

 

Here is the most recent screenshot, from 2 days ago, that shows the eventual demise of one of our servers.

 

web-03.png

 

I am really at my wit's end here. If this isn't a memory leak I don't know what the heck it is. If anyone has any recommendations on what to try next I would appreciate it.

Community Expert, Oct 14, 2024

I compared your java.args setting to mine. The following flags are present in mine, but missing from yours:

 

--add-exports=java.base/sun.util.calendar=ALL-UNNAMED 
--add-exports=java.desktop/sun.awt.image=ALL-UNNAMED 
--add-exports=java.desktop/sun.awt=ALL-UNNAMED 
--add-exports=java.desktop/sun.java2d=ALL-UNNAMED 
--add-opens=java.base/java.security=ALL-UNNAMED 
--add-opens=java.base/java.time=ALL-UNNAMED 
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/sun.security.pkcs=ALL-UNNAMED 
--add-opens=java.base/sun.security.rsa=ALL-UNNAMED 
--add-opens=java.base/sun.security.util=ALL-UNNAMED 
--add-opens=java.base/sun.security.x509=ALL-UNNAMED 
-Dcoldfusion.iframe.allowedprotocols=file,ftp 
-Dcoldfusion.mail.oauth2=true 
-Dcoldfusion.number.allowdotsuffix=true 
-Dcoldfusion.searchimplicitscopes=true 
-Dcoldfusion.xml.allowPathCharacters=true 
-Dcom.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true 
-Dfile.encoding=UTF-8 
-Dhttps.protocols=TLSv1.2,TLSv1.3 
-Djava.awt.headless=true 
-Djdk.tls.client.protocols=TLSv1.2,TLSv1.3 
-Djdk.util.zip.disableZip64ExtraFieldValidation=true 
-Dlog4j2.formatMsgNoLookups=true 
-Duser.region=us 
-Duser.timezone=Europe/Amsterdam 

 

The question I wish to share with you and the forum is: Could any of these help with the memory problems?

 

In connection to this, see Adobe's document, "ColdFusion performance issues and troubleshooting".

 

New Here, Oct 14, 2024

BK, Charlie, Scott,

 

I think we figured out what's going on here. It was slow but nonstop bot traffic. I didn't spot it at first because it was so slow and measured. Once we started digging into the reporting we have, it was obvious that it was a huge part of the problem. We made a few adjustments to the application via the robots.txt file and the Application.cfc file, and I am happy to report that things are looking much better.

 

- Sessions are down 40% from 10,000 to 6,000

- Memory heap has settled WAY down

- Overall RAM usage on the server is also WAY down

 

This is from the ColdFusion 2021 server a few minutes ago. You can see on the heap itself when the changes were made.

 

Screenshot 2024-10-14 175820.png
Screenshot 2024-10-14 180313.png

 

I'm going to continue to monitor everything over the next 24-48 hours just to make sure but I think we got it.

 

If you want I can share some of the details of what we did. Just let me know.

Community Expert, Oct 14, 2024

Congratulations, @Dave Cordes. The FusionReactor displays do indeed look much better.

It would be great if you could share what you did. It will certainly help fellow developers in the future.

Explorer, Oct 14, 2024

Excellent! Glad you got it sorted out. Those bots can really waste a lot of resources!

Community Expert, Oct 15, 2024

Further good news: Update 11 of ColdFusion 2023, released today, has upgraded ColdFusion's PostgreSQL driver from version 42.5.1 to version 42.7.3.

New Here, Oct 15, 2024

Thanks for the info BK!

Community Beginner, Oct 31, 2024

Looking at your session graphs, it seems you have a lot of J2EE sessions. Do you really need them for your applications? Have you tried disabling "Use J2EE session variables" in the ColdFusion Administrator?
In this article from Adobe you can find the differences between J2EE sessions and ColdFusion sessions: https://helpx.adobe.com/coldfusion/kb/difference-coldfusion-j2ee-session-management.html

I have the same issue, having recently updated from CF2018 to CF2023, but I realized that my applications don't need J2EE session management. I'm still monitoring the server, but as of today memory management seems much better.

Community Beginner, Nov 05, 2024

Can you share what kind of adjustment you did in the Application.cfc?

New Here, Nov 05, 2024

Basically, I just blocked certain bot traffic when they were filtering our ecommerce catalogs. That's what was causing most of the problems on our website. It looks like this.

 

<!--- Block bot traffic from filtering --->
<cfif IsDefined("URL.aid") AND ListLen(URL.aid)>
	<cfif FindNoCase("bingbot/2.0",CGI.HTTP_USER_AGENT)>
		<cfheader statuscode="403">
		<cfreturn false><!--- BLOCKED --->
	</cfif>
</cfif>

Also consider adding the same thing in your robots.txt file.

User-agent: *
Disallow: /*aid=
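
The CFML fragment above uses `<cfreturn>`, so it presumably lives inside a function such as `onRequestStart` in Application.cfc. For readers who want the surrounding context, here is a minimal script-syntax sketch of the same idea. The application name and the `blockedAgents` list are illustrative assumptions, not the original poster's actual file.

```cfml
// Application.cfc (sketch; names and the agent list are assumptions)
component {
    this.name = "exampleShop";

    public boolean function onRequestStart(required string targetPage) {
        // User-agent substrings to refuse on filtered catalog URLs.
        var blockedAgents = ["bingbot/2.0"];
        var isFiltering = structKeyExists(url, "aid") && listLen(url.aid) > 0;

        if (isFiltering) {
            for (var agent in blockedAgents) {
                if (findNoCase(agent, cgi.http_user_agent) > 0) {
                    cfheader(statuscode = "403");
                    return false; // returning false stops processing of the request
                }
            }
        }
        return true;
    }
}
```

Keeping the signatures in one array makes it easy to add another crawler later without touching the logic. User-agent strings are easy to spoof, so this only deters crawlers that identify themselves honestly.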

Community Beginner, Nov 05, 2024

I have the same issues. Could someone help me understand what is wrong?
I have a server with CF2018 running without problems, but now with CF2023 the server cannot stay up for more than 6 hours. Then it becomes unresponsive because the heap fills up.

Screenshot_31-10-2024_155337_10.0.2.252.jpeg
This is how it looks when it becomes unresponsive.

And this is the memory situation:

Screenshot_5-11-2024_161826_10.0.2.252.jpeg

The old gen memory grows and the eden space becomes thinner. I also tried ZGC, but the result is the same. Is there a way to get this working?

 

New Here, Nov 05, 2024

It looks like you're getting spikes in JDBC activity when the CPU spikes occur. Could it be a poorly performing query?

Community Expert, Nov 05, 2024

freeman, I think your problem is not what you presume--but I still hope I can offer a couple of things to help.

 

First, the screenshots do not match your assertions: the heap does not show being "full" (top right corner of first graph, top left corner of second). And while the oldgen (second column/third row in second graph) is indeed growing, it's barely half its max.

 

Second, you say these screenshots are what you see when CF "becomes unresponsive". The first screenshot shows being taken at your 16:53:30 CET. As FR was able to show you this screen, then it would seem CF was responsive (as this FR on-prem UI runs WITHIN the JVM along with CF). 

 

Perhaps as important, the right side of the top left 4 graphs (of that first screenshot) show there is NO activity (no running or incoming requests or queries) in the 15 seconds before this point in time--though the top right heap graph and bottom right CPU graphs show they are being populated right up to the time of the screenshot. 

 

If that inactivity is what you're referring to (perhaps you or customers are making requests which "hang"), I would contend that your problem is not "in CF" nor "in the jvm" (and not related to the heap or the GCs). Instead, it seems CF is no longer being GIVEN requests to run...and for that I would suggest a possible culprit is the web server connector (or the web server, whether IIS or Apache), or perhaps something in front of those.

 

I'll repeat: if FR was RESPONSIVE and these graphs were updating during this time, then CF was "fine". If instead you say "oh, no. once I took the screenshot I then found that FR was itself no longer updating", that would be very different.

 

But I would then ask whether you are accessing FR through its built-in web server port (like 8088) or using FR's capability to use a modified url without a port (yourdomain/fusionreactor.cfm/findex.htm) to respond by way of your web server (like IIS or Apache). In the latter case, if the web server or connector stopped letting traffic into CF, it would also stop letting that modified URL through.

 

FWIW I'll note that your second graph DOES show that the "code heap" graph is approaching its max. That is controlled (indirectly) by a JVM arg called ReservedCodeCacheSize, which defaults to 240m on most modern OS's and JVM versions...and your max of 116 is consistent with that default (which is split inobviously among the 3 codeheap memory spaces). In any case, hitting that max won't make CF hang: it would just prevent JIT compilation in the JVM--which might have some impact, but not what you're seeing.
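
For anyone without a monitoring tool to hand, the code cache figures can also be read from the JVM itself. This is a minimal sketch, assuming the page may create Java objects; it filters memory pools by the substring "Code", which is an assumption about how the JVM names those pools, so check the names your JVM actually reports.

```cfml
<cfscript>
// Sketch only: prints used and max sizes for the JVM code cache pools.
mgmt  = createObject("java", "java.lang.management.ManagementFactory");
pools = mgmt.getMemoryPoolMXBeans().iterator();
mb    = 1024 * 1024;

while (pools.hasNext()) {
    pool = pools.next();
    // Pool names vary by JVM version and flags; "Code" is a loose match.
    if (findNoCase("Code", pool.getName()) GT 0) {
        usage = pool.getUsage();
        writeOutput(
            encodeForHTML(pool.getName())
            & ": used " & numberFormat(usage.getUsed() / mb, "0.0") & " MB"
            & ", max " & numberFormat(usage.getMax() / mb, "0.0") & " MB<br>"
        );
    }
}
</cfscript>
```

Removing the `findNoCase` filter lists every pool, including the old gen, which makes it a quick cross-check against what the graphs show.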

 

You should take a look as well at your coldfusion-error.log file, to see if it contains any sort of OutOfMemoryError just before this hang. That WOULD indicate whether CF was indeed hitting any such heap or other memory error. My sense is you'll find none.
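
If the log is large, a short script can do the search. The following is a minimal sketch; the log path is an assumption based on a default install layout, so adjust it to wherever your instance writes its logs.

```cfml
<cfscript>
// Sketch only. The path is an assumption; point it at your instance's logs folder.
logPath = server.coldfusion.rootdir & "/logs/coldfusion-error.log";
hits = [];

if (fileExists(logPath)) {
    logFile = fileOpen(logPath, "read");
    try {
        lineNo = 0;
        while (!fileIsEOF(logFile)) {
            line = fileReadLine(logFile);
            lineNo++;
            // Collect every line that mentions an out-of-memory condition.
            if (findNoCase("OutOfMemoryError", line) GT 0) {
                arrayAppend(hits, lineNo & ": " & line);
            }
        }
    } finally {
        fileClose(logFile);
    }
}

writeOutput("OutOfMemoryError lines found: " & arrayLen(hits) & "<br>");
for (hit in hits) {
    writeOutput(encodeForHTML(hit) & "<br>");
}
</cfscript>
```

The text after "OutOfMemoryError:" on a matching line names the area that ran out (for example "Java heap space" or "Metaspace"), which tells you whether the heap was really the limit.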

 

Finally, if indeed this issue you're having proves to be unrelated to the original post's contention of a "memory leak in cf2023" (which turned out instead to be an impact of excessive automated traffic), you may want to consider creating a new discussion. This one's few dozen replies are now split over 4 "pages" in this forum's UI...and that will make it ever harder for you to get casual readers to see (let alone reply to) your issue. Not a command, just a suggestion.


/Charlie (troubleshooter, carehart.org)

Community Expert, Nov 05, 2024

Hi @xfreeman89x, the first screenshot does show a worrying scenario. The ColdFusion application's memory usage reaches the maximum heap size (Xmx) several times within a period of one minute. I can therefore imagine that the application then became unresponsive.

 

I cannot see anything that suggests that the garbage collector is to blame. More likely, your application is maintaining just too many live objects on the heap.

 

So I would suggest you keep using the default collector, ParallelGC. Your problem shares many similarities with that of @Dave Cordes. So, implement the suggestions given in this thread. They will help you rule out the red herrings and identify the likely root causes. Think, for example, of cached objects, live objects held in scopes, and so on.

Community Beginner, Nov 06, 2024

Hi BK, hi Charlie,
thank you for your tips! I'm trying different solutions, but without any success so far.
This is the situation with ParallelGC. I don't think it is getting better. Very soon the GC will be so frequent that the server will be unresponsive because of the CPU load. I cannot understand this behaviour. Why does the old gen memory keep growing?

Screenshot_6-11-2024_153029_10.0.2.252.jpeg

Community Expert, Nov 06, 2024

Because you have some code that's creating objects that can't be gc'ed. Changing gc algorithms is not the solution.

 

And your move to cf2023 is no more likely the cause for you than it was for Dave, who originally opened this discussion.

 

Perhaps like him you might simply be suffering from unexpected new rates of bot traffic. Your first screenshot shows you getting about 30 cf requests per second. Is that to be expected? Do you still have your cf2018 available, and was it running fr within the past 30 days? If so, then you could view the fr archived metrics to see its depiction of requests per second when traffic was going through that cf2018 instance.

 

You could also look at your daily, weekly, and monthly fr reports (if you configured fr to send you email), which also track the count of requests against cf. Did those counts change over time?

 

If none of those prove helpful, have you read all the discussions in these 4 pages of messages, offering various diagnostic ideas?


/Charlie (troubleshooter, carehart.org)

Community Beginner, Nov 06, 2024

Hi Charlie,
this is the situation now

Screenshot_6-11-2024_212643_10.0.2.252.jpeg

CPU is rising and the server is slower than before. And this happened after less than a day of uptime. I also made a diff of memory heap dumps before and after a GC, and it seems that the objects in memory are increasing instead of decreasing.

Screenshot_6-11-2024_213250_10.0.2.252.jpeg


I have the same kind of server with ColdFusion 2018, with the same load (it is attached to the same load balancer with the same applications) and the same settings, and all works fine. The only difference is that CF2018 uses G1GC, but I also tried that in CF2023 and the result is still a giant collection of old gen objects.

Screenshot 2024-11-06 at 21-35-56 Memory Overview - FusionReactor - production.cf2018.png

What is happening? Why do those objects live forever in the old gen? How can I find out whether my code is the problem or whether it is a CF issue?
To my eyes this appears to be a memory leak, but I don't know how to reproduce it to make Adobe aware of it.

Community Expert, Nov 07, 2024

quote

CPU is rising and the server is slower than before. And this happened after less than a day of uptime. I also made a diff of memory heap dumps before and after a GC, and it seems that the objects in memory are increasing instead of decreasing.

...
I have the same kind of server with ColdFusion 2018, with the same load (it is attached to the same load balancer with the same applications) and the same settings, and all works fine. The only difference is that CF2018 uses G1GC, but I also tried that in CF2023 and the result is still a giant collection of old gen objects.

...

What is happening? Why do those objects live forever in the old gen? How can I find out whether my code is the problem or whether it is a CF issue?
To my eyes this appears to be a memory leak, but I don't know how to reproduce it to make Adobe aware of it.


By @xfreeman89x

 

@xfreeman89x, most of your descriptions, remarks and questions actually repeat those of @Dave Cordes. As a forum, we explored his situation in detail, offering many suggestions and solutions.

 

So I would suggest that you follow the discussions right from the beginning. Go through everything, patiently. You will likely find clues, or even solutions, for your own situation.

Community Expert, Nov 07, 2024

@xfreeman89x , do you think your situation is different from @Dave Cordes's? There is of course the alternative of starting a new thread of your own.

Community Expert, Nov 07, 2024

As far as I know, there aren't any true memory leaks in CF, because CF is built in Java, where the garbage collector reclaims any object that is no longer referenced. What looks like a leak is usually a growing set of objects that something still references. And you partially answer your own question by pointing out how many old gen objects you have on the Java heap. Changing GC settings isn't going to do anything about old gen objects, because they're not garbage. So to ultimately solve your problem, you have to figure out how to reduce the number of old gen objects, or adjust your environment to handle larger numbers of them.

 

To that end, I'm going to echo @BKBK and @Charlie Arehart and probably a bunch of others in asking you to block bots, which can easily create a lot of long-lived Java objects as part of the Session scope. This may fix your problem, but even if it doesn't, it should help you narrow the search.
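
Besides blocking outright, a common way to limit the session cost of crawlers is to give suspected bots a very short session timeout, so their sessions expire quickly instead of piling up. Here is a minimal sketch of that idea in Application.cfc. The application name, the keyword pattern, and both timeout values are illustrative assumptions, and the effect when J2EE sessions are enabled is worth testing in your own setup.

```cfml
// Application.cfc (sketch; the name, pattern, and timeouts are assumptions)
component {
    this.name = "exampleApp";
    this.sessionManagement = true;

    // Crude heuristic: an empty user agent or a common crawler keyword.
    userAgent = cgi.http_user_agent;
    looksLikeBot = len(trim(userAgent)) EQ 0
        OR reFindNoCase("bot|crawl|spider|slurp", userAgent) GT 0;

    if (looksLikeBot) {
        // Suspected bots: the session expires after 30 seconds of inactivity.
        this.sessionTimeout = createTimeSpan(0, 0, 0, 30);
    } else {
        // Everyone else: 30 minutes.
        this.sessionTimeout = createTimeSpan(0, 0, 30, 0);
    }
}
```

Because Application.cfc settings are evaluated for each request, the timeout can differ per visitor. The heuristic will miss crawlers that disguise their user agent, so treat it as a first filter rather than a complete answer.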

 

Dave Watts, Eidolon LLC
