A quick post in case anyone else ever runs into this memory issue.
Our CF servers seemed to have a "memory leak", in that the memory being used (Windows) by the CF service would gradually grow until it was maxing out the entire server.
Long-story-short, we found (with the folks over at xByte cloud hosting) in a thread dump that we had tens of thousands of hung (waiting) threads with the name: sdk-ScheduledExecutor
Turns out, that thread is created by the AWS SDK for Java, which we are not using. Not surprisingly, though, ColdFusion does use a version of this SDK for its cloud services functionality. We use that extensively when interacting with S3. At first we thought there could be a bug in CF or in the SDK if it's spawning all these threads. But, when poring over Adobe's docs we found the following little tidbit on this page:
Oh! The getCloudService() function must be in a shared scope? In true Adobe fashion, none of the examples they give on that page actually show this requirement in practice!
Anyway, our devs had missed that entirely. All our functions that interacted with S3 were starting out by creating the service object with getCloudService().
Turns out, as far as we can tell, every time that function is called, CF fires up an 'sdk-ScheduledExecutor' thread. The thread does the work you give it, then sits in a waiting state for another job that, in our case, would never come. So, after a while, we'd accumulate thousands!
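The accumulation described above can be reproduced outside ColdFusion in plain Java. This is only a minimal sketch, not ColdFusion's or the AWS SDK's actual code: it assumes that each getCloudService() call ends up creating something like a single-thread scheduled executor that is never shut down. The class name, the helper names and the 'sdk-ScheduledExecutor-demo' thread name are made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class ExecutorLeakDemo {

    // Hypothetical thread-name prefix, chosen to resemble the name seen in the thread dump.
    static final String PREFIX = "sdk-ScheduledExecutor-demo";

    // Hypothetical stand-in for whatever a per-request service creation does internally.
    static ScheduledExecutorService newServiceExecutor() {
        return Executors.newScheduledThreadPool(1, task -> {
            Thread t = new Thread(task, PREFIX);
            t.setDaemon(true); // daemon, so this demo JVM can still exit
            return t;
        });
    }

    // Counts live threads whose name starts with the demo prefix.
    static long countLingeringWorkers() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(PREFIX))
                .count();
    }

    // Simulates `calls` requests that each create an executor, run one job, and drop it.
    static long simulateRequests(int calls) {
        for (int i = 0; i < calls; i++) {
            ScheduledExecutorService executor = newServiceExecutor();
            try {
                executor.submit(() -> { }).get(); // run one job and wait for it to finish
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            // No shutdown(): the reference goes out of scope, but the worker thread stays alive.
        }
        return countLingeringWorkers();
    }

    public static void main(String[] args) {
        System.out.println("Lingering worker threads: " + simulateRequests(50));
    }
}
```

In this sketch each worker keeps waiting for a next job after its executor variable is gone, so the thread count grows with the number of calls, which matches what we saw in the thread dumps.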
Solution: make sure 'getCloudService()' is called in a shared scope and reuse that object!
Going forward (and maybe I've missed something), it would be helpful to have a way to close, or kill, this service object. As it stands, once you fire it up, it runs forever?
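For comparison, plain Java executors do have such a "close": calling shutdown() lets the idle worker thread end. The sketch below shows only that general Java behavior. Whether ColdFusion exposes anything equivalent for the object returned by getCloudService() is not established in this thread, and the class and thread names here are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ExecutorCloseDemo {

    // Runs one job on a fresh executor, shuts it down, and reports whether the worker thread ended.
    static boolean workerEndsAfterShutdown() {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1, task -> {
            Thread t = new Thread(task, "closable-worker-demo"); // hypothetical name
            t.setDaemon(true);
            worker.set(t); // remember the worker so we can check on it later
            return t;
        });
        try {
            executor.submit(() -> { }).get();               // the thread now exists and sits idle
            executor.shutdown();                            // explicit "close"
            executor.awaitTermination(5, TimeUnit.SECONDS); // wait for the pool to terminate
            worker.get().join(5000);                        // wait for the thread itself to exit
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return !worker.get().isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Worker ended after shutdown: " + workerEndsAfterShutdown());
    }
}
```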
Cheers!
@sdsinc_pmascari , thanks for sharing your findings. Your advice is quite instructive.
To answer your question, since the object that results from the getCloudService() call is stored in application-scope, it will not run forever. It will abide by the applicationTimeout value. That means, it will no longer be alive when the application times out.
Interesting, regarding the object timing out when in the application scope. Yes, of course I would expect this to be true.
But, perhaps I'm misunderstanding something, so allow me to postulate....
What we had been doing was putting the object created by getCloudService() into the local variable scope. Our thinking was that the object would only be alive when called upon during the initial page load. But, it spawned a thread that never died even though the original object no longer existed when the page processing completed.
Which makes me wonder... When the application times out, does CF actually go kill those threads, or does it fire up some new ones when the application re-initializes, thus leaving the original threads continuing to run?
This makes me wonder if loading this object into the server scope might make more sense to prevent the buildup of unused threads? Or, is that thinking fraught with peril?
Interesting stuff, indeed. Thanks for sharing, Paul.
1) First, as for your asking about the considerations relative to application timeouts, we should note that they might rarely or even never happen: the default is 2 days, and even if one lowers that (for an app or all), the duration is of course not "since the app was created" but rather "since the app was last used". And as some apps get traffic all the time, even if only from bots or monitoring calls, those might never time out.
But sure, the server scope would seem a fine choice (assuming there are no app-specific characteristics to the object instance saved there).
2) And it will surely be interesting to see if time may show there to be more (to all this you've found) than meets the eye.
3) Also, did you confirm there was indeed a great reduction in how far memory now falls when you force a gc? That would confirm this was the cause of a seeming memory leak.
I get that reducing the high thread count is alone compelling, of course.
Garbage collection did almost nothing to help when the server memory was maxed. Admittedly, I am not an expert in this area, but a forced gc gave us no relief.
We are now 48+ hours after putting all getCloudService() objects into shared scopes and memory levels are back to "normal" across the board and stable.
In Fusion Reactor, I can see just a handful of threads handling our S3 actions. Previously, even trying to load the Thread Visualizer would bring the server to its knees due to the number of threads. One thread dump showed 65,000+ threads!
Paul, to clarify: I was not at all proposing that doing a GC at the time of the high memory would have "given relief". What I was asking is how you find heap use to be now (with "all being well"). So thanks for clarifying that 'memory levels are back to "normal"'.
I only mentioned doing a GC because you might have looked and found the heap "still seeming high" (now, with all well), but if you were to do a GC (click the button on the FR "system metrics" page, for example) and the heap were then to drop substantially, it would show that the JVM was just being lazy about doing garbage collection itself (and so such "seeming high" heap use was just a temporary thing at that time).
Finally, it was clear from your previous reply that the number of threads had been high and now was not. Again, I was trying to connect all this to your original concern of high heap use. And now that you've said it's down, that would suggest that indeed there was more to the high thread count than merely "being so many": instead, it would seem that some aspect of those threads "remaining alive" was also causing some aspect of the heap to "remain in use".
What matters is that your code change has solved both problems. That's great, and it seems this can be a valuable lesson learned for everyone. Again, thanks.
... allow me to postulate....
What we had been doing was putting the object created by getCloudService() into the local variable scope. Our thinking was that the object would only be alive when called upon during the initial page load. But, it spawned a thread that never died even though the original object no longer existed when the page processing completed.
Which makes me wonder... When the application times out, does CF actually go kill those threads, or does it fire up some new ones when the application re-initializes, thus leaving the original threads continuing to run?
By @sdsinc_pmascari
What I think is that each object created in local scope spawns one or more threads. However, though the object is in local scope, the threads so created may live for the duration of the application. That is, I think the threads will only cease to exist when the application times out or is restarted, whichever occurs first.
You can confirm or refute this yourself, for example by examining your application's threads using a tool such as FusionReactor, ColdFusion Performance Monitoring Toolset or VisualVM.
Anyway, there are good design reasons why:
I can imagine why such a thread is in WAITING state. Namely, because the thread is a worker on stand-by. As such, it is poised to go into action when needed. So, the application does actually need those threads to be alive. The issue is that it does not need that many.
Hence the recommendation to store the object in a shared scope, such as application. The threads will then be spawned just once for the entire duration of the application (rather than thousands of times as in the case of local-scoped objects).
I imagine creating the object as being analogous to bringing forth and opening a can of worms. You may throw away the can afterwards (garbage collection), but the worms will still be around. So, if your application must do this, the most efficient way will be to do it once, using application scope.
When the object is local, as is currently the case in your application, a new object is created each and every time the CFM page or component is launched. Hence new threads are spawned each and every time. In other words, a new can of worms is produced and opened each time. My hypothesis is that that is how your application ended up with thousands and thousands of waiting threads.
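To illustrate the "create once, reuse" idea in plain Java terms, here is a minimal sketch. It is an analogy under assumptions, not ColdFusion code: the static field plays the role of the application scope, and the class, method and thread names are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class SharedExecutorDemo {

    // Hypothetical thread-name prefix used only to find the demo's worker threads.
    static final String PREFIX = "shared-worker-demo";

    // Plays the role of a shared (application) scope: one instance for all requests.
    private static ScheduledExecutorService shared;

    // Creates the executor on first use and returns the same instance afterwards.
    static synchronized ScheduledExecutorService sharedExecutor() {
        if (shared == null) {
            shared = Executors.newScheduledThreadPool(1, task -> {
                Thread t = new Thread(task, PREFIX);
                t.setDaemon(true); // daemon, so this demo JVM can still exit
                return t;
            });
        }
        return shared;
    }

    // Counts live threads whose name starts with the demo prefix.
    static long countWorkers() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(PREFIX))
                .count();
    }

    // Simulates `calls` requests that all reuse the one shared executor.
    static long simulateRequests(int calls) {
        for (int i = 0; i < calls; i++) {
            try {
                sharedExecutor().submit(() -> { }).get(); // run one job and wait for it
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return countWorkers();
    }

    public static void main(String[] args) {
        System.out.println("Worker threads after 50 requests: " + simulateRequests(50));
    }
}
```

Because every request goes through the same instance, the single worker is reused and no new thread is created per request.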
This makes me wonder if loading this object into the server scope might make more sense to prevent the buildup of unused threads? Or, is that thinking fraught with peril?
By @sdsinc_pmascari
Yes, loading the object in server scope is fraught with peril. That leads to a design where objects could be propagated to applications that are unaware of the objects or don't need them. It would also lead to an increase in coupling between applications and could trigger race conditions.
To prevent a buildup of unused threads, load the objects in application scope instead. I hope the above arguments are convincing for this choice.
I should like to share two reports of an Amazon S3 issue similar to yours. It occurs in an environment completely different from ColdFusion:
https://github.com/aws/aws-sdk-java-v2/issues/3746
https://github.com/aws/aws-sdk-java-v2/issues/4991
It strengthens my hypothesis that:
Now, on to a related subject: the time-to-live of an object in a bucket. You can configure how Amazon S3 manages such an object during its lifetime. The documentation "ColdFusion and Amazon S3" shows you how to configure the Lifecycle Rules that define how Amazon S3 manages objects during their lifetime.
Thank you both for your comments. They have been very helpful.
We will keep all this in mind as we move forward.
My pleasure, @sdsinc_pmascari.
I hope you'll please update us on any subsequent findings or conclusions you come to, so that others finding this thread can make sense of all that was being shared.
Will do, Charlie.
For all those who come here looking for help with a ColdFusion memory issue, the solution for us, in this case, was to be sure to only put getCloudService() objects into shared scopes. Do not create that object 'as needed' in a local scope.