
Clustered CF Instances Hang, Difficult to Restart

Participant, Jul 26, 2021


Hello all. It seems we have run into another issue with our newly deployed ColdFusion 2021 servers. We experienced a similar error back with CF8 and with earlier update levels of CF10. The issue seemed to be resolved in later updates of CF10, but now, with CF2021, it appears to be back.

 

We have three physical bare-metal servers, each running four clustered instances of ColdFusion. All of these are behind a Fortinet load balancer. While the CF instances on each box are named the same (cfusion1 - cfusion4), each box's CF cluster is on a different port to eliminate multicast confusion between machines. We have changed channelSendOptions to 6 in the server.xml files in order to reduce the number of "Session Already Invalidated" error messages in the coldfusion-error.log files.
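For context on the setting mentioned above: in Tomcat's `SimpleTcpCluster`, `channelSendOptions` is a bit mask. According to the Tomcat clustering documentation, 2 requests acknowledgements (USE_ACK), 4 requests synchronized acknowledgements (SYNCHRONIZED_ACK) and 8 requests asynchronous sending (ASYNCHRONOUS), with 8 being Tomcat's documented default. The value 6 therefore asks for acknowledged, synchronized replication rather than the asynchronous default. A minimal shell sketch that decodes a value; the function name `decode_send_options` is hypothetical, and it names only those three flags while lumping any other bits together as `OTHER`:

```shell
# Hypothetical helper: decode a Tomcat SimpleTcpCluster channelSendOptions
# value into the flag names from the Tomcat clustering docs.
# Only 2 (USE_ACK), 4 (SYNCHRONIZED_ACK) and 8 (ASYNCHRONOUS) are named;
# any remaining bits are reported together as OTHER(<value>).
decode_send_options() {
  local value=$1
  local names=""
  if [ $(( value & 2 )) -ne 0 ]; then names="$names USE_ACK"; fi
  if [ $(( value & 4 )) -ne 0 ]; then names="$names SYNCHRONIZED_ACK"; fi
  if [ $(( value & 8 )) -ne 0 ]; then names="$names ASYNCHRONOUS"; fi
  # Mask off the three named bits (2 | 4 | 8 = 14) to find anything else.
  local rest=$(( value & ~14 ))
  if [ "$rest" -ne 0 ]; then names="$names OTHER($rest)"; fi
  # Drop the leading space added by the first match.
  echo "${names# }"
}
```

For example, `decode_send_options 6` prints `USE_ACK SYNCHRONIZED_ACK`, while Tomcat's default of 8 prints `ASYNCHRONOUS`.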

 

While we don't have much trouble restarting instances when the server has been removed from the load balancer, we do have difficulty restarting instances even under moderate load. The CF instance will appear to start fine from the command line; however, the instance never starts taking traffic, and the CFIDE/administrator for that instance will not load. Upon trying to stop the hung instance, we get an error:

 

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion start
Starting ColdFusion 2021 server ...
======================================================================
ColdFusion 2021 server has been started.
ColdFusion 2021 will write logs to /opt/ColdFusion2021/cfusion1/bin/../logs/coldfusion-out.log
======================================================================

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion stop
Stopping ColdFusion 2021 server, please wait
Jul 22, 2021 10:37:43 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8007is not active. Stop the server only after it is started.
ColdFusion 2021 server has been stopped

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion start
Starting ColdFusion 2021 server ...
======================================================================
ColdFusion 2021 server has been started.
ColdFusion 2021 will write logs to /opt/ColdFusion2021/cfusion1/bin/../logs/coldfusion-out.log
======================================================================
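In the transcript above, `coldfusion start` reports that the server has started, but the subsequent `stop` fails because the shutdown port (8007 in the log) is not yet active; in stock Tomcat that port is only opened once startup has finished. A minimal sketch, assuming bash (it relies on bash's `/dev/tcp` redirection) and a hypothetical helper named `wait_for_port`, that polls the port so a stop is not issued before the instance has finished starting:

```shell
# Hypothetical helper: wait until a TCP port accepts connections, or give up
# after a timeout in seconds (default 120). Relies on bash's /dev/tcp
# redirection, so run it with bash rather than plain sh.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-120}
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Open and immediately close a connection inside a subshell;
    # the subshell's exit status tells us whether the connect succeeded.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

For example, `wait_for_port 127.0.0.1 8007 300 && /opt/ColdFusion2021/cfusion1/bin/coldfusion stop` only issues the stop once the port is listening. This does not explain why startup stalls; it only avoids the misleading stop error. Stock Tomcat may log an invalid-shutdown-command warning for the empty probe connection.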

 

There doesn't appear to be any useful information in the coldfusion-error.log, nor in the logs of its peers.

 

Do I need to make any adjustments to the Tomcat cluster timeouts, perhaps? Am I missing some other best practice when clustering CF instances? Any suggestions on how to troubleshoot this further?

 

Thanks for any advice, 
-Tony

 

 

Participant, Jan 31, 2022


Hi BKBK - 

 

I spoke with management, and they have decided to simply move to BackupManager and consider the issue closed, or at least temporarily remediated. They don't feel that the impact on production, or the expenditure of time, is worth pursuing when a viable solution seems to exist. So I guess for now, that's that.

It looks like someone from Adobe picked up my bug report and has asked for some additional information. I'll point them towards this thread. If they need additional info that won't disrupt production, I'll be happy to provide whatever I can. If they come to any conclusion, I'll post back here for closure.

 

Thank you all again for your help in the matter. I wish we had found the smoking gun, but at least it seems a workaround has resolved the big issue for us.

Community Expert, Jan 31, 2022


Hi @GuitsBoy ,

 

Thanks for the update.

 

In your present situation, BackupManager is indeed preferable to DeltaManager. The choice is backed up (no pun intended 🙂) by Christopher Schultz of the Tomcat team, no less. There is some comfort in that.
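To confirm which session manager each instance actually ended up with after a switch like this, a small shell sketch can pull the Tomcat `Manager` class name out of each instance's `server.xml`. Both function names are hypothetical, and the `<base>/<instance>/runtime/conf/server.xml` location is an assumption about the ColdFusion 2021 instance layout:

```shell
# Hypothetical helper: print the first Tomcat cluster session-manager class
# (e.g. DeltaManager or BackupManager) named in a server.xml file.
# Note: this is a plain text match, so a Manager element inside an XML
# comment would also be reported.
session_manager_of() {
  grep -o 'org\.apache\.catalina\.ha\.session\.[A-Za-z]*Manager' "$1" | head -n 1
}

# Hypothetical helper: report the session manager for each named instance,
# assuming the layout <base>/<instance>/runtime/conf/server.xml.
report_session_managers() {
  local base=$1 inst
  shift
  for inst in "$@"; do
    printf '%s: %s\n' "$inst" "$(session_manager_of "$base/$inst/runtime/conf/server.xml")"
  done
}
```

For example, `report_session_managers /opt/ColdFusion2021 cfusion1 cfusion2 cfusion3 cfusion4` prints one `instance: class` line per instance, which makes a leftover DeltaManager on one node easy to spot.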

 

You, @Charlie Arehart , @Dave Watts and I have given the problem a good bash. It's now up to the ColdFusion Team. I trust that, with their extensive resources and Tomcat know-how, they will be able to find an answer soon.

  

For now, thanks for your generous supply of information on this issue and for your collaboration. I learned a lot from our discussions. 

Community Expert, Jan 31, 2022


I HAVE BEEN SUMMONED FROM MY CAVE AND AM COMPELLED TO GIVE YOU ONE ANSWER. THAT ANSWER IS, DO NOT USE CF'S BUILT-IN CLUSTERING, BECAUSE IT HAS BEEN CONDEMNED BY THE GODS ABOVE AND BELOW.

 

No, seriously, don't use that. It's a bundled feature that can easily be replaced with one of the many best-of-breed products on the market that solve just this one problem in a much better way. If I recall correctly, CF's cluster management is peer-to-peer, so the more peers you have, the less well it'll perform. (Maybe this has changed recently; I don't know. The overall object lesson still stands, because this is a problem long solved by every other clustering/LB product.)

 

Dave Watts, Eidolon LLC 
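The point that a peer-to-peer design degrades as peers are added can be made concrete for Tomcat session replication. Per the Tomcat clustering documentation, DeltaManager replicates every session change to all other cluster members, while BackupManager replicates each session to a single backup member. Assuming n members that each generate u replicated session updates per second (u is an illustrative symbol, not a figure measured in this thread), a rough count of replication messages per second across the cluster is:

```latex
\begin{aligned}
M_{\text{Delta}}(n)  &= n\,u\,(n-1) \\
M_{\text{Backup}}(n) &= n\,u \cdot 1 = n\,u \\
M_{\text{Delta}}(4)  &= 4u \cdot 3 = 12u, \qquad M_{\text{Backup}}(4) = 4u
\end{aligned}
```

With the four instances per box described in this thread, that is 12u versus 4u; the DeltaManager count grows with n² while the BackupManager count grows with n. This ignores membership heartbeats and startup state transfer, so it is an order-of-magnitude sketch only.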

Community Expert, Jan 31, 2022

 

I HAVE BEEN SUMMONED FROM MY CAVE AND AM COMPELLED TO GIVE YOU ONE ANSWER. THAT ANSWER IS, DO NOT USE CF'S BUILT-IN CLUSTERING, BECAUSE IT HAS BEEN CONDEMNED BY THE GODS ABOVE AND BELOW.

 

 

Dave Watts, Eidolon LLC 

Your opportunity,

ColdFusion Team.

Challenge the odds,

Defy the Gods.

Community Expert, Jan 09, 2022


Tony, it's been almost six months. All this back and forth among us all (in recent weeks even) has not solved your problem, as I feared could happen. Some problems just can't be easily solved this way, and what some consider a "stock implementation" with "little traffic" which "should just work" often turns out to have unexpected explanations for why things do not work. Most important, sometimes certain diagnostics need to be enabled to really understand what's going on.

 

But there are just too many settings and diagnostics to consider to lay them all out here in the forums, as well as to then CONSIDER the implications of WHAT those diagnostics may show, let alone WHAT alternative settings values could be considered and HOW those may affect still other settings.

 

All this is why I said from the outset that for some challenges there's just no substitute for having direct remote assistance from an experienced troubleshooter. And even if the person asking for help or their mgt might presume their existing staff were just as experienced as anyone else (or more) to solve things, sometimes the help comes simply from having someone look at things objectively, perhaps asking questions or posing possibilities that had not been considered or were presumed not to possibly be the issue. 

 

Again I'll say I do this with folks daily, with a 99% satisfaction rate (on avg each year, for 15 years, as tracked by time marked "refunded"). And even when I may work on a problem with some aspect I may have "never seen before", I'm still usually able to solve it. And even when capable people may have spent days, weeks, or months trying on their own, I am nearly always able to solve problems quickly, often less than an hour.

 

And as for when it might somehow take longer, I've mentioned my satisfaction guarantee: you won't pay for time you don't find valuable, even if we solve it but it took "longer than mgt was willing to pay for". I just so lament seeing problems that plague someone for months when we may solve them in hours or even minutes.

 

Please remind your mgt of the satisfaction guarantee. It seems they have nothing to lose and only your time, sanity, and server stability to gain. 🙂 See my consulting page for more, including my online calendar to grab a slot even tomorrow. Or email or call me at the contact info offered and we can even meet today, Sunday, if that's somehow preferable.


/Charlie (troubleshooter, carehart.org)

Participant, Jan 10, 2022


Hi Charlie. I've brought up your name no fewer than ten times to both my director and our CTO in the past six months. Both of them are quite familiar with your name, and while they both like the idea, it never seems to go beyond that. Please believe me, I've mentioned it a number of times, but for reasons unknown to me, they drag their feet on the issue. Perhaps that's mainly because this is not a day-to-day problem. The servers are quite stable and sound. It's only when we need to restart the instance for JVM config changes or their ilk that difficulties arise. And even then, it's just a matter of hammering away at it until it eventually comes up cleanly. Frustrating, yes, but it doesn't interrupt business continuity, so it's apparently deemed unimportant.

 

For what it's worth, we have had this identical issue dating back to CF8 and CF10, and both times a random CF update managed to fix it. They may simply be waiting for someone who spends their entire day dealing with CF to figure out these issues with Adobe, and then just wait for an update. As it is, in these past six months, we've only needed to restart these instances a handful of times. My guess is that it's simply not causing enough pain to make it a priority.

 

I will continue to suggest your services with every opportunity.

 

Thanks, 
-Tony

Community Expert, Jan 10, 2022


OK, Tony. The reasoning for letting it linger makes economic sense to someone, I guess. To be clear, though, this is not at all a common problem. If they're thinking "let's let someone else help Adobe find the problem", I fear that will be a fruitless wait.

 

On the other hand, since it's SO reproducible for you, it really seems we'd have little challenge finding the cause. I'd even go further and say that we may not even need to WAIT for another outage, if there may be diagnostics available that we could assess any time after such a restart. Or maybe we could enable some, so that the next time it happens there may be more.

 

Again, it may seem I'm preaching to the choir (you), but my real goal here is to share thoughts that may help you overcome whatever other reasons they have for letting it linger.


/Charlie (troubleshooter, carehart.org)

Community Expert, Jan 10, 2022


 

The servers are quite stable and sound. It's only when we need to restart the instance for JVM config changes or their ilk that difficulties arise. And even then, it's just a matter of hammering away at it until it eventually comes up cleanly.

 


By @GuitsBoy

 

Could you share the contents of the jvm.config file?
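When sharing jvm.config for a comparison like this, it can help to print the `java.args` line one flag per line so that two instances can be compared with `diff`. A minimal sketch; the helper name `java_args_of` is hypothetical, the path in the usage note assumes each instance's jvm.config sits in its `bin` directory next to the `coldfusion` script shown in the original post, and the naive split on spaces will break apart any argument that itself contains quoted spaces:

```shell
# Hypothetical helper: print each JVM argument from the java.args line of a
# ColdFusion jvm.config on its own line, dropping empty entries.
# Splits naively on spaces, so an argument containing quoted spaces
# will be broken apart.
java_args_of() {
  sed -n 's/^java\.args=//p' "$1" | tr ' ' '\n' | sed '/^$/d'
}
```

The function itself is plain sh; comparing two instances with `diff <(java_args_of /opt/ColdFusion2021/cfusion1/bin/jvm.config) <(java_args_of /opt/ColdFusion2021/cfusion2/bin/jvm.config)` needs bash for the process substitution.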
