Skip to main content
Inspiring
July 26, 2021
Question

Clustered CF Instances Hang, Difficult to Restart

  • July 26, 2021
  • 3 replies
  • 10717 views

Hello all.  Seems We have run into another issue with our newly deployed ColdFusion 2021 servers.  We experienced a similar error back with CF8, and earlier update levels of CF10.  It seems  the issue was resolved in later updates of CF10, but now with CF2021, this issue seems to be back again.

 

We have three physical bare metal servers, each running four clustered instances of coldfusion.  All of these are behind a fortinet load balancer.  While the CF instances on each box are nammed the same (cfusion1 - cfusion4) each box's CF cluster is on a different port to eliminate multicast confusion between machines.  We have changed channelSendOptions to 6 in the server.xml files in order to reduce the number of "Session Already Invalidated" error messages in the coldfusion-error.log files.

 

While we dont have much problem restarting instances when the server has been removed from the load balancer, we do see difficulty restarting instances even under moderate load.   The CF instance will appear to start fine form the command line, however the instance never starts taking traffic, and the  CFIDE/administrator for that instance will not load.  Upon trying to stop the hung instance, we get an error:

 

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion start
Starting ColdFusion 2021 server ...
======================================================================
ColdFusion 2021 server has been started.
ColdFusion 2021 will write logs to /opt/ColdFusion2021/cfusion1/bin/../logs/coldfusion-out.log
======================================================================

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion stop
Stopping ColdFusion 2021 server, please wait
Jul 22, 2021 10:37:43 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8007is not active. Stop the server only after it is started.
ColdFusion 2021 server has been stopped

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion start
Starting ColdFusion 2021 server ...
======================================================================
ColdFusion 2021 server has been started.
ColdFusion 2021 will write logs to /opt/ColdFusion2021/cfusion1/bin/../logs/coldfusion-out.log
======================================================================

 

There doesn't appear to be an useful information in the coldfusion-error.log, nor the logs of it's peers.

 

Do I need to make any adjustments to the tomcat cluster timeouts perhaps?  Amd I missing some other type of best practice when clustering CF instances?   Any suggestions on how to troubleshoot this further?

 

Thanks for any advice, 
-Tony

 

 

    This topic has been closed for replies.

    3 replies

    Charlie Arehart
    Community Expert
    Community Expert
    January 9, 2022

    Tony, it's been almost six months. All this back and forth among us all (in recent weeks even) has not solved your problem, as I feared could happen. Some problems just can't be easily solved this way, and what some consider a "stock implementation" with "little traffic" which "should just work" often turns out to have unexpected explanations for why things do not work. Most important, sometimes there are diagnostics that may be necessary to enable to really understand what's going on.

     

    But there are just too many settings and diagnostics to consider to lay them all out here in the forums, as well as to then CONSIDER the implications of WHAT those diagnostics may show, let alone WHAT alternative settings values could be considered and HOW those may affect still other settings.

     

    All this is why I said from the outset that for some challenges there's just no substitute for having direct remote assistance from an experienced troubleshooter. And even if the person asking for help or their mgt might presume their existing staff were just as experienced as anyone else (or more) to solve things, sometimes the help comes simply from having someone look at things objectively, perhaps asking questions or posing possibilities that had not been considered or were presumed not to possibly be the issue. 

     

    Again I'll say I do this with folks daily, with a 99% satisfaction rate (on avg each year, for 15 years, as tracked by time marked "refunded"). And even when I may work on a problem with some aspect I may have "never seen before", I'm still usually able to solve it. And even when capable people may have spent days, weeks, or months trying on their own, I am nearly always able to solve problems quickly, often less than an hour.

     

    And as for when it might somehow take longrr, I've mentioned my satisfaction guarantee: you won't pay for time you don't find valuable-- even if we solve it but it took "longer than mgt was willing to pay for". I just so lament seeing problems that plague someone for months when we may solve it in hours or even minutes.

     

    Please remind your mgt of the satisfaction guarantee. It seems they have nothing to lose and only your time, sanity, and server stability to gain. 🙂 See my consulting page for more, including my online calendar to grab a slot even tomorrow. Or email or call me at the contact info offered and we can even meet today, Sunday, if that's somehow preferable.

    /Charlie (troubleshooter, carehart. org)
    GuitsBoyAuthor
    Inspiring
    January 10, 2022

    Hi Charlie.   Ive brought up your name no fewer than ten times to both my director and our CTO in the past six months.  Both of them are quite familiar with your name, and while they both like the idea, it never seems to go beyond that.   Please believe me, Ive made mention a number of times, but for reasons unknown to me, they drag their feet on  the issue.   Perhaps mainly because this is not a day to day problem.  The servers are quite stable and sound.  Its only when we need to restart the instance for JVM config changes or their ilk,  that difficulties arise.  And even then, its just a matter of hammering away at it until it eventually comes up cleanly.  Frustrating, yes, but it doesnt interrupt business continuity, so it's apparently deems unimportant.

     

    For what it's worth, we have this identical issue dataing back to CF8 and CF10, and both times a random CF update managed to fix the issue.  They may simply be waiting for someone who spends their entire day dealing with CF to figure out these issues with Adobe, and just wait for an update.  As it is, in these past six months, weve only needed to restart these instances a handful of times.  My guess is that it's simply not causing enough pain to make it a priority.

     

    I will continue to suggest your services with every opportunity.

     

    Thanks, 
    -Tony

    Charlie Arehart
    Community Expert
    Community Expert
    January 11, 2022

    Ok,. Tony. The reasoning for letting it linger make economic sense to someone, I guess. To be clear, though, this not at all a common problem. If they're thinking "let's let someone else help Adobe find the problem", I fear that will be a fruitless wait.

     

    On the other hand, since it's SO reproducible for you, it really seems we'd have little challenge finding the cause. I'd even go further and say that we may not even need to WAIT for another outage, if there may be diagnostics available that we could assess any time after such a restart. Or maybe we could enable some, so that the next time it happens there may be more.

     

    Again, it may seem I'm preaching to the choir (you), but my real goal here is to try to share thoughts to help you oversome what may be other reasons they're letting it linger. 

    /Charlie (troubleshooter, carehart. org)
    GuitsBoyAuthor
    Inspiring
    December 20, 2021

    Seems this issue has still not been fixed.  I installed Update3 today to fix the log4j vulnerability and have EXTREME difficulty getting all my instances to come back up again.  It takes as many as ten tries to sucessfully restart an instance.  I'm fairly confident I had opened a bug on this, but I didnt see it listed.  Perhaps it was under a different (work) account.  Either way, I re-submitted the bug under this account.

     

    Just for my own sanity, can someone look over what modifications need to be made from the stock server.xml config for my particular environment?

     

    To sum up, we have three bare metal RHEL8 servers, (node1 - node3).  Each server has coldfusion enterprise installed, and has four additional instances each (cfusion1 - cfusion4).

     

    Here's a sample server.xml cluster block:

    <Cluster channelSendOptions="6" className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
    <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership address="228.0.0.1" port="45501" className="org.apache.catalina.tribes.membership.McastService" dropTime="3000" frequency="500"/>
    <Receiver selectorTimeout="5000" address="auto" autoBind="100" port="4011" className="org.apache.catalina.tribes.transport.nio.NioReceiver" maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
    <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"/>
    </Channel>
    <Valve filter="" className="org.apache.catalina.ha.tcp.ReplicationValve"/>
    <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
    <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
    </Cluster>

     

    Both the Membership Address and the Membership Port have been changed to unique entries for each cluster / physical server. (Total of 3 values)

     

    The receiver port has been changed to be unique across all twelve instances.

     

    By default, all I should really have to change is the Membership Port to be unique for each of teh three clusters.  I shouldnt need to change the Membership Address, nor the Receiver port, but I am trying to avoid any potential port conflicts.  Still, this doesnt seem to help at all.

     

    The only thing that does seem to help is to make each Membership Port unique for each of teh 12 instances, however I believe this puts each of the 12 instances onto it's own individual cluster of 1, and there would be no failover or session replication.  However when we do have the instances isolated, they seem to happily restart every (or almost every) time.

     

    I cant imagine we're the only people to run CF with this topology, and the fact that it's still broken after six months is really frustrating.  

    Does anyone else have any suggestions?  Thank you,

     

     

     

     

    Charlie Arehart
    Community Expert
    Community Expert
    December 21, 2021

    Yes, I have a suggestion. But first, I do have a question. You refer to going to update 3, but can you please clarify whether you had previously been on update 2? or were you on something earlier?

     

    I ask becuase if you look at the technote for update 2, there is a mention of "known issues", one of which is this:

    "When the multicast port is busy in your environment, there will be errors in logs after restarting the instances that are part of a cluster. To resolve the issue, in the ColdFusion Admin, change the multicast port in the Cluster Manager page"

     

    I realize it refers to "errors" and you are instead finding that the instance will not start.  We can't know for sure (from this alone) whether this does or does not relate to a problem like you have.

     

    And you may say, "ok, but we said we moved to update 3". I heard that. But update 3 ONLY fixed the log4j issue. It did not fix ANY problems (bugs or known issues) in update 2. As such, if you hit this problem in update 2, you will hit it still in update 3. Even if you were on update 1 or earlier, though, it's not clear if the lack of mention of this in the technote for update 1 means it was NOT known, only that it may not have been documented

     

    And I will admit that when I saw this in the technote for update 2, I wondered what Adobe may have meant about "change the multicast port". Change it to what? How to know if it should be changed to some port that IS or IS NOT already in use. But I wanted to point this out for you in seeking "any suggestions". 

     

    Give it some consideration, and try changing it to something else if you well, and let us know how it goes, either way.

     

    And if you may feel that you need more insight into this, rather than ask here (which may or may not get a response from Adobe on this), send an email to cfsup@adobe.com, and ask them to elaborate on what this point about really meant for you. That's a simpler objective question compared to instead asking them for help on your specific problem (though of course, you could consider asking them that also...I just recommend you ask that separately, as that may be much harder to get help for than this first simpler question.)

     

    Hope that helps.

    /Charlie (troubleshooter, carehart. org)
    GuitsBoyAuthor
    Inspiring
    December 21, 2021

    Thanks Charlie.  We certainly were on update 2, as well as a hotfix for the broken CFReports / Jasper issue you also helped me find a workaround for.  Incidentally, update3 blows away any hotfixes or patches, and they need to be reapplied after update 3, which is how I found myself back in this whole mess today.

    Since we have  three physical servers, each with its own cluster, and separate multicast ports, we have already changed the port from default on at least two of the three boxes.  We noticed the issue persists across all three machines.  It seems like if that were the fix, it would have already have been in place, unless the port needs  to be changes again AFTER update 2 was in place.

     

    Quite honestly, I'm a little unfamiliar with in what capacity adobe actually offers support on their product.  I was once told they offer basic installation support, but nothing beyond plain vanilla install, and no customization.  Considering the licenses are nearly 10K a pop, its a little disappointing they dont even reply to my emails.  But perhaps I'm asking too much or sending it to the wrong place.  Ill try the email you suggested.

    Thanks again, 
    -Tony

    GuitsBoyAuthor
    Inspiring
    July 26, 2021

    Sorry, I should note that this is a Red Hat Enterprise Linux 8.4 OS, running ColdFusion2021 Update 1, and using the shipped CF JRE.

    BKBK
    Community Expert
    Community Expert
    July 29, 2021

    Doesn't sound good.

    What appears in coldfusion-error.log and server.log? Could you share the contents?

    Community Expert
    July 30, 2021

    Sure.   Ill try to pull some info out from the restarts we did while under load.

     

    Here is the coldfusion-error.log during one such shutdown where I got the error:

     

    # /opt/ColdFusion2021/cfusion1/bin/coldfusion stop
    Stopping ColdFusion 2021 server, please wait
    Jul 22, 2021 10:37:43 PM com.adobe.coldfusion.launcher.Launcher stopServer
    SEVERE: Shutdown Port 8007is not active. Stop the server only after it is started.
    ColdFusion 2021 server has been stopped

     

     

    Jul 22, 2021 10:34:36 PM org.apache.coyote.AbstractProtocol stop
    INFO: Stopping ProtocolHandler ["http-nio-8501"]
    Jul 22, 2021 10:34:36 PM org.apache.coyote.AbstractProtocol stop
    INFO: Stopping ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
    Jul 22, 2021 10:34:41 PM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
    INFO: The Apache Tomcat Native library which allows using OpenSSL was not found on the java.library.path: [/opt/ColdFusion2021/cfusion1/bin/../lib:/opt/ColdFusion2021/cfusion1/bin/../lib/_linux64::/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib]
    Jul 22, 2021 10:34:41 PM org.apache.coyote.AbstractProtocol init
    INFO: Initializing ProtocolHandler ["http-nio-8501"]
    Jul 22, 2021 10:34:42 PM org.apache.coyote.AbstractProtocol init
    INFO: Initializing ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
    Jul 22, 2021 10:34:42 PM org.apache.catalina.core.StandardService startInternal
    INFO: Starting service [Catalina]
    Jul 22, 2021 10:34:42 PM org.apache.catalina.core.StandardEngine startInternal
    INFO: Starting Servlet engine: [Apache Tomcat/9.0.41]
    Jul 22, 2021 10:34:42 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
    INFO: Cluster is about to start
    Jul 22, 2021 10:34:42 PM org.apache.catalina.tribes.transport.ReceiverBase bind
    INFO: Receiver Server Socket bound to:[/10.10.240.111:4001]
    Jul 22, 2021 10:34:50 PM org.apache.catalina.tribes.util.UUIDGenerator <clinit>
    INFO: Creation of SecureRandom instance for UUID generation using [DRBG] took [8,013] milliseconds.
    Jul 22, 2021 10:34:50 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
    INFO: Setting cluster mcast soTimeout to [500]
    Jul 22, 2021 10:34:50 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[4]
    Jul 22, 2021 10:34:50 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=14416877, securePort=-1, UDP Port=-1, id={-42 115 82 28 94 81 68 -106 -83 -82 10 -41 115 5 -3 113 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:34:50 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=14383954, securePort=-1, UDP Port=-1, id={31 -48 -24 -94 -77 -103 70 -126 -86 25 -118 -27 9 117 97 -3 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:34:50 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4004,{10, 10, 240, 111},4004, alive=14350371, securePort=-1, UDP Port=-1, id={62 125 69 96 -17 114 72 18 -73 41 -81 6 1 -15 118 38 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:34:51 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Done sleeping, membership established, start level:[4]
    Jul 22, 2021 10:34:51 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[8]
    Jul 22, 2021 10:34:51 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
    INFO: Created a buffer pool with max size:[104857600] bytes of type: [org.apache.catalina.tribes.io.BufferPool15Impl]
    Jul 22, 2021 10:34:52 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Done sleeping, membership established, start level:[8]
    Jul 22, 2021 10:34:55 PM org.apache.catalina.util.SessionIdGeneratorBase createSecureRandom
    WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [2,581] milliseconds.
    Jul 22, 2021 10:34:55 PM org.apache.catalina.ha.session.DeltaManager startInternal
    INFO: Register manager [localhost#] to cluster element [Engine] with name [Catalina]
    Jul 22, 2021 10:34:55 PM org.apache.catalina.ha.session.DeltaManager startInternal
    INFO: Starting clustering manager at [localhost#]
    Jul 22, 2021 10:34:55 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
    INFO: Manager [localhost#], requesting session state from [org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=14421378, securePort=-1, UDP Port=-1, id={-42 115 82 28 94 81 68 -106 -83 -82 10 -41 115 5 -3 113 }, payload={}, command={}, domain={}]]. This operation will timeout if no session state has been received within [60] seconds.
    Jul 22, 2021 10:34:55 PM java.io.ObjectInputFilter$Config lambda$static$0
    INFO: Creating serialization filter from !org.mozilla.**;!com.sun.syndication.**;!org.apache.commons.beanutils.**
    Jul 22, 2021 10:35:00 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
    INFO: Manager [localhost#]; session state sent at [7/22/21 10:34 PM] received in [5,210] ms.
    Jul 22, 2021 10:35:20 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
    INFO: Received member disappeared:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=14447694, securePort=-1, UDP Port=-1, id={-42 115 82 28 94 81 68 -106 -83 -82 10 -41 115 5 -3 113 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}]]
    Jul 22, 2021 10:35:38 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=1005, securePort=-1, UDP Port=-1, id={-69 96 -18 100 -111 -42 73 90 -67 73 78 -44 -48 23 -31 17 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:36:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=14469480, securePort=-1, UDP Port=-1, id={31 -48 -24 -94 -77 -103 70 -126 -86 25 -118 -27 9 117 97 -3 }, payload={}, command={}, domain={}]] message. Will verify.
    Jul 22, 2021 10:36:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=14469480, securePort=-1, UDP Port=-1, id={31 -48 -24 -94 -77 -103 70 -126 -86 25 -118 -27 9 117 97 -3 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:36:27 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=1005, securePort=-1, UDP Port=-1, id={99 -46 79 6 30 126 66 -17 -93 -20 98 18 126 -46 109 -122 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:36:55 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
    INFO: Received member disappeared:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4004,{10, 10, 240, 111},4004, alive=14475665, securePort=-1, UDP Port=-1, id={62 125 69 96 -17 114 72 18 -73 41 -81 6 1 -15 118 38 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}]]
    Jul 22, 2021 10:37:15 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4004,{10, 10, 240, 111},4004, alive=1005, securePort=-1, UDP Port=-1, id={-95 52 111 -107 -57 105 66 -3 -73 15 -76 125 67 -64 90 125 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:38:48 PM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
    INFO: The Apache Tomcat Native library which allows using OpenSSL was not found on the java.library.path: [/opt/ColdFusion2021/cfusion1/bin/../lib:/opt/ColdFusion2021/cfusion1/bin/../lib/_linux64::/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib]
    Jul 22, 2021 10:38:48 PM org.apache.coyote.AbstractProtocol init
    INFO: Initializing ProtocolHandler ["http-nio-8501"]
    Jul 22, 2021 10:38:48 PM org.apache.coyote.AbstractProtocol init
    INFO: Initializing ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
    Jul 22, 2021 10:38:48 PM org.apache.catalina.core.StandardService startInternal
    INFO: Starting service [Catalina]
    Jul 22, 2021 10:38:48 PM org.apache.catalina.core.StandardEngine startInternal
    INFO: Starting Servlet engine: [Apache Tomcat/9.0.41]
    Jul 22, 2021 10:38:48 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
    INFO: Cluster is about to start
    Jul 22, 2021 10:38:48 PM org.apache.catalina.tribes.transport.ReceiverBase bind
    INFO: Receiver Server Socket bound to:[/10.10.240.111:4001]
    Jul 22, 2021 10:38:56 PM org.apache.catalina.tribes.util.UUIDGenerator <clinit>
    INFO: Creation of SecureRandom instance for UUID generation using [DRBG] took [8,011] milliseconds.
    Jul 22, 2021 10:38:56 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
    INFO: Setting cluster mcast soTimeout to [500]
    Jul 22, 2021 10:38:56 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[4]
    Jul 22, 2021 10:38:57 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Done sleeping, membership established, start level:[4]
    Jul 22, 2021 10:38:57 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[8]
    Jul 22, 2021 10:38:58 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
    INFO: Done sleeping, membership established, start level:[8]
    Jul 22, 2021 10:39:01 PM org.apache.catalina.util.SessionIdGeneratorBase createSecureRandom
    WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [2,567] milliseconds.
    Jul 22, 2021 10:39:01 PM org.apache.catalina.ha.session.DeltaManager startInternal
    INFO: Register manager [localhost#] to cluster element [Engine] with name [Catalina]
    Jul 22, 2021 10:39:01 PM org.apache.catalina.ha.session.DeltaManager startInternal
    INFO: Starting clustering manager at [localhost#]
    Jul 22, 2021 10:39:01 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
    INFO: Manager [localhost#]: skipping state transfer. No members active in cluster group.
    Jul 22, 2021 10:39:02 PM org.apache.catalina.core.ApplicationContext log
    INFO: ColdFusionStartUpServlet: ColdFusion: Starting application services
    Jul 22, 2021 10:39:02 PM org.apache.catalina.core.ApplicationContext log
    INFO: ColdFusionStartUpServlet: ColdFusion: VM version = 11.0.1+13-LTS
    Jul 22, 2021 10:39:07 PM java.io.ObjectInputFilter$Config lambda$static$0
    INFO: Creating serialization filter from !org.mozilla.**;!com.sun.syndication.**;!org.apache.commons.beanutils.**
    Jul 22, 2021 10:39:08 PM org.apache.catalina.ha.session.JvmRouteBinderValve startInternal
    INFO: JvmRouteBinderValve started
    Jul 22, 2021 10:39:08 PM org.apache.coyote.AbstractProtocol start
    INFO: Starting ProtocolHandler ["http-nio-8501"]
    Jul 22, 2021 10:39:08 PM org.apache.coyote.AbstractProtocol start
    INFO: Starting ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
    Jul 22, 2021 10:39:08 PM com.adobe.coldfusion.launcher.Launcher run
    INFO: Server startup in 19576 ms
    Jul 22, 2021 10:39:35 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
    INFO: Created a buffer pool with max size:[104857600] bytes of type: [org.apache.catalina.tribes.io.BufferPool15Impl]
    Jul 22, 2021 10:39:35 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
    INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=1005, securePort=-1, UDP Port=-1, id={10 52 -57 -66 -33 -74 70 -50 -97 117 -73 -20 -104 2 45 111 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:39:44 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=4507, securePort=-1, UDP Port=-1, id={10 52 -57 -66 -33 -74 70 -50 -97 117 -73 -20 -104 2 45 111 }, payload={}, command={}, domain={}]] message. Will verify.
    Jul 22, 2021 10:39:44 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
    INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=4507, securePort=-1, UDP Port=-1, id={10 52 -57 -66 -33 -74 70 -50 -97 117 -73 -20 -104 2 45 111 }, payload={}, command={}, domain={}]]
    Jul 22, 2021 10:39:44 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

     

     

     

     

     

     

    And here's the same timeframe from server.log

     

    "Information","main","07/22/21","18:34:10",,"Starting logging..."
    "Information","main","07/22/21","18:34:10",,"Starting license..."
    "Information","main","07/22/21","18:34:12",,"Enterprise Edition enabled"
    "Information","main","07/22/21","18:34:12",,"Starting crypto..."
    "Information","main","07/22/21","18:34:12",,"Installed JSafe JCE provider: Version 6.21 Crypto-J 6.2.1, EMC Corporation. JsafeJCE Security Provider (implements RSA, DSA, ECDSA, Diffie-Hellman, ECDH, A
    ES, DES, Triple DES, DESX, RC2, RC4, RC5, PBE, MD2, MD5, RIPEMD160, SHA1, SHA224, SHA256, SHA384, SHA512, HMAC-MD5, HMAC-RIPEMD160, HMAC-SHA1, HMAC-SHA224, HMAC-SHA256, HMAC-SHA384, HMAC-SHA512, HMACD
    RBG, HASHDRBG, CTRDRBG, FIPS186PRNG, SHA1PRNG, MD5PRNG; RFC 3394, RFC 5649 AES Key Wrap; X.509 CertificateFactory; PKCS12, PKCS15 KeyStore; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathValidator
    s; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathBuilders; LDAP, Collection CertStores)"
    "Information","main","07/22/21","18:34:12",,"Starting security..."
    "Information","main","07/22/21","18:34:12",,"Starting scheduler..."
    "Information","main","07/22/21","18:34:12",,"Starting WatchService..."
    "Information","main","07/22/21","18:34:12",,"Starting sql..."
    "Information","main","07/22/21","18:34:13",,"Starting runtime..."
    "Information","main","07/22/21","18:34:14",,"Starting client..."
    "Information","main","07/22/21","18:34:14",,"Starting archive..."
    "Information","main","07/22/21","18:34:14",,"Starting CloudConfig..."
    "Information","main","07/22/21","18:34:14",,"Starting VendorCredential..."
    "Information","main","07/22/21","18:34:14",,"Starting rest..."
    "Information","main","07/22/21","18:34:14",,"Starting registry..."
    "Information","main","07/22/21","18:34:14",,"Package adminapi started..."
    "Information","main","07/22/21","18:34:14",,"Package administrator started..."
    "Information","main","07/22/21","18:34:14",,"Package redissessionstorage started..."
    "Information","main","07/22/21","18:34:14",,"Package debugger started..."
    "Information","main","07/22/21","18:34:14",,"Package zip started..."
    "Information","main","07/22/21","18:34:14",,"Package image started..."
    "Information","main","07/22/21","18:34:14",,"Package caching started..."
    "Information","main","07/22/21","18:34:14",,"Package cfmongodb started..."
    "Information","main","07/22/21","18:34:14",,"Package mail started..."
    "Information","main","07/22/21","18:34:14",,"Package spreadsheet started..."
    "Information","main","07/22/21","18:34:15",,"Package axis started..."
    "Information","main","07/22/21","18:34:15",,"Package chart started..."
    "Information","main","07/22/21","18:34:15",,"Package feed started..."
    "Information","main","07/22/21","18:34:15",,"Package print started..."
    "Information","main","07/22/21","18:34:15",,"Package search started..."
    "Information","main","07/22/21","18:34:15",,"Package document started..."
    "Information","main","07/22/21","18:34:15",,"Package presentation started..."
    "Information","main","07/22/21","18:34:15",,"Package eventgateways started..."
    "Information","main","07/22/21","18:34:15",,"Package dotnet started..."
    "Information","main","07/22/21","18:34:15",,"Package pmtagent started..."
    "Information","main","07/22/21","18:34:15",,"Package htmltopdf started..."
    "Information","main","07/22/21","18:34:15",,"Package awslambda started..."
    "Information","main","07/22/21","18:34:15",,"com package will not be deployed as it is not installed."
    "Information","main","07/22/21","18:34:15",,"Package saml started..."
    "Information","main","07/22/21","18:34:15",,"Package awss3 started..."
    "Information","main","07/22/21","18:34:15",,"Package awss3legacy started..."
    "Information","main","07/22/21","18:34:15",,"Package azureblob started..."
    "Information","main","07/22/21","18:34:15",,"Package pdf started..."
    "Information","main","07/22/21","18:34:15",,"Package websocket started..."
    "Error","Thread-24","07/22/21","18:34:15",,"Connect to 127.0.0.1:8993 [/127.0.0.1] failed: Connection refused (Connection refused) http://127.0.0.1:8993/PDFgServlet/"
    "Information","main","07/22/21","18:34:15",,"WebSocket server listens on port: 8576"
    "Information","main","07/22/21","18:34:15",,"Package orm started..."
    "Information","main","07/22/21","18:34:15",,"Package ormsearch started..."
    "Information","main","07/22/21","18:34:15",,"Package ajax started..."
    "Information","main","07/22/21","18:34:15",,"Package derby started..."
    "Information","main","07/22/21","18:34:15",,"Package oracle started..."
    "Information","main","07/22/21","18:34:15",,"Package mysql started..."
    "Information","main","07/22/21","18:34:15",,"Package db2 started..."
    "Information","main","07/22/21","18:34:15",,"Package sybase started..."
    "Information","main","07/22/21","18:34:15",,"Package postgresql started..."
    "Information","main","07/22/21","18:34:15",,"Package sqlserver started..."
    "Information","main","07/22/21","18:34:15",,"odbc package will not be deployed as it is not installed."
    "Information","main","07/22/21","18:34:15",,"Package scheduler started..."
    "Information","main","07/22/21","18:34:15",,"Package ftp started..."

    "Information","main","07/22/21","18:34:15",,"Package awssqs started..."
    "Information","main","07/22/21","18:34:15",,"Package awssns started..."
    "Information","main","07/22/21","18:34:15",,"Package azureservicebus started..."
    "Information","main","07/22/21","18:34:15",,"Package awsdynamodb started..."
    "Information","main","07/22/21","18:34:15",,"Package report started..."
    "Information","main","07/22/21","18:34:15",,"Package exchange started..."
    "Information","main","07/22/21","18:34:15",,"Package sharepoint started..."
    "Information","main","07/22/21","18:34:15",,"ColdFusion started"
    "Information","main","07/22/21","18:34:15",,"ColdFusion: application services are now available"
    "Information","Thread-10","07/22/21","18:34:17",,"A same serial number has been found on another ColdFusion server. The server may be out of compliance."
    "Information","Thread-27","07/22/21","22:34:36",,"ColdFusion stopped"
    "Information","main","07/22/21","22:39:02",,"Starting logging..."
    "Information","main","07/22/21","22:39:02",,"Starting license..."
    "Information","main","07/22/21","22:39:04",,"Enterprise Edition enabled"
    "Information","main","07/22/21","22:39:04",,"Starting crypto..."
    "Information","main","07/22/21","22:39:04",,"Installed JSafe JCE provider: Version 6.21 Crypto-J 6.2.1, EMC Corporation. JsafeJCE Security Provider (implements RSA, DSA, ECDSA, Diffie-Hellman, ECDH, AES, DES, Triple DES, DESX, RC2, RC4, RC5, PBE, MD2, MD5, RIPEMD160, SHA1, SHA224, SHA256, SHA384, SHA512, HMAC-MD5, HMAC-RIPEMD160, HMAC-SHA1, HMAC-SHA224, HMAC-SHA256, HMAC-SHA384, HMAC-SHA512, HMACDRBG, HASHDRBG, CTRDRBG, FIPS186PRNG, SHA1PRNG, MD5PRNG; RFC 3394, RFC 5649 AES Key Wrap; X.509 CertificateFactory; PKCS12, PKCS15 KeyStore; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathValidators; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathBuilders; LDAP, Collection CertStores)"
    "Information","main","07/22/21","22:39:04",,"Starting security..."
    "Information","main","07/22/21","22:39:04",,"Starting scheduler..."
    "Information","main","07/22/21","22:39:04",,"Starting WatchService..."
    "Information","main","07/22/21","22:39:04",,"Starting sql..."
    "Information","main","07/22/21","22:39:05",,"Starting runtime..."
    "Information","main","07/22/21","22:39:06",,"Starting client..."
    "Information","main","07/22/21","22:39:06",,"Starting archive..."
    "Information","main","07/22/21","22:39:06",,"Starting CloudConfig..."
    "Information","main","07/22/21","22:39:06",,"Starting VendorCredential..."
    "Information","main","07/22/21","22:39:06",,"Starting rest..."
    "Information","main","07/22/21","22:39:06",,"Starting registry..."
    "Information","main","07/22/21","22:39:06",,"Package adminapi started..."
    "Information","main","07/22/21","22:39:06",,"Package administrator started..."
    "Information","main","07/22/21","22:39:06",,"Package redissessionstorage started..."
    "Information","main","07/22/21","22:39:06",,"Package debugger started..."
    "Information","main","07/22/21","22:39:06",,"Package zip started..."
    "Information","main","07/22/21","22:39:06",,"Package image started..."
    "Information","main","07/22/21","22:39:06",,"Package caching started..."
    "Information","main","07/22/21","22:39:06",,"Package cfmongodb started..."
    "Information","main","07/22/21","22:39:06",,"Package mail started..."
    "Information","main","07/22/21","22:39:06",,"Package spreadsheet started..."
    "Information","main","07/22/21","22:39:07",,"Package axis started..."
    "Information","main","07/22/21","22:39:07",,"Package chart started..."
    "Information","main","07/22/21","22:39:07",,"Package feed started..."
    "Information","main","07/22/21","22:39:07",,"Package print started..."
    "Information","main","07/22/21","22:39:07",,"Package search started..."
    "Information","main","07/22/21","22:39:07",,"Package document started..."
    "Information","main","07/22/21","22:39:07",,"Package presentation started..."
    "Information","main","07/22/21","22:39:07",,"Package eventgateways started..."
    "Information","main","07/22/21","22:39:07",,"Package dotnet started..."
    "Information","main","07/22/21","22:39:07",,"Package pmtagent started..."
    "Information","main","07/22/21","22:39:07",,"Package htmltopdf started..."
    "Information","main","07/22/21","22:39:07",,"Package awslambda started..."
    "Information","main","07/22/21","22:39:07",,"com package will not be deployed as it is not installed."
    "Information","main","07/22/21","22:39:07",,"Package saml started..."
    "Information","main","07/22/21","22:39:07",,"Package awss3 started..."
    "Information","main","07/22/21","22:39:07",,"Package awss3legacy started..."
    "Information","main","07/22/21","22:39:07",,"Package azureblob started..."
    "Information","main","07/22/21","22:39:07",,"Package pdf started..."
    "Information","main","07/22/21","22:39:07",,"Package websocket started..."
    "Error","Thread-24","07/22/21","22:39:07",,"Connect to 127.0.0.1:8993 [/127.0.0.1] failed: Connection refused (Connection refused) http://127.0.0.1:8993/PDFgServlet/"
    "Information","main","07/22/21","22:39:07",,"WebSocket server listens on port: 8576"
    "Information","main","07/22/21","22:39:07",,"Package orm started..."
    "Information","main","07/22/21","22:39:07",,"Package ormsearch started..."
    "Information","main","07/22/21","22:39:07",,"Package ajax started..."
    "Information","main","07/22/21","22:39:07",,"Package derby started..."
    "Information","main","07/22/21","22:39:07",,"Package oracle started..."
    "Information","main","07/22/21","22:39:07",,"Package mysql started..."
    "Information","main","07/22/21","22:39:07",,"Package db2 started..."
    "Information","main","07/22/21","22:39:07",,"Package sybase started..."

    "Information","main","07/22/21","22:39:07",,"Package postgresql started..."
    "Information","main","07/22/21","22:39:07",,"Package sqlserver started..."
    "Information","main","07/22/21","22:39:07",,"odbc package will not be deployed as it is not installed."
    "Information","main","07/22/21","22:39:07",,"Package scheduler started..."
    "Information","main","07/22/21","22:39:07",,"Package ftp started..."
    "Information","main","07/22/21","22:39:07",,"Package awssqs started..."
    "Information","main","07/22/21","22:39:07",,"Package awssns started..."
    "Information","main","07/22/21","22:39:07",,"Package azureservicebus started..."
    "Information","main","07/22/21","22:39:07",,"Package awsdynamodb started..."
    "Information","main","07/22/21","22:39:07",,"Package report started..."
    "Information","main","07/22/21","22:39:07",,"Package exchange started..."
    "Information","main","07/22/21","22:39:07",,"Package sharepoint started..."
    "Information","main","07/22/21","22:39:07",,"ColdFusion started"
    "Information","main","07/22/21","22:39:07",,"ColdFusion: application services are now available"
    "Information","Thread-10","07/22/21","22:39:09",,"A same serial number has been found on another ColdFusion server. The server may be out of compliance."

     

     

     

    Thanks for any suggestions you might have, 
    -Tony


    I agree with @Charlie Arehart that there aren't any errors in your logs. It sounds to me like you just have a lot of session replication happening when you start up an instance, and that can take a long time. CF clustering is peer-to-peer, so any time any one of your instances changes the session state, that change has to go to every other server in the cluster, and there's a lot of traffic in all directions. I see something kind of similar with any other server where a bunch of data has to be loaded into memory before it can start doing work - I saw that back in the day with Adobe LiveCycle ES where it would take 20+ minutes to restart the service, and I've seen it with CF-based CMSs where a lot of data is loaded into memory at startup. So, my recommendation would be to either get used to it, or reduce the amount of user-specific data, or use an alternative to session replication like shared database (maybe client storage, maybe something you roll yourself).

     

    Dave Watts, Eidolon LLC

    Dave Watts, Eidolon LLC