Anyone here have any idea about how session replication communication works between CF cluster members? We just upgraded a site from CF11 to CF2021. Session replication was working in CF11. We believe we've configured it the same way in CF2021, but replication is not working. Stopping one of the two instances in our clusters results in a loss of the sticky sessions that were being routed to that instance.
What I expect to happen is that if a session is bound to instance 1, and I turn off instance 1, the session is seamlessly resumed on instance 2. But instead, the session is lost and the user must log in again.
The cluster has been configured using the Enterprise Manager, and Sticky Sessions and Session Replication are both enabled.
I compared the server.xml files from my CF11 and CF2021 setups and found two differences. CF2021 is missing the following from its <Cluster> tag:
1. <Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager"/>
2. <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
JvmRouteSessionIDBinderListener is no longer a thing: http://tomcat.apache.org/migration-8.html#Clustering
I tried adding the <Manager> to the CF2021 server.xml files and that did not change the behavior.
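For anyone comparing the two server.xml files by eye, here is a minimal Python sketch that lists the clustering elements under <Cluster> so the CF11 and CF2021 versions can be diffed. The sample XML is modelled on the default layout in the Tomcat cluster how-to, not on an actual CF2021 file; ColdFusion may generate different addresses and ports, and the function name `cluster_summary` is my own.

```python
import xml.etree.ElementTree as ET

# Sample only: modelled on the default layout in the Tomcat cluster how-to.
# A real ColdFusion server.xml may use different addresses and ports.
SAMPLE = """
<Server>
  <Service name="Catalina">
    <Engine name="Catalina" defaultHost="localhost">
      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
        <Manager className="org.apache.catalina.ha.session.DeltaManager"
                 expireSessionsOnShutdown="false"
                 notifyListenersOnReplication="true"/>
        <Channel className="org.apache.catalina.tribes.group.GroupChannel">
          <Membership className="org.apache.catalina.tribes.membership.McastService"
                      address="228.0.0.4" port="45564"/>
          <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                    address="auto" port="4000" autoBind="100"/>
        </Channel>
      </Cluster>
    </Engine>
  </Service>
</Server>
"""

def cluster_summary(xml_text):
    """Return {tag: attributes} for each element under the first <Cluster>.

    If a tag appears more than once, the last occurrence wins.
    """
    root = ET.fromstring(xml_text)
    cluster = root.find(".//Cluster")
    if cluster is None:
        return {}
    # Skip the <Cluster> element itself; keep its descendants.
    return {el.tag: dict(el.attrib) for el in cluster.iter() if el is not cluster}

summary = cluster_summary(SAMPLE)
for tag, attrs in summary.items():
    print(tag, attrs)
```

To run it against a real instance, pass the contents of that instance's server.xml instead of SAMPLE. An empty result means the file has no <Cluster> element at all.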
This is probably an indication of where things are going wrong:
prod2/logs/coldfusion-error.log:May 04, 2021 9:24:19 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
prod2/logs/coldfusion-error.log-INFO: Manager [localhost#]: skipping state transfer. No members active in cluster group.
Any suggestions are appreciated.
P.S. I saw this from https://tomcat.apache.org/tomcat-9.0-doc/cluster-howto.html
> The TCP port listening for replication messages is the first available server socket in range 4000-4100
so I tried running sudo tcpdump portrange 4000-4100 to see whether any traffic was passing on those ports, and it didn't capture any packets.
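One thing worth checking alongside the tcpdump: in a stock Tomcat cluster, the 4000-4100 range is only the TCP receiver for replication messages. Members discover each other through multicast heartbeats, which the Tomcat how-to lists by default as address 228.0.0.4, port 45564. Whether CF2021 uses the same values is an assumption here; the <Membership> element in server.xml has the real ones. Under those assumptions, a rough Python sketch for listening for heartbeats on a cluster node could look like this (function names are my own):

```python
import socket
import struct

def membership_request(group, iface="0.0.0.0"):
    """Pack the ip_mreq struct that IP_ADD_MEMBERSHIP expects."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))

def listen_for_heartbeats(group="228.0.0.4", port=45564, timeout=5.0, max_packets=5):
    """Join the multicast group and return the sender IPs of packets seen.

    The defaults are Tomcat's documented ones (assumed, not confirmed for CF2021).
    The timeout applies to each receive call, not to the whole run.
    """
    senders = []
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Allow sharing the port with the running instance's own multicast socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        while len(senders) < max_packets:
            try:
                _, addr = sock.recvfrom(2048)
            except socket.timeout:
                break  # nothing arrived within the timeout
            senders.append(addr[0])
    finally:
        sock.close()
    return senders

# Example, to be run on a cluster node:
#   print(listen_for_heartbeats())
```

If the returned list never includes the other node's address, the heartbeats are probably not arriving, which would fit the "No members active in cluster group" message in the log.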
Apply Update 1 of CF2021, released last month, if you have not already. Does that solve things? It should. There was a known related issue that it solved. Please let us know either way.
I'm on version 2021.0.01.325996. I think that's Update 1.
@sbleon , it is unclear to me why you call this behaviour a bug.
> Stopping one of the two instances in our clusters results in a loss of the sticky sessions that were being routed to that instance.
That is how sticky sessions are expected to work. The sessions stick to one particular instance. If the instance dies, then the sticky sessions will be lost.
> What I expect to happen is that if a session is bound to instance 1, and I turn off instance 1, the session is seamlessly resumed on instance 2.
That is how session replication is expected to work, which is contrary to how sticky sessions work.
I think that, to properly configure your sessions, you have to choose which one of the methods to use, but not both.
Choose sticky sessions if you want each session to be dedicated to a particular ColdFusion instance until it times out. ColdFusion will assign the session, as it sees fit, to the instance. It won't replicate any session.
Choose session replication if you want ColdFusion to distribute copies of each session among the ColdFusion instances in a cluster. Should one of the instances die, then the sessions that were being routed to that instance will be seamlessly resumed on a fellow cluster instance that is alive.
Oh, something else. I hope you have configured ColdFusion to use J2EE sessions.
BKBK, while I can understand why you would feel that's so (choose either sticky or replication, but not both), I don't believe they are mutually exclusive.
What you may be recalling is that in CF10, the CF Admin UI changed to reflect that, which seemed to be a mistake without justification. That was corrected in CF2016, so that again one can choose either, neither, or both.
And both (together) should work, with sticky meaning keep the user on an instance once there, while replication means if they end up on another instance (such as if their first instance goes down), their session should be available there.
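To make the distinction concrete, here is a toy Python model. It is not ColdFusion's or the connector's actual routing code, and every name in it is made up. Sticky routing decides where a request goes; replication decides whether the session data is there when the request lands.

```python
def route(session_id, sticky_map, alive):
    """Return the instance serving this request: the sticky one if it is
    alive, otherwise any surviving instance."""
    preferred = sticky_map.get(session_id)
    if preferred in alive:
        return preferred
    return sorted(alive)[0]  # fail over to a surviving instance

def session_survives(session_id, sticky_map, stores, alive):
    """True if the instance that ends up serving the request holds the session."""
    target = route(session_id, sticky_map, alive)
    return session_id in stores[target]

sticky_map = {"abc": "instance1"}

# Replication off: only instance1 holds the session.
no_repl = {"instance1": {"abc"}, "instance2": set()}
# Replication on: every instance holds a copy.
with_repl = {"instance1": {"abc"}, "instance2": {"abc"}}

alive_after_outage = {"instance2"}
print(session_survives("abc", sticky_map, no_repl, alive_after_outage))    # False
print(session_survives("abc", sticky_map, with_repl, alive_after_outage))  # True
```

With both options on, the user stays on instance1 while it is up, and the session is still found on instance2 after the failover.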
So why have sticky? There are various reasons, such as if there is something else stateful about the app where keeping requests on the first instance is preferable (even if it can't be perfectly relied upon to always be available).
So back to the original issue, you do indeed make a good point in asking to ensure that J2EE sessions are enabled. That is of course necessary for that to work. I'd assumed sbleon had done that, but it's worth confirming. (The admin should warn if it's not so, when replication is selected, but I don't think it does. Am not on a computer now, to confirm.)
Finally, sbleon may have seen that Adobe has replied also in the bug report asking more. Let's see where this all goes. (I hope to test things also if I can get time.)
In my experience, they haven't been mutually exclusive, which is presumably why you get the option to pick both. You may want to use one or the other, but you may also want to use both.
Session replication is great when you don't want to lose a session, of course, but in my experience it's been pretty fragile. It's peer-to-peer, so if you have more than two servers in a cluster you can end up with a lot of traffic.
Sticky sessions prevent you from needing session replication unless you have an outage, but if you do have an outage you can find out whether session replication has been working!
Personally, I'd rather use a database for storing shared variables. That has its own potential problems, but it scales well even if it's slower.
Dave Watts, Eidolon LLC
@Charlie Arehart and @sbleon , thanks for your remark on sticky sessions and session replication not being mutually exclusive. I can imagine. After all, both are enabled by default when you create a cluster in the ColdFusion Administrator. So, I accept: you may configure both sticky sessions and session replication in ColdFusion.
The question is whether that is the best way to configure distributed session management. I would say it is not. I wish the Adobe documentation would say it more emphatically than just, "If your ColdFusion application uses session replication, sticky sessions are not typically required."
So, @sbleon, I do believe you're on the right track. Since you want to replicate sessions, disable sticky sessions and enable session replication.
It turns out that this was a firewall issue. The firewall needed to be open for multicast traffic originating from the local public IP address. On my Ubuntu system, this looked like:
sudo ufw allow in proto udp to 18.104.22.168/4 from 22.214.171.124/32 comment 'local multicast for CF clustering'
with my server's local public IP in place of the address after "from".
This firewall configuration was not necessary with our previous version, ColdFusion 11. If you are having trouble with clustering, perhaps try disabling your firewall temporarily to see if it's the problem.
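A quick way to sanity-check a rule like the one above is to confirm that its destination range actually contains the cluster's multicast group. This sketch uses Python's ipaddress module. 224.0.0.0/4 is the standard IPv4 multicast block, and 228.0.0.4 is Tomcat's documented default group, which I'm assuming CF2021 keeps; the function name is my own.

```python
import ipaddress

def rule_covers_group(rule_cidr, group):
    """True if the firewall rule's destination network contains the group address."""
    # strict=False tolerates CIDRs written with host bits set.
    network = ipaddress.ip_network(rule_cidr, strict=False)
    return ipaddress.ip_address(group) in network

group = "228.0.0.4"  # assumed: Tomcat default; check <Membership address=...>
print(ipaddress.ip_address(group).is_multicast)  # True
print(rule_covers_group("224.0.0.0/4", group))   # True
print(rule_covers_group("10.0.0.0/8", group))    # False
```

If the group address from your own server.xml is not covered by the rule's destination, the heartbeats will still be dropped.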
Interesting and nice to know. 🙂 Thanks for sharing that.
I think it's possible to disable multicast for session replication in earlier versions of CF. I suspect it may be possible to do that with the latest version. I haven't really used session replication in a production environment in a long time, though, because it's always been fairly fragile and unreliable for me.
Dave Watts, Eidolon LLC
I also had to allow traffic from my local public IP to my local public IP so that the instances could communicate with each other to share session state:
sudo ufw allow from 188.8.131.52/32 to 184.108.40.206/32 comment 'local traffic (CF session replication)'
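To confirm that this second rule is doing its job, one option is to test from one instance whether the other instance's replication receiver accepts TCP connections. The sketch below assumes the receiver sits in the 4000-4100 range quoted earlier from the Tomcat how-to; the peer address in the example comment is a placeholder, and the function name is my own.

```python
import socket

def reachable_ports(host, ports, timeout=0.5):
    """Return the ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass  # refused, filtered, or timed out
    return found

# Example with a placeholder peer address:
#   print(reachable_ports("192.0.2.10", range(4000, 4101)))
```

An empty list with the firewall enabled and a non-empty one with it disabled points at the firewall rather than at the ColdFusion configuration.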