Participant
September 28, 2018
Question

Coldfusion 2018 clustering and session replication not working

I'm setting up a couple of new Coldfusion 2018 servers, will be using clustering for the first time, and have run into some problems.

I am having trouble with session replication. Basically, session variables appear to be replicated between nodes in a cluster but are killed after a short while at random.

A little setup info:

  • 2 web servers (Windows Server 2012) behind load balancers
  • On each web server sits a Coldfusion cluster consisting of 2 local instances (still unclear if this is useful or not - will ask in separate question) and 2 remote instances (the remotes reference the local instances of each opposite server)
  • For simplicity, currently just testing on a single server with local Coldfusion instances - leaving the remotes out of the equation until I can get things working reliably locally
  • Using J2EE session variables
  • Coldfusion session timeout set to 2 hours

Here is what I did/experienced:

  • We have a web application that requires login and stores user information in the session upon login.
  • I made a small modification to the web app to show me which cluster instance has serviced my current request.
  • After setting up the cluster, I started the web application and logged in, noting the instance which displayed the login page.
  • Upon logging in, I was immediately returned to the login screen (app checks for user info in session and redirects to login if not found)
  • Debugging revealed that I was actually being logged in, but after the post-login redirect to a new page the user info would be gone from session.
  • Multiple login attempts in a row (same credentials, tried over and over) revealed that sometimes login would proceed just fine and I would get into the app. However, if I refreshed the page or went to another page, the session would be lost very soon but at random (within a few page refreshes).
  • In an attempt to simplify the problem to try and figure out what is going on, I created a simple .cfm that bypasses all the login stuff and does one thing: adds a simple string value to session and then dumps the session and instance name.
    • I ran the script once, noted which instance was being used and that session contained my value.
    • I then edited the script so it no longer set the session value.
    • I then hit refresh over and over so I could confirm:
        1. That requests were being serviced by both instances in cluster
        2. That as I flip-flopped between instances, the session value was available all the time.
  • Again, the replication would work and for several refreshes I could see my session variable available on each instance...until it wasn't. After a random number of refreshes/seconds (between 2 - 10 refreshes say) the value would disappear.
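
The manual refresh test above could be scripted so that every request is logged. What follows is only a rough sketch in Python, not something from the original setup. It assumes the test page prints two marker strings, `instance=<name>` and `value=<string>` (hypothetical format), and that the URLs shown in the comment are placeholders. It also records the JSESSIONID cookie on each request, since J2EE sessions are tracked by that cookie: if the ID changes at the moment the value disappears, a new session was created rather than data being dropped from an existing one.

```python
import http.cookiejar
import re
import time
import urllib.request


def parse_page(body):
    """Return (instance, value) parsed from the test page; missing parts are None.

    The "instance=" / "value=" markers are an assumed page format.
    """
    instance = re.search(r"instance=([\w.-]+)", body)
    value = re.search(r"value=([\w.-]+)", body)
    return (instance.group(1) if instance else None,
            value.group(1) if value else None)


def first_loss(observations):
    """Index of the first request where the value is missing after being seen."""
    seen = False
    for i, obs in enumerate(observations):
        if obs[1] is not None:
            seen = True
        elif seen:
            return i
    return None


def probe(url, count=20, delay=0.5, first_url=None):
    """Request `url` `count` times with one cookie jar, as a single browser would.

    `first_url` is an optional page that sets the session value once.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    if first_url:
        opener.open(first_url, timeout=10).close()
    observations = []
    for _ in range(count):
        with opener.open(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        instance, value = parse_page(body)
        # Look up the current J2EE session cookie, if the server has set one.
        sid = next((c.value for c in jar if c.name.upper() == "JSESSIONID"), None)
        observations.append((instance, value, sid))
        time.sleep(delay)
    return observations


# Example use (both URLs are hypothetical placeholders):
# rows = probe("http://localhost/sessiondump.cfm",
#              first_url="http://localhost/sessionset.cfm")
# for row in rows:
#     print(row)
# print("value first lost at request", first_loss(rows))
```

Comparing the instance name and session ID on the request just before the loss with the one where the value vanishes should show whether the loss lines up with a switch between instances.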

I am at a loss to explain why this is happening. We considered using Redis as a session store to see if it helped but frankly, our team has no experience with it, it is clunky to get working in Windows and we really don't want any more moving pieces in our infrastructure if we can help it.

Any insight on what is occurring as well as advice for how to peer behind the scenes as it were and see what is going on with session replication would be greatly appreciated.

Thanks

Community Expert
September 28, 2018

I'm not going to answer your question right now. Instead, I'm going to ask a different question: do you really need session replication? Would using sticky sessions at the load balancer solve your problem? That's a much more reliable solution in my experience. If you can't use sticky sessions - and there are valid reasons why that might be the case - I'll try to respond more fully when I can.

Dave Watts, Fig Leaf Software

Dave Watts, Eidolon LLC
Participant
September 28, 2018

Hi Dave,

Thanks for your reply. To answer your question: in short, I'm not sure.

We currently have a very simple architecture with one web server and one db server. The architecture we are moving to is new to us and we are basing decisions off of what we have gleaned from various sources.

Bottom line is that we were using session replication to ensure that should something go wrong with any of the CF instances, then a user could be pointed to a working instance and continue using the system uninterrupted.

Basically we are trying to achieve the following:

  • We want to improve performance/response. Distributing user load across multiple instances is key here. Still unsure if having multiple instances local to a single server is of any benefit.
  • We want to provide failover capability with little disruption to users. Instance/server could go down without impacting the users.

If I turn on sticky sessions within Coldfusion, then true, I don’t seem to have any problems. However, I had assumed sticky sessions would be problematic because if the instance/server to which a user is connected goes down, it would force them to log in again when they are inevitably shunted to another instance/server, marring the user experience. Also, once whatever issue with the down instance/server has been resolved, because sessions are sticky, all the traffic that was temporarily moved to the functioning instance/server will not be redistributed - we'll have all users on one instance/server and none on the other.

You mentioned turning them on within the load balancers so maybe you are not referring to CF sticky sessions at all. I'm not sure how this can be managed at the load balancer but I would love your insights. When I think about it though, I wonder if the same problem wouldn't happen at load balancer level - if not replicated, sessions are lost and once optimal conditions are restored, traffic will still be 'stuck' to the server that wasn't down.

Thanks

Community Expert
October 2, 2018

I think you have a pretty good read of things. You're right, with sticky sessions only and not using session replication, you won't be providing failover. But you will be providing overall redundancy and (hopefully) improving performance and response time. I can't really guarantee even that, because you might have other bottlenecks that are preventing that - how database access works with your application, for example.

My own opinion is that most applications don't really need session replication, because the inconvenience to users is usually pretty minor. If you have a shopping cart application, this might not be the case. Otherwise, though, that's more of a business decision than a technical one.

And yes, I was referring to using sticky sessions at the load balancer. I'd personally much rather use it there than in the peer-to-peer CF session management stuff. There's no real upside to using sticky sessions within CF's own session management.

And, yes, if you bring up the failed server, existing users will still be processed by the other server. In most applications, this isn't a huge problem because they'll naturally drain off after a while. This is especially true if you keep session timeouts low (like you should anyway).
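
The behaviour described so far (sticky routing, sessions lost on failover, existing users staying put after recovery) can be illustrated with a toy model. This is only a sketch under stated assumptions, not how any particular load balancer is implemented: the `StickyBalancer` class, its round-robin choice of node, and the node names are all hypothetical, and each node keeps sessions purely in its own memory with no replication.

```python
class StickyBalancer:
    """Toy sticky-session load balancer with no session replication."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.up = set(self.nodes)
        self.affinity = {}                            # session id -> pinned node
        self.sessions = {n: set() for n in self.nodes}  # per-node in-memory sessions
        self._next = 0

    def _pick(self):
        # Round-robin over nodes, skipping any that are down.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.up:
                return node
        raise RuntimeError("no healthy nodes")

    def request(self, sid):
        """Route one request; return (node, had_session_on_that_node)."""
        node = self.affinity.get(sid)
        if node not in self.up:          # new user, or pinned node is down
            node = self._pick()
            self.affinity[sid] = node
        had_session = sid in self.sessions[node]
        self.sessions[node].add(sid)     # models the user logging in if needed
        return node, had_session

    def fail(self, node):
        self.up.discard(node)
        self.sessions[node] = set()      # in-memory sessions die with the node

    def recover(self, node):
        self.up.add(node)

    def expire(self, sid):
        """Session timeout: drop the pin so the user is rebalanced next visit."""
        node = self.affinity.pop(sid, None)
        if node is not None:
            self.sessions[node].discard(sid)


lb = StickyBalancer(["A", "B"])
print(lb.request("u1"))   # first visit, pinned to a node, must log in
print(lb.request("u1"))   # same node, session present
lb.fail("A")
print(lb.request("u1"))   # moved to the survivor, session gone: log in again
lb.recover("A")
print(lb.request("u1"))   # still on the survivor until the session expires
print(lb.request("u3"))   # new users are spread across both nodes again
```

In this model the shorter the session timeout, the sooner `expire` runs for each user and the faster load evens out after a recovery, which is the draining effect mentioned above.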

The benefits of session replication are pretty neat, but honestly, unless you want to use Redis for it, I wouldn't recommend it. The previous version of session replication is a little bit fragile and there isn't really a way to fix it in production if you have a problem. Also, it doesn't scale well beyond two servers. I understand your not wanting to add new pieces of infrastructure, but that's really what scaling out requires. Building a reliable multi-server infrastructure is generally a lot more complicated than building a single-server environment - if you have two servers instead of one, you're going to have more than twice as many moving parts as you had with one.

Dave Watts, Fig Leaf Software

Dave Watts, Eidolon LLC