
CF websockets

Community Beginner, Feb 01, 2021

I'm running 2 Windows 2016 boxes with CF 2018 Ent installed.


We have a dedicated CF instance specifically to run as a websocket server.


It's set up to proxy via IIS.


We estimate that we could have 5,000 - 10,000 concurrent connections to it since our users could be subscribed to multiple channels.


After a fresh restart of CF and IIS, clients can connect to the websocket channel instantly and get a near instant success publish from the channel so you know you're successfully subscribed and updates will start coming in.


After a random amount of time after things are working well (minutes to days), clients will instantly connect but will no longer get the success publish until things are restarted again. This means they aren't truly subscribed to the channel anymore and no updates come through.

 

The instance resources look fine, good memory usage, garbage collection looks good, low cpu usage.

 

We've also been playing with the connection pool numbers.

server.xml max threads = 5000

 

config\wsproxy\1\bin\config.ini   ConnectionPoolSize=20

 

workers.properties = 

worker.wss.connection_pool_size=5000

worker.wss.connection_pool_timeout=60
worker.wss.max_reuse_connections=5000


We've been knocking our heads against the wall with this for some time now and are hoping to get some help.

TOPICS
Server administration
Community Expert, Feb 02, 2021

Matt, while I wish I could propose some single tweak that would help, there are just too many variables. What I would say is that with a combination of better monitoring of things, as well as close assessment of those various configuration settings (to make sure there's not an issue that's unclear for what you have shared), it SHOULD be possible both to understand what's causing the failing updates/channel communication, and then what setting needs to be tweaked (whether in the connector and its config, the proxy and its config, CF, IIS, the JVM, or perhaps even something else).

 

If you're at all interested in a helping hand to assess all that, see my carehart.org/consulting page. I hate to drop that as the only solution I can offer, but for now it is. Perhaps someone else will have another suggestion if you prefer to wait for that. But if you want it solved, either we will solve it or you won't pay for any of my time you don't find valuable.


/Charlie (troubleshooter, carehart.org)
Community Beginner, Apr 18, 2022

We're having the same issue. It requires a restart to work for a period of time (mostly daily), and we don't know which setting would fix this restart issue. Please share if you found a solution for this.

Community Beginner, Apr 18, 2022

We ended up creating a new CF instance that is used as a dedicated single WS server. 

 

We also transitioned a lot of it to pubnub. 

 

No real solution was ever found.  Thanks Adobe.

Community Expert, Apr 18, 2022

To be clear, no, you should not need to restart anything. There's always a reason for that and almost always a better solution. And Tuan, you don't clarify what it is that you are restarting. Do you mean CF? The web server? The box they are running on?

 

And Matt, you never responded to my offer of direct help (which was offered even potentially at no charge). While your workaround may have seemed easier--and I'm glad you're doing well with that alternative--I just want to say again that such problems should be solvable.

 

And I'll say the same to you, Tuan, with the same offer of direct help if interested-- especially if that workaround may not work as well for you. 

 

It's not clear in your respective cases what or where the problem may be. But to Matt's last comment, Adobe often gets the blame for issues which may not at all be of their making. Again, there are a lot of variables in such things. 

 

Let's find the problem and fix it, if we can. As the saying goes, it's better to light one candle than to curse the darkness.

 

(Or again, perhaps this may catch someone's eye and they'll hop in with the perfect solution. If I had it, I'd offer it. Or perhaps they will ask the perfect question/s to drive you to the solution here in the forums alone. Again, I do when I can, but in this case I sense it's just not that simple a problem. And since no one else has chimed in during the 14 months since Matt first wrote, that would seem to confirm my suspicion.)


/Charlie (troubleshooter, carehart.org)
New Here, Jun 14, 2023

I am aware that your post was from 2021, but can you tell us what patch of 2018 you're on?
Thank you.

New Here, Apr 24, 2025

I know this is old but did you ever find a solution to this?

I am encountering this same issue with websockets, cf2021, latest patch w/IIS 

Community Beginner, Apr 24, 2025

https://www.pubnub.com/

 

I gave up on CF.

Community Beginner, Apr 24, 2025

I should say - I gave up on CF for websockets.  Let a dedicated service figure that out, CF does not do it well (imho).

New Here, Apr 24, 2025

Thanks for the quick response. I have actually started rebuilding our company's chat app in Node.js with React because this is an issue I can't figure out. I've seen the websocket just stop working entirely: the subscriber ID becomes 0 and users are no longer able to send or receive through the websocket. At first it happened about once a week, but now it's almost daily. I have to restart the CF service to get the websocket working correctly again. Nothing jumped out at me in PMT. I did notice that sometimes there is high CPU caused by the IIS worker process (I think I may have an issue with app logic, as all messages are blasted to every subscriber of the channel, as opposed to my Node approach of sending to specific subscriber IDs), but even then, we have a decently sized machine for the 40-50 concurrent users we have, and it still disconnects. I'm also using the proxy because we had issues with certain client-side firewalls blocking the handshake. No idea where to go. The "web" machine has 96 GB of RAM, with CF allowed to take up to 64 GB if it needs it. It's rare to see it go higher than 28 GB.

Community Expert, Apr 25, 2025

Michael (and Matt, Tuan, et al), I'll repeat the offer I'd made to Matt as the first response here in Feb 2021. I promise we will solve your problem or you won't pay for the assistance.

 

I can understand preferring to await someone else offering "the solution" (and I said that then, and again above in my reply to Tuan and Matt in 2022). No one else ever suggested the magic bullet to put down this ghost in the machine. And I don't recall they ever took me up on the offer of direct help. 

 

Michael, I appreciate that you've tried to assess things with the pmt--and that you feel the box is more than capable so that it should not fail. Yet it has. There will be an explanation.

 

We may be able to find and resolve it in a couple of hours, perhaps even less. I do it daily, helping when even teams of smart folks may have struggled for days or weeks. That's not bragging; it's simply that they don't likely attack such knotty CF problems day in and day out, and so I might connect a dot they missed.

 

And I can offer time today even, this morning--or next week or whenever. We can even arrange evenings or weekends if necessary (but it should not be). You can find my rates, approach, satisfaction guarantee, online calendar and more at carehart.org/consulting


/Charlie (troubleshooter, carehart.org)
Community Expert, Apr 28, 2025

@Charlie Arehart or @BKBK might be able to help you all out, but sometimes @mattf48248714 's solution is the best one: don't use CF for everything, let some dedicated service figure it out. CF has to provide a solution for everything. That solution might not do all the stuff you want. It might be discontinued, leaving you legacy code you need to rewrite. Sometimes it's best to learn additional new solutions and pick the best tool for the job, even if it's more work. I'm reminded of all the middling-quality CF-JavaScript integrations that depended on the Yahoo! YUI libraries ... which have been discontinued for a while. Anyway, I think websockets are complicated enough that they might warrant a separate product just for themselves if you can get it.

 

@Michael_Evolve , going back to your specific problem, I can pick up some potential problems. First, the words "concurrent users" are going to mean different things when you talk about HTTP/1.1 vs websockets, where you essentially have "always-on" connections. Second, sending all messages to all users instead of just their intended recipients is likely to cause issues, I think. I'd try to retrofit that to your CF websockets implementation if you can. If you can't, and no one can help you, maybe take a look at this library, which serves - and I quote - a "metric buttload" of websockets:

 

https://github.com/uNetworking/uWebSockets.js
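
To put a rough number on the second point, here is a minimal Python sketch (not ColdFusion code) that counts deliveries when every message goes to every subscriber versus only to its intended recipient. The function names and the message count are made up for illustration; the 50 users come from the 40-50 concurrent users mentioned above.

```python
def broadcast_deliveries(subscriber_count: int, message_count: int) -> int:
    """Deliveries when each message is pushed to every subscriber of the channel."""
    return subscriber_count * message_count


def targeted_deliveries(message_count: int, recipients_per_message: int = 1) -> int:
    """Deliveries when each message goes only to its intended recipients."""
    return message_count * recipients_per_message


if __name__ == "__main__":
    users = 50        # upper end of the 40-50 concurrent users mentioned above
    messages = 1_000  # hypothetical number of chat messages in some period
    print(broadcast_deliveries(users, messages))  # 50000
    print(targeted_deliveries(messages))          # 1000
```

With these assumed numbers the broadcast approach does 50 times the work of targeted delivery, and the gap grows with every additional subscriber.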

 

 

 

Dave Watts, Eidolon LLC
Community Expert, Apr 29, 2025

Hi @Dave Watts ,

 

You've made a big point there. Great advice, too. If the developer can find dedicated, specialist Websocket software that integrates with ColdFusion, then that will be the preferable solution. In fact, Separation of Concerns and GRASP (Information Expert, Modularity, High Cohesion) suggest that this is best practice.

Community Expert, Apr 28, 2025

@mattf48248714 , your description of the issue is exemplary. Even before I finished reading it for the first time, I had an idea of what could be the problem. I immediately thought of resource exhaustion. But I needed to look further into it.

 

First of all, the IIS connector settings in workers.properties (worker.wss.connection_pool_size, worker.wss.connection_pool_timeout, worker.wss.max_reuse_connections) could matter. As you will see in a moment, one of your settings is problematic. But it seems like something else is causing the problem you're seeing (that is, clients connecting instantly but later failing to subscribe).

 

The IIS settings control the AJP connector between IIS and ColdFusion. They determine, for example, how many AJP connections can be open and reused. So, for the overall traffic and general health of your server, you will need to use the optimal IIS settings.

 

Suggestions:

  • The ConnectionPoolSize of 20 (in config\wsproxy\1\bin\config.ini) seems far too low. Set it to the same value as max_reuse_connections, namely 5000.
  • Make sure the settings in workers.properties match the corresponding settings in server.xml.
    Therefore,
    worker.wss.max_reuse_connections=5000 / maxThreads="5000";
    worker.wss.connection_pool_timeout=60 / connectionTimeout="60000" (seconds in workers.properties, milliseconds in server.xml)

See https://www.petefreitag.com/blog/tuning-tomcat-iis-connectors/ 
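
If you want to check that pairing automatically, a small script along these lines could do it. This is only a sketch in Python: the Connector fragment is a hypothetical stand-in for the AJP connector in your real server.xml, the helper names are made up, and the worker name "wss" is taken from the settings quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

WORKERS_PROPERTIES = """\
worker.wss.connection_pool_size=5000
worker.wss.connection_pool_timeout=60
worker.wss.max_reuse_connections=5000
"""

# Hypothetical fragment; a real server.xml has many more elements and attributes.
SERVER_XML = '<Connector protocol="AJP/1.3" maxThreads="5000" connectionTimeout="60000"/>'


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def find_mismatches(props_text: str, connector_xml: str, worker: str = "wss") -> list[str]:
    """Return a description of every connector setting that disagrees with server.xml."""
    props = parse_properties(props_text)
    connector = ET.fromstring(connector_xml)
    problems = []

    reuse = int(props[f"worker.{worker}.max_reuse_connections"])
    max_threads = int(connector.get("maxThreads"))
    if reuse != max_threads:
        problems.append(f"max_reuse_connections={reuse} but maxThreads={max_threads}")

    # connection_pool_timeout is in seconds, connectionTimeout in milliseconds.
    timeout_s = int(props[f"worker.{worker}.connection_pool_timeout"])
    timeout_ms = int(connector.get("connectionTimeout"))
    if timeout_s * 1000 != timeout_ms:
        problems.append(f"connection_pool_timeout={timeout_s}s but connectionTimeout={timeout_ms}ms")

    return problems


if __name__ == "__main__":
    print(find_mismatches(WORKERS_PROPERTIES, SERVER_XML) or "settings match")
```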


Nevertheless, I now think that resource exhaustion is the main cause of the issue. 5,000 - 10,000 is a lot of concurrent connections. That is likely to exhaust resources as follows:

  • If clients connect or disconnect frequently or are idle for too long, the websocket server may not properly clean up old, defunct subscriptions or sockets. If so, then sooner or later the websocket channel's subscriber list will get polluted with these invalid or "ghost" subscribers.

  • ColdFusion's websocket implementation lacks explicit settings with which to fine-tune the configuration of sockets. Think of settings such as heartbeat or keepalive. Their absence can have consequences: "zombie" connections.
    On the client side, browsers, firewalls and proxies can silently drop TCP connections if these idle too long. ColdFusion wouldn't always notice that. It means your websocket channel may think clients are still subscribed, but they aren't really reachable anymore.
    This was reported on ColdFusion 2016 in 2018. Adobe did acknowledge it is a bug, but its status is still "To Fix". See https://tracker.adobe.com/#/view/CF-4203142 
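
For illustration, this is roughly what heartbeat-based cleanup looks like in general. It is a minimal Python sketch of the generic technique, not ColdFusion's websocket API: the SubscriberRegistry class, its method names and the 120-second timeout are all hypothetical.

```python
import time
from typing import Optional


class SubscriberRegistry:
    """Tracks when each subscriber was last heard from and drops the silent ones."""

    def __init__(self, idle_timeout_seconds: float):
        self.idle_timeout_seconds = idle_timeout_seconds
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, subscriber_id: str, now: Optional[float] = None) -> None:
        """Record that a subscriber just sent a heartbeat (or any other message)."""
        self._last_seen[subscriber_id] = time.monotonic() if now is None else now

    def prune(self, now: Optional[float] = None) -> list[str]:
        """Remove and return subscribers that have been silent longer than the timeout."""
        now = time.monotonic() if now is None else now
        stale = [
            sid for sid, seen in self._last_seen.items()
            if now - seen > self.idle_timeout_seconds
        ]
        for sid in stale:
            del self._last_seen[sid]
        return stale

    def active(self) -> list[str]:
        """Subscribers that are still considered reachable."""
        return sorted(self._last_seen)


if __name__ == "__main__":
    registry = SubscriberRegistry(idle_timeout_seconds=120)  # hypothetical timeout
    registry.heartbeat("client-a", now=0)
    registry.heartbeat("client-b", now=100)
    print(registry.prune(now=130))  # ['client-a']
    print(registry.active())        # ['client-b']
```

Without something like prune() running periodically, "client-a" would stay in the subscriber list indefinitely even though nobody is on the other end.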

 

Suggestions:

  • Reduce socketTimeout: Since your system is not hitting any (memory or CPU) resource limits, chances are that there are a lot of idle connections. Therefore, you might want to decrease the socketTimeout, so idle connections are terminated more quickly.
    You can do so by editing the file /lib/neo-websocket.xml. Use a value much lower than the default 300, say, 120.  
  • Experiment with an increased framesize and with a decreased framesize: FrameSize controls the maximum size of each websocket frame that can be sent or received over a websocket connection. Each frame represents a chunk of data transmitted over the connection.
    The default value in ColdFusion's websocket is 1024 KB. It is the setting on the page Server Settings > Websockets in the ColdFusion Administrator. Equivalently, it is the element maxFrameSize in /lib/neo-websocket.xml.

    Larger frame sizes allow the websocket to handle larger payloads in one go. If your application has large websocket messages, then the connection issue you're seeing might be due to fragmentation and reassembly of too many messages that exceed the framesize. If so, increasing the framesize might help.

    Granted, increasing the frameSize may lead to higher memory usage, as large frames need to be buffered, transmitted, and processed. However, as you have said, you aren't hitting any memory limits. So it is worth experimenting by increasing the framesize. Say, to 4096 KB.

    On the other hand, since you're experiencing connection failures, decreasing the framesize can help. That is because smaller frames reduce the chance of overloading the traffic with too much data at once. In so doing, smaller frames are more efficient, especially when dealing with a large number of concurrent users, as in your situation. The one big assumption is that most of the messages are small. 
    If indeed so, then experiment by decreasing the framesize to, say, 256 KB.
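
To get a feel for the numbers when experimenting, the arithmetic can be sketched in a few lines of Python. It assumes that a message larger than the frame size is split into several frames, which is an assumption about the behaviour rather than something confirmed for ColdFusion; the function name and the 5000 KB message size are made up for illustration.

```python
KB = 1024  # bytes per kilobyte


def frames_needed(message_bytes: int, frame_size_kb: int) -> int:
    """Frames required if a message is split into frames of at most frame_size_kb."""
    if message_bytes <= 0:
        return 0
    frame_bytes = frame_size_kb * KB
    # Integer ceiling division: round up to a whole number of frames.
    return -(-message_bytes // frame_bytes)


if __name__ == "__main__":
    message = 5000 * KB  # hypothetical large message
    for frame_size_kb in (256, 1024, 4096):
        print(frame_size_kb, frames_needed(message, frame_size_kb))
    # prints: 256 20, then 1024 5, then 4096 2
```

If most of your messages are only a few KB, every one of these frame sizes yields a single frame per message, so the setting is unlikely to matter much in that case.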

 

 

 

New Here, Apr 28, 2025

BKBK, I don't believe the suggestions for the theory of resource exhaustions are applicable to this case.  @mattf48248714 's setup used the proxy, which would ignore those settings. CF admin hides those settings when you select "Use Proxy".

 

[Attached image: Michael_Evolve_0-1745873263164.png]

 

I spent some time with Charlie looking at my specific issue, which is similar to Matt's. I'm believing more and more that these "ghosts" connections are the culprit. In my case, a chat application, each client must be connected to the socket at all times to receive messages. I think each time a client somehow disconnects and reconnects, it opens a "new" connection and the old one is never cleaned up properly. I now have my application pool recycling at midnight (thanks @Charlie Arehart !) and that has helped me so far. It's been a few days since I've had to flat out restart the CF app service. I think if I can now figure out how to remove these "ghost" connections, it would be far more stable.
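
One general way to stop reconnects from piling up is to key connections by user, so that a reconnect replaces the old entry instead of adding a second one. Below is a minimal Python sketch of that idea, not ColdFusion code; the ConnectionTable class and the IDs are hypothetical, and it assumes one connection per user.

```python
from typing import Optional


class ConnectionTable:
    """Keeps at most one live connection per user (assumed one-connection-per-user app)."""

    def __init__(self) -> None:
        self._by_user: dict[str, str] = {}

    def connect(self, user_id: str, connection_id: str) -> Optional[str]:
        """Register a connection and return the one it replaced, if any.

        The caller is expected to close the returned connection explicitly.
        """
        replaced = self._by_user.get(user_id)
        self._by_user[user_id] = connection_id
        return replaced

    def disconnect(self, user_id: str, connection_id: str) -> None:
        """Forget a connection, but only if it is still the user's current one."""
        if self._by_user.get(user_id) == connection_id:
            del self._by_user[user_id]

    def open_connections(self) -> int:
        return len(self._by_user)


if __name__ == "__main__":
    table = ConnectionTable()
    table.connect("user-1", "conn-a")
    stale = table.connect("user-1", "conn-b")  # same user reconnects
    print(stale)                     # conn-a
    print(table.open_connections())  # 1
```

The check in disconnect() matters: a late close event for the old connection must not remove the user's new, healthy one.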

 

[Attached image: Michael_Evolve_1-1745873557555.png]

 

 

Community Expert, Apr 29, 2025

@Michael_Evolve , 

 

You say, "BKBK, I don't believe the suggestions for the theory of resource exhaustions are applicable to this case", followed later by "I'm believing more and more that these "ghosts" connections are the culprit." That leads me to think that there are a few misunderstandings to clarify.

 

First of all, your two statements contradict each other. In my suggestion, a "ghost" connection may be one of the results of resource exhaustion.

 

Anyway, to be clear, my explanation is about "resource exhaustion" (singular). And I didn't suggest it as a theory. More as a hypothesis, backed up by something practical for you to test.

 

That said, I am glad to hear that you have made some headway towards a solution. 

Community Expert, Apr 29, 2025
quote

  @mattf48248714 's setup used the proxy, which would ignore those settings. CF admin hides those settings when you select "Use Proxy".

 

[Attached image: Michael_Evolve_0-1745873263164.png]

 

I spent some time with Charlie looking at my specific issue, which is similar to Matt's. I'm believing more and more that these "ghosts" connections are the culprit. In my case, a chat application, each client must be connected to the socket at all times to receive messages. I think each time a client somehow disconnects and reconnects, it opens a "new" connection and the old one is never cleaned up properly. I now have my application pool recycling at midnight (thanks @Charlie Arehart !) and that has helped me so far. It's been a few days since I've had to flat out restart the CF app service. I think if I can now figure out how to remove these "ghost" connections, it would be far more stable.

 

 


By @Michael_Evolve


My suggestion amounts to this:

  • whatever websocket server you use - whether ColdFusion, Proxy or any other - you should experiment by changing the socketTimeout and the Frame-size when you suspect there are problems with connections.  