I'm running 2 Windows 2016 boxes with CF 2018 Ent installed.
We have a dedicated CF instance specifically to run as a websocket server.
It's set up to proxy via IIS.
We estimate that we could have 5,000 - 10,000 concurrent connections to it since our users could be subscribed to multiple channels.
After a fresh restart of CF and IIS, clients can connect to the websocket channel instantly and get a near instant success publish from the channel so you know you're successfully subscribed and updates will start coming in.
After a random amount of time after things are working well (minutes to days), clients will instantly connect but will no longer get the success publish until things are restarted again. This means they aren't truly subscribed to the channel anymore and no updates come through.
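For anyone trying to catch the moment this starts happening, here is a minimal sketch of a client-side watchdog that flags channels where the subscribe request went out but no success publish came back within a timeout. It is not ColdFusion's API: the class name, method names, and the 5-second timeout in the usage lines are all assumptions made for illustration, and the clock is injected only so the logic can be tested without waiting.

```python
import time


class SubscribeWatchdog:
    """Tracks subscribe requests that have not yet been acknowledged.

    Hypothetical helper: names and structure are illustrative only.
    """

    def __init__(self, ack_timeout_s, clock=time.monotonic):
        self.ack_timeout_s = ack_timeout_s
        self.clock = clock
        self.pending = {}  # channel name -> time the subscribe was sent

    def subscribe_sent(self, channel):
        # Call this right after sending the subscribe request.
        self.pending[channel] = self.clock()

    def ack_received(self, channel):
        # Call this when the success publish for the channel arrives.
        self.pending.pop(channel, None)

    def stalled(self):
        # Channels still waiting for an acknowledgement past the timeout.
        now = self.clock()
        return sorted(
            ch for ch, sent in self.pending.items()
            if now - sent > self.ack_timeout_s
        )


# Usage sketch: poll stalled() on a timer and log or alert when it is non-empty.
watchdog = SubscribeWatchdog(ack_timeout_s=5)
watchdog.subscribe_sent("updates")
watchdog.ack_received("updates")
```

Logging the first time `stalled()` returns a non-empty list would at least give a timestamp to line up against the CF, IIS, and connector logs.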
The instance resources look fine, good memory usage, garbage collection looks good, low cpu usage.
We've also been playing with the connection pool numbers.
server.xml max threads = 5000
config\wsproxy\1\bin\config.ini ConnectionPoolSize=20
workers.properties =
worker.wss.connection_pool_size=5000
worker.wss.connection_pool_timeout=60
worker.wss.max_reuse_connections=5000
We've been knocking our heads against the wall with this for some time now and are hoping to get some help.
Matt, while I wish I could propose some single tweak that would help, there are just too many variables. What I would say is that with a combination of better monitoring of things, as well as close assessment of those various configuration settings (to make sure there's not an issue that's unclear for what you have shared), it SHOULD be possible both to understand what's causing the failing updates/channel communication, and then what setting needs to be tweaked (whether in the connector and its config, the proxy and its config, CF, IIS, the JVM, or perhaps even something else).
If you're at all interested in a helping hand to assess all that, see my carehart.org/consulting page. I hate to drop that as the only solution I can offer, but for now it is. Perhaps someone else will have another suggestion if you prefer to wait for that. But if you want it solved, either we will or you won't pay for any of my time you don't find valuable.
We're having the same issue. It requires a restart to work for a period of time (almost daily) and we don't know where the setting is to fix this restart issue. Please share if you found a solution for this.
We ended up creating a new CF instance that is used as a dedicated single WS server.
We also transitioned a lot of it to pubnub.
No real solution was ever found. Thanks Adobe.
To be clear, no, you should not need to restart anything. There's always a reason for that and almost always a better solution. And Tuan, you don't clarify what it is that you are restarting. Do you mean CF? The web server? The box they are running on?
And Matt, you never responded to my offer of direct help (which was offered even potentially at no charge). While your workaround may have seemed easier--and I'm glad you're doing well with that alternative--I just want to say again that such problems should be solvable.
And I'll say the same to you, Tuan, with the same offer of direct help if interested-- especially if that workaround may not work as well for you.
It's not clear in your respective cases what or where the problem may be. But to Matt's last comment, Adobe often gets the blame for issues which may not at all be of their making. Again, there are a lot of variables in such things.
Let's find the problem and fix it, if we can. As the saying goes, it's better to light one candle than to curse the darkness.
(Or again perhaps this may catch someone's eye and they'll hop in with the perfect solution. If I had it, I'd offer it. Or perhaps they will ask the perfect question/s to drive you to the solution here in the forums alone. Again, I do when I can but in this case I sense it's just not that simple a problem. And since no one else has chimed in, in the couple of months since Matt first wrote, that would seem to confirm my suspicion.)
I am aware that your post was from 2021, but can you tell us what patch of 2018 you're on?
Thank you.
I know this is old but did you ever find a solution to this?
I am encountering this same issue with websockets, cf2021, latest patch w/IIS
I should say - I gave up on CF for websockets. Let a dedicated service figure that out, CF does not do it well (imho).
Thanks for the quick response. I have actually started rebuilding our company's chat app in Node.js with React because it's an issue I can't figure out. I've seen the websocket just stop working entirely: the subscriber ID becomes 0 and users are no longer able to accept or send through the websocket. At first it was a once-a-week thing, but now it's almost daily. I have to restart the CF service to get the websocket working correctly. Nothing jumped out at me in PMT. I did notice that sometimes there may be high CPU caused by the IIS thread worker process (I think I may have an issue with app logic, as all messages are blasted to every subscriber of the channel, as opposed to my Node approach of sending to specific subscriber IDs), but even then, we have a decently sized machine for the 40-50 concurrent users we have and it still disconnects. I'm also using the proxy because we had issues with certain client-side firewalls blocking the handshake. No idea where to go. The "web" machine has 96 GB of RAM, with CF allowed to take up to 64 if it needs it. It's rare to see it go higher than 28 GB.
Michael (and Matt, Tuan, et al), I'll repeat the offer I'd made to Matt as the first response here in Feb 2021. I promise we will solve your problem or you won't pay for the assistance.
I can understand preferring to await someone else offering "the solution" (and I said that then, and again above in my reply to Tuan and Matt in 2022). No one else ever suggested the magic bullet to put down this ghost in the machine. And I don't recall they ever took me up on the offer of direct help.
Michael, I appreciate that you've tried to assess things with the PMT--and that you feel the box is more than capable so that it should not fail. Yet it has. There will be an explanation.
We may be able to find and resolve it in a couple of hours, perhaps even less. I do it daily, helping when even teams of smart folks may have struggled for days or weeks. That's not bragging; it's simply that they don't likely attack such knotty CF problems day in and day out, and so I might connect a dot they missed.
And I can offer time today even, this morning--or next week or whenever. We can even arrange evenings or weekends if necessary (but it should not be). More on my rates, approach, satisfaction guarantee, online calendar and more at carehart.org/consulting
@Charlie Arehart or @BKBK might be able to help you all out, but sometimes @mattf48248714 's solution is the best one: don't use CF for everything, let some dedicated service figure it out. CF has to provide a solution for everything. That solution might not do all the stuff you want. It might be discontinued, leaving you legacy code you need to rewrite. Sometimes it's best to learn additional new solutions and pick the best tool for the job, even if it's more work. I'm reminded of all the middling-quality CF-JavaScript integrations that depended on the Yahoo! YUI libraries ... which have been discontinued for a while. Anyway, I think websockets are complicated enough that they might warrant a separate product just for themselves if you can get it.
@Michael_Evolve , going back to your specific problem, I can pick up some potential problems. First, the words "concurrent users" are going to mean different things when you talk about HTTP/1.1 vs websockets, where you essentially have "always-on" connections. Second, sending all messages to all users instead of just their intended recipients is likely to cause issues, I think. I'd try to retrofit that to your CF websockets implementation if you can. If you can't, and no one can help you, maybe take a look at this library, which serves - and I quote - a "metric buttload" of websockets:
https://github.com/uNetworking/uWebSockets.js
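To make the second point (everyone gets every message vs. only the intended recipients) a bit more concrete, here is a rough sketch of the difference in fan-out. It is a toy in-memory model, not CF's websocket API: the `Channel` class, its method names, and the subscriber IDs are all hypothetical, and "delivery" is just appending to a list.

```python
class Channel:
    """Toy in-memory channel used only to compare delivery counts."""

    def __init__(self):
        self.inboxes = {}  # subscriber_id -> list of delivered messages

    def subscribe(self, subscriber_id):
        self.inboxes.setdefault(subscriber_id, [])

    def broadcast(self, message):
        # Every subscriber receives the message, whether or not it is for them.
        for inbox in self.inboxes.values():
            inbox.append(message)
        return len(self.inboxes)  # number of deliveries performed

    def send_to(self, recipient_ids, message):
        # Only the named recipients receive the message; unknown IDs are skipped.
        delivered = 0
        for sid in recipient_ids:
            if sid in self.inboxes:
                self.inboxes[sid].append(message)
                delivered += 1
        return delivered


channel = Channel()
for user_id in range(50):  # roughly the 40-50 users mentioned above
    channel.subscribe(user_id)

broadcast_deliveries = channel.broadcast("hello from user 1 to user 2")
targeted_deliveries = channel.send_to([2], "hello from user 1 to user 2")
```

With 50 subscribers, each broadcast costs 50 deliveries where a one-to-one chat message needs only one, so the gap grows with both the user count and the message rate.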
Hi @Dave Watts ,
You've made a big point there. Great advice, too. If the developer can find dedicated, specialist Websocket software that integrates with ColdFusion, then that will be the preferable solution. In fact Separation-of-Concerns and GRASP (Information Expert, Modularity, High-Cohesion) suggest that that is best-practice.
@mattf48248714 , your description of the issue is exemplary. Even before I finished reading it for the first time, I had an idea of what could be the problem. I immediately thought of resource exhaustion. But I needed to look further into it.
First of all, the IIS settings in workers.properties (worker.wss.connection_pool_size, worker.wss.connection_pool_timeout, worker.wss.max_reuse_connections) could matter. As you will see in a moment, one of your settings is problematic. But it seems like something else is causing the problem you're seeing (that is, clients connecting instantly but failing to subscribe later).
The IIS settings control the AJP connector between IIS and ColdFusion. They determine, for example, how many AJP connections can be open and reused. So, for the overall traffic and general health of your server, you will need to use the optimal IIS settings.
Suggestions:
See https://www.petefreitag.com/blog/tuning-tomcat-iis-connectors/
Nevertheless, I now think that resource exhaustion is the main cause of the issue. 5,000 to 10,000 is a lot of concurrent connections. That is likely to exhaust resources as follows:
If clients connect or disconnect frequently or are idle for too long, the websocket server may not properly clean up old, defunct subscriptions or sockets. If so, then, sooner or later, websocket's channel list will get polluted with these invalid or "ghost" subscribers.
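One generic way to keep such "ghost" subscribers from piling up is to record when each subscriber was last heard from and to prune the ones that have gone quiet. The sketch below is only an illustration of that idea under stated assumptions: it is not how ColdFusion manages its channel list, and the `SubscriberRegistry` name, its methods, and the 30-second timeout are hypothetical.

```python
import time


class SubscriberRegistry:
    """Tracks the last heartbeat per subscriber and prunes idle ones."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.last_seen = {}  # subscriber_id -> time of last heartbeat/message

    def touch(self, subscriber_id):
        # Call on subscribe and on every heartbeat or message from the client.
        self.last_seen[subscriber_id] = self.clock()

    def prune(self):
        # Remove subscribers not heard from within the idle timeout.
        now = self.clock()
        stale = sorted(
            sid for sid, seen in self.last_seen.items()
            if now - seen > self.idle_timeout_s
        )
        for sid in stale:
            del self.last_seen[sid]
        return stale  # the caller would close these sockets

    def active(self):
        return sorted(self.last_seen)


# Usage sketch: run prune() periodically, e.g. from a scheduled task.
registry = SubscriberRegistry(idle_timeout_s=30)
registry.touch("client-1")
removed = registry.prune()
```

This only works if clients send some kind of periodic heartbeat; without one, a quiet but healthy subscriber is indistinguishable from a dead one.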
Suggestions:
BKBK, I don't believe the suggestions for the theory of resource exhaustions are applicable to this case. @mattf48248714 's setup used the proxy, which would ignore those settings. CF admin hides those settings when you select "Use Proxy".
I spent some time with Charlie looking at my specific issue, which is similar to Matt's. I'm believing more and more that these "ghosts" connections are the culprit. In my case, a chat application, each client must be connected to the socket at all times to receive messages. I think each time a client somehow disconnects and reconnects, it opens a "new" connection and the old one is never cleaned up properly. I now have my application pool recycling at midnight (thanks @Charlie Arehart !) and that has helped me so far. It's been a few days since I've had to flat out restart the CF app service. I think now if I can figure out how to remove these "ghost" connections, it would be far more stable.
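If the reconnect theory holds, one common pattern is to key connections by client rather than by socket, so that a reconnect replaces (and closes) whatever that client had open before. A minimal sketch follows, assuming each client can present a stable ID; the `ConnectionTable` class and its method names are hypothetical, this is not something CF exposes as far as this thread shows, and "closing" is only recorded in a list here.

```python
class ConnectionTable:
    """Keeps at most one live connection per client ID."""

    def __init__(self):
        self.by_client = {}  # client_id -> current connection_id
        self.closed = []     # connection IDs we would have closed

    def connect(self, client_id, connection_id):
        # A reconnect replaces the previous connection instead of adding one.
        old = self.by_client.get(client_id)
        if old is not None and old != connection_id:
            self.closed.append(old)  # stand-in for closing the old socket
        self.by_client[client_id] = connection_id
        return old

    def disconnect(self, client_id, connection_id):
        # Ignore late disconnects from a connection that was already replaced.
        if self.by_client.get(client_id) == connection_id:
            del self.by_client[client_id]


table = ConnectionTable()
table.connect("alice", "conn-1")
table.connect("alice", "conn-2")  # alice dropped and came back
```

The check in `disconnect` matters: a late close event from the old socket should not knock out the client's new connection.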
You say, "BKBK, I don't believe the suggestions for the theory of resource exhaustions are applicable to this case", followed later by "I'm believing more and more that these "ghosts" connections are the culprit." That leads me to think that there are a few misunderstandings to clarify.
First of all, your two statements contradict each other. In my suggestion, a "ghost" connection may be one of the results of resource exhaustion.
Anyway, to be clear, my explanation is about "resource exhaustion" (singular). And I didn't suggest it as a theory. More as a hypothesis, backed up by something practical for you to test.
That said, I am glad to hear that you have made some headway towards a solution.
@mattf48248714 's setup used the proxy, which would ignore those settings. CF admin hides those settings when you select "Use Proxy".
I spent some time with Charlie looking at my specific issue, which is similar to Matt's. I'm believing more and more that these "ghosts" connections are the culprit. In my case, a chat application, each client must be connected to the socket at all times to receive messages. I think each time a client somehow disconnects and reconnects, it opens a "new" connection and the old one is never cleaned up properly. I now have my application pool recycling at midnight (thanks @Charlie Arehart !) and that has helped me so far. It's been a few days since I've had to flat out restart the CF app service. I think now if I can figure out how to remove these "ghost" connections, it would be far more stable.
By @Michael_Evolve
My suggestion amounts to this: