I have a web application that makes extensive use of websockets for call center agents. Up to 50 simultaneous users trigger messages and listen on channels for chat, phone calls, and broadcast messaging. After about a month in production with no issues, we had two failures yesterday in which messages stopped being processed. So far it is unclear whether the .cfc extension handlers were receiving the browser websocket messages during the failures, but it is certain that the server was not publishing back out to the browser listeners.
When I searched the CF and IIS logs (application.log, exception.log, etc.), I found no record of any anomalous events during the stalled periods. Restarting the ColdFusion Application Service makes messages start flowing again immediately, but messages sent during the stall are not recovered (there is no catching up).
All other website functionality appeared to be working properly during the stall.
I'm awaiting the next failure so that I can diagnose further, but I'd rather it not fail (duh).
Has this issue been reported previously (I could not find a similar report)?
Are there any limits on the number of simultaneous registered websocket clients in CF11? (I ultimately need headroom for 250 simultaneous users.)
Is there any other location where errors might have been captured?
Is there a record or queue for websocket messages that are processed by the server before/during publishing?
Is there any way to detect this issue in real-time without building a heartbeat websocket?
Windows Server 2012 R2
ColdFusion 11 Update 2 (rolled back immediately in December from Update 3 due to its breaking of websockets over SSL)
Websockets over SSL using a RapidSSL cert on port 8577