We recently updated our four physical servers to CF10 Update 14 and rebooted them all so the latest kernel and RHEL6 security updates could take effect. Each server has two worker instances, cfusion1 and cfusion2, in a round-robin cluster.
The servers came back fine after the reboot and stayed that way until the following day. Since then, restarting any CF instance seems to lock that instance up. If I bring both CF instances down and then both back up, the cluster *USUALLY* comes back fine, although sometimes it does not and I have to reboot the box. To make matters even weirder, I can sometimes restart an instance, regardless of whether both are up or down, if I remove the secondary IP address on em1:1. This issue exists across all four physical servers. I have pulled one out of our web cluster to troubleshoot while the other three limp along.
The major issue is that when one of the instances hangs, it does so in a zombie state: half dead, but not dead enough for the Tomcat cluster to expire the instance. That means half my requests are processed by the working instance, and the other half queue up indefinitely, eventually bringing my web server down completely. While shutting down both instances and then bringing both up again usually works, it's not something I like to do on production machines. And occasionally, the instance won't come back. These machines have become painfully unstable.
When I attempt to restart cfusion2, here's what I see in coldfusion-error.log:
Nov 24, 2014 6:02:58 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /opt/coldfusion10/jre/lib/amd64/server:/opt/coldfusion10/jre/lib/amd64:/opt/coldfusion10/jre/../lib/amd64:/opt/coldfusion10/cfusion2/lib:/opt/coldfusion10/cfusion2/lib/_ilnx21/bin:/opt/coldfusion10/cfusion2/lib/international::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Nov 24, 2014 6:02:59 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8502"]
Nov 24, 2014 6:02:59 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8447"]
Nov 24, 2014 6:03:00 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8014"]
Nov 24, 2014 6:03:00 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Nov 24, 2014 6:03:00 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Nov 24, 2014 6:03:00 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
INFO: Cluster is about to start
Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/10.10.240.104:4002
Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Nov 24, 2014 6:03:00 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=416205, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]
Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
Nov 24, 2014 6:03:02 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
Nov 24, 2014 6:03:04 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#/
Nov 24, 2014 6:03:05 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#/
Nov 24, 2014 6:03:06 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Register manager localhost#/ to cluster element Engine with name Catalina
Nov 24, 2014 6:03:06 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at localhost#/
Nov 24, 2014 6:03:36 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:03:36 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(30000 ms.).; Faulty members:tcp://{10, 10, 240, 104}:4001;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:109)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:837)
at org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions(DeltaManager.java:789)
at org.apache.catalina.ha.session.DeltaManager.startInternal(DeltaManager.java:756)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#/], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#/]; session state send at 11/24/14 6:03 PM received in 30,264 ms.
Nov 24, 2014 6:03:36 PM org.apache.catalina.session.StandardSession tellNew
SEVERE: Session event listener threw exception
java.lang.NullPointerException
at coldfusion.bootstrap.HttpFlexSessionBootstrap.getListener(HttpFlexSessionBootstrap.java:154)
at coldfusion.bootstrap.HttpFlexSessionBootstrap.sessionCreated(HttpFlexSessionBootstrap.java:69)
at org.apache.catalina.session.StandardSession.tellNew(StandardSession.java:422)
at org.apache.catalina.session.StandardSession.setId(StandardSession.java:394)
at org.apache.catalina.ha.session.DeltaSession.setId(DeltaSession.java:275)
at org.apache.catalina.ha.session.DeltaManager.handleSESSION_CREATED(DeltaManager.java:1336)
at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1214)
at org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions(DeltaManager.java:803)
at org.apache.catalina.ha.session.DeltaManager.startInternal(DeltaManager.java:756)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Then, when I try to stop the instance, I get this indication of a half-dead process:
# /opt/coldfusion10/cfusion2/bin/coldfusion stop
Stopping ColdFusion 10 server instance named cfusion2, please wait
Nov 24, 2014 6:06:03 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8009is not active. Stop the server only after it is started.
ColdFusion 10 server instance named cfusion2 has been stopped
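One way to see which of the instance's ports still answer in this half-dead state is a plain TCP connect probe. The following is a minimal sketch, not a ColdFusion or Tomcat tool: the function names `probe` and `report` are made up for illustration, the port numbers are the ones that appear in the cfusion2 log and stop output above, and it is an assumption that the connectors listen on 10.10.240.104. It only tests the TCP handshake, so a port can report "open" while the worker behind it is hung.

```python
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect and classify the result.

    Returns "open", "refused", "timeout", or "error: <reason>".
    This checks the TCP handshake only, not whether the service responds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc


def report(host, ports, timeout=2.0):
    """Probe each labelled port and return {label: result}."""
    return {label: probe(host, port, timeout) for label, port in ports.items()}


# Ports taken from the cfusion2 log and stop output above (assumed to be
# reachable on the same address as the cluster receiver).
CFUSION2_PORTS = {
    "http": 8502,
    "https": 8447,
    "ajp": 8014,
    "tribes receiver": 4002,
    "shutdown": 8009,
}
```

Calling `report("10.10.240.104", CFUSION2_PORTS)` on the affected box and printing the result would show, per port, whether the connection was accepted, refused, or timed out.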
The working cluster instance, cfusion1, shows this in its coldfusion-error.log:
Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:17 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.session.DeltaManager.send(DeltaManager.java:497)
at org.apache.catalina.ha.session.DeltaManager.sendCreateSession(DeltaManager.java:487)
at org.apache.catalina.ha.session.DeltaManager.createSession(DeltaManager.java:463)
at org.apache.catalina.ha.session.DeltaManager.createSession(DeltaManager.java:450)
at org.apache.catalina.connector.Request.doGetSession(Request.java:2947)
at org.apache.catalina.connector.Request.getSession(Request.java:2311)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:897)
at coldfusion.runtime.AppHelper.setupJ2eeSessionScope(AppHelper.java:974)
at coldfusion.runtime.AppHelper.setupSessionScope(AppHelper.java:1067)
at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:361)
at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:112)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:94)
at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:79)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:58)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
at coldfusion.CfmServlet.service(CfmServlet.java:219)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:218)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 59 more
Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:18 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 26 more
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:19 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 26 more
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:19 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 26 more
Nov 24, 2014 6:06:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
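Because the cfusion1 log repeats the same failure many times, it can help to collapse the entries into counts before reading individual stack traces. The sketch below assumes the two-line layout visible in the logs above (a timestamp/class/method header followed by a `LEVEL: message` line); the function name `summarize` is hypothetical, and stack-trace lines are simply ignored.

```python
import re
from collections import Counter

# Header line, e.g.
# "Nov 24, 2014 6:06:17 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send"
HEADER = re.compile(
    r"^[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M (\S+) (\S+)$"
)
# Message line, e.g. "SEVERE: Unable to send message through cluster sender."
ENTRY = re.compile(r"^(SEVERE|WARNING|INFO): (.*)$")


def summarize(lines, levels=("SEVERE", "WARNING")):
    """Count log entries per (level, Class.method, message).

    Only entries whose level is in `levels` are counted; stack-trace
    lines and anything else that is not a header or message are skipped.
    """
    counts = Counter()
    source = None
    for line in lines:
        line = line.rstrip("\n")
        header = HEADER.match(line)
        if header:
            short_class = header.group(1).rsplit(".", 1)[-1]
            source = short_class + "." + header.group(2)
            continue
        entry = ENTRY.match(line)
        if entry and source is not None:
            if entry.group(1) in levels:
                counts[(entry.group(1), source, entry.group(2))] += 1
            source = None  # one message line per header
    return counts
```

Passing an open file handle for coldfusion-error.log to `summarize` and iterating over `counts.most_common()` gives a short list of distinct errors with how often each occurred.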
I have tried all the usual stuff, like increasing the timeouts and juggling ports and addresses. It seems that the two cluster members simply cannot communicate with each other.
My server.xml files have been mildly tweaked for PCI compliance, so we have an SSL redirect. I have tried going back to stock, but that doesn't seem to help either.
# cat /opt/coldfusion10/cfusion1/runtime/conf/server.xml
<Server port="8008" shutdown="SHUTDOWN">
<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">
</Listener>
<Listener className="org.apache.catalina.core.JasperListener">
</Listener>
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">
</Listener>
<Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">
</Listener>
<GlobalNamingResources>
<Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">
</Resource>
</GlobalNamingResources>
<Service name="Catalina">
<Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">
</Executor>
<Connector port="8501" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="20000" redirectPort="8446" executor="tomcatThreadPool" maxThreads="50">
</Connector>
<Connector port="8446" sslEnabledProtocols="TLSv1, TLSv1.1, TLSv1.2" protocol="HTTP/1.1" keystorePass="xxxxxxxx" SSLEnabled="true" scheme="https" secure="true" keystoreFile="/home/.keystore" keyAlias="tomcat" maxThreads="150" ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA" clientAuth="false">
</Connector>
<Connector port="8013" protocol="AJP/1.3" redirectPort="8446" tomcatAuthentication="false">
</Connector>
<Engine jvmRoute="cfusion1" name="Catalina" defaultHost="localhost">
<Realm className="org.apache.catalina.realm.LockOutRealm">
<Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">
</Realm>
</Realm>
<Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">
<Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">
</Valve>
</Host>
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6">
<Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">
</Manager>
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Membership port="45564" dropTime="10000" address="228.0.0.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">
</Membership>
<Receiver port="4001" autoBind="100" address="auto" selectorTimeout="10000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">
</Receiver>
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" timeout="30000">
</Transport>
</Sender>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">
</Interceptor>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">
</Interceptor>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">
</Valve>
<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">
</Valve>
<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">
</ClusterListener>
<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">
</ClusterListener>
</Cluster>
</Engine>
</Service>
</Server>
# cat /opt/coldfusion10/cfusion2/runtime/conf/server.xml
<Server port="8009" shutdown="SHUTDOWN">
<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">
</Listener>
<Listener className="org.apache.catalina.core.JasperListener">
</Listener>
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">
</Listener>
<Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">
</Listener>
<GlobalNamingResources>
<Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">
</Resource>
</GlobalNamingResources>
<Service name="Catalina">
<Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">
</Executor>
<Connector port="8502" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="20000" redirectPort="8447" executor="tomcatThreadPool" maxThreads="50">
</Connector>
<Connector port="8447" sslEnabledProtocols="TLSv1, TLSv1.1, TLSv1.2" protocol="HTTP/1.1" keystorePass="xxxxxxxx" SSLEnabled="true" scheme="https" secure="true" keystoreFile="/home/.keystore" keyAlias="tomcat" maxThreads="150" ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA" clientAuth="false">
</Connector>
<Connector port="8014" protocol="AJP/1.3" redirectPort="8447" tomcatAuthentication="false">
</Connector>
<Engine jvmRoute="cfusion2" name="Catalina" defaultHost="localhost">
<Realm className="org.apache.catalina.realm.LockOutRealm">
<Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">
</Realm>
</Realm>
<Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">
<Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">
</Valve>
</Host>
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6">
<Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">
</Manager>
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Membership port="45564" dropTime="10000" address="228.0.0.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">
</Membership>
<Receiver port="4002" autoBind="100" address="auto" selectorTimeout="10000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">
</Receiver>
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" timeout="30000">
</Transport>
</Sender>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">
</Interceptor>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">
</Interceptor>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">
</Valve>
<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">
</Valve>
<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">
</ClusterListener>
<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">
</ClusterListener>
</Cluster>
</Engine>
</Service>
</Server>
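Since the two files are long and nearly identical, it can help to pull out only the cluster-related values and compare them. This is a rough sketch, not a ColdFusion tool: the function names `cluster_settings` and `differing_keys` are made up for illustration, and it assumes the files on disk are well-formed XML with the same element layout as shown above.

```python
import xml.etree.ElementTree as ET


def cluster_settings(server_xml_text):
    """Pull the cluster-related values out of a Tomcat server.xml string."""
    root = ET.fromstring(server_xml_text)
    engine = root.find("./Service/Engine")
    membership = root.find(".//Cluster/Channel/Membership")
    receiver = root.find(".//Cluster/Channel/Receiver")

    def attr(element, name):
        # Missing elements yield None instead of raising.
        return element.get(name) if element is not None else None

    return {
        "jvmRoute": attr(engine, "jvmRoute"),
        "mcast_address": attr(membership, "address"),
        "mcast_port": attr(membership, "port"),
        "receiver_address": attr(receiver, "address"),
        "receiver_port": attr(receiver, "port"),
    }


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two settings dicts."""
    return sorted(k for k in a if a[k] != b.get(k))
```

Reading /opt/coldfusion10/cfusion1/runtime/conf/server.xml and /opt/coldfusion10/cfusion2/runtime/conf/server.xml and passing their contents through these helpers should report only `jvmRoute` and `receiver_port` as different: members on the same host need distinct receiver ports but the same multicast address and port.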
Netstat does not show anything else using the same ports.
Any suggestions? Any information is greatly appreciated!
Thanks,
-Tony
The "Manager pathname" element should also be commented out in context.xml in both instances.
Follow this: https://forums.adobe.com/message/6361184#6361184
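To double-check this on each instance, a small script can report whether any Manager element with a pathname attribute is still active. This is a sketch under assumptions: the function name `active_manager_pathnames` is made up, and it assumes the element in question looks like `<Manager pathname="" />` inside context.xml. It relies on the fact that Python's default XML parser discards comments, so a commented-out element is simply not found.

```python
import xml.etree.ElementTree as ET


def active_manager_pathnames(context_xml_text):
    """Return pathname values of Manager elements that are NOT commented out.

    An empty list means no active <Manager pathname=...> element was found.
    """
    root = ET.fromstring(context_xml_text)
    # iter() walks the root and all descendants; XML comments are dropped
    # by the default parser, so only live elements show up here.
    return [
        manager.get("pathname")
        for manager in root.iter("Manager")
        if manager.get("pathname") is not None
    ]
```

Running this against the contents of each instance's context.xml should give an empty list on both if the element is commented out as described.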
Manager Pathname has been commented out since the cluster was built. This was working up until the recent update 14. When I roll back to update 13, it works correctly, with no such session communication errors in the log files.
What's weird is that it is working somewhat sporadically (update 14). It seems that when the box is in production, and there is a light load on the machine, I get errors and a dead instance. But if I shut down httpd, or if I remove the secondary IP address, or even change it to an unused secondary IP address, the instances do seem to communicate, although they still produce errors and take much longer to light up. The problem seems to be at least somewhat dependent on load / handling active requests.
There is definitely a problem with update 14 and session replication as far as I can tell.
I can see some errors related to the connector, and FYI there are a few problems with the update 14 connectors.
Adobe has fixed those issues, but the fixes are not yet publicly available. You can get them from the Adobe Support team.
What I suggest is that you contact the Adobe Support team to get the latest connector DLLs, then apply the connector patch and see if you still get these errors in update 14.
Thanks,
Milan.
Thanks, but this is a Linux environment, so DLLs won't be of much help. They may have the equivalent jar files though. I have already opened a bug report (3857664). Is there a better option to reach them? Thanks for the info, glad to see I may not be the only one with problems.