Cluster Replication issues in CF10 on RHEL6
We recently updated our four physical servers to CF10 Update 14 and rebooted them all so the latest kernel and RHEL6 security updates would take effect. Each server has two worker instances, cfusion1 and cfusion2, in a round-robin cluster.
The servers came back fine after the reboot, but the problems started the following day. Now, when I restart any CF instance, the instance appears to lock up. If I bring both CF instances down and then bring both back up, the cluster *USUALLY* comes back fine, although sometimes it does not, and I have to reboot the box. To make matters even weirder, I can sometimes restart an instance, regardless of whether both are up or down, if I remove the secondary IP address on em1:1. This issue exists across all four physical servers. I have pulled one out of our web cluster to troubleshoot, while the other three limp along.
The major issue is that when one of the instances hangs, it does so in a zombie state: half dead, but not dead enough for the Tomcat cluster to expire the instance. That means half of my requests are processed by the working instance, and the other half queue up indefinitely, eventually bringing my web server down completely. While shutting down both instances and then bringing both up again usually works, it's not something I like to do on production machines, and occasionally the instance won't come back. These machines have become painfully unstable.
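For anyone trying to reproduce this, the half-dead state can be told apart from a cleanly stopped instance with a small probe. This is only a rough sketch in Python, not something from our setup: `classify_instance` is a name made up for illustration, and it assumes that a hung instance still accepts TCP connections on its built-in HTTP connector but never answers the request.

```python
import socket


def classify_instance(host, port, timeout=5.0):
    """Classify an HTTP connector as 'down', 'hung', or 'ok'.

    'down' - the TCP connection could not be established
    'hung' - the connection was accepted but no HTTP reply arrived in time
    'ok'   - the connector answered with an HTTP status line
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return "down"

    with sock:
        sock.settimeout(timeout)
        try:
            # Minimal request; any HTTP status line counts as a sign of life.
            sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            data = sock.recv(64)
        except socket.timeout:
            # Accepted at the TCP level, but nothing came back in time.
            return "hung"
        except OSError:
            # Connection was reset while talking to the instance.
            return "down"

    # Anything other than an HTTP status line is treated as not healthy.
    return "ok" if data.startswith(b"HTTP/") else "hung"
```

Calling `classify_instance("10.10.240.104", 8502)` would probe cfusion2, whose startup log shows the `http-bio-8502` connector. The timeout is a guess and would need tuning for slow pages.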
When I attempt to restart cfusion2, here's what I see in the coldfusion-error.log:
Nov 24, 2014 6:02:58 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /opt/coldfusion10/jre/lib/amd64/server:/opt/coldfusion10/jre/lib/amd64:/opt/coldfusion10/jre/../lib/amd64:/opt/coldfusion10/cfusion2/lib:/opt/coldfusion10/cfusion2/lib/_ilnx21/bin:/opt/coldfusion10/cfusion2/lib/international::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Nov 24, 2014 6:02:59 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8502"]
Nov 24, 2014 6:02:59 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8447"]
Nov 24, 2014 6:03:00 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-bio-8014"]
Nov 24, 2014 6:03:00 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Nov 24, 2014 6:03:00 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.54
Nov 24, 2014 6:03:00 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
INFO: Cluster is about to start
Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:/10.10.240.104:4002
Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to 500
Nov 24, 2014 6:03:00 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:4
Nov 24, 2014 6:03:00 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=416205, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]
Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:4
Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for 1000 milliseconds to establish cluster membership, start level:8
Nov 24, 2014 6:03:01 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
Nov 24, 2014 6:03:02 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:8
Nov 24, 2014 6:03:04 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#/
Nov 24, 2014 6:03:05 PM org.apache.catalina.ha.session.ClusterSessionListener messageReceived
WARNING: Context manager doesn't exist:localhost#/
Nov 24, 2014 6:03:06 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Register manager localhost#/ to cluster element Engine with name Catalina
Nov 24, 2014 6:03:06 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at localhost#/
Nov 24, 2014 6:03:36 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:03:36 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(30000 ms.).; Faulty members:tcp://{10, 10, 240, 104}:4001;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:109)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:837)
at org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions(DeltaManager.java:789)
at org.apache.catalina.ha.session.DeltaManager.startInternal(DeltaManager.java:756)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#/], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4001,{10, 10, 240, 104},4001, alive=451725, securePort=-1, UDP Port=-1, id={-40 103 -88 33 -118 2 70 76 -125 -43 102 49 -86 -103 123 -42 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
Nov 24, 2014 6:03:36 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#/]; session state send at 11/24/14 6:03 PM received in 30,264 ms.
Nov 24, 2014 6:03:36 PM org.apache.catalina.session.StandardSession tellNew
SEVERE: Session event listener threw exception
java.lang.NullPointerException
at coldfusion.bootstrap.HttpFlexSessionBootstrap.getListener(HttpFlexSessionBootstrap.java:154)
at coldfusion.bootstrap.HttpFlexSessionBootstrap.sessionCreated(HttpFlexSessionBootstrap.java:69)
at org.apache.catalina.session.StandardSession.tellNew(StandardSession.java:422)
at org.apache.catalina.session.StandardSession.setId(StandardSession.java:394)
at org.apache.catalina.ha.session.DeltaSession.setId(DeltaSession.java:275)
at org.apache.catalina.ha.session.DeltaManager.handleSESSION_CREATED(DeltaManager.java:1336)
at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1214)
at org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions(DeltaManager.java:803)
at org.apache.catalina.ha.session.DeltaManager.startInternal(DeltaManager.java:756)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5476)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
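Since these logs are long, a quick way to see which cluster member keeps getting flagged is to count the "Faulty members" entries. The following is a minimal sketch in Python. The function name `faulty_members` is invented for illustration, and the handling of several members on one line is an assumption, since the excerpts here only ever list one.

```python
import re
from collections import Counter

# Matches one member entry such as tcp://{10, 10, 240, 104}:4001
MEMBER = re.compile(r"tcp://\{([^}]*)\}:(\d+)")


def faulty_members(lines):
    """Count how often each host:port appears after 'Faulty members:'."""
    counts = Counter()
    for line in lines:
        # Only look at the text that follows the marker, if it is present.
        _, marker, tail = line.partition("Faulty members:")
        if not marker:
            continue
        for octets, port in MEMBER.findall(tail):
            # "{10, 10, 240, 104}" becomes "10.10.240.104"
            host = ".".join(part.strip() for part in octets.split(","))
            counts[f"{host}:{port}"] += 1
    return counts
```

Feeding it the lines of coldfusion-error.log, for example `faulty_members(open(path))` with `path` pointing at the log file, returns a `Counter` keyed by `host:port`.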
Then when I try to stop the instance, I get this indication of a half dead process:
# /opt/coldfusion10/cfusion2/bin/coldfusion stop
Stopping ColdFusion 10 server instance named cfusion2, please wait
Nov 24, 2014 6:06:03 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8009is not active. Stop the server only after it is started.
ColdFusion 10 server instance named cfusion2 has been stopped
The working cluster instance, cfusion1, shows this in its coldfusion-error.log:
Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:17 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:17 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.session.DeltaManager.send(DeltaManager.java:497)
at org.apache.catalina.ha.session.DeltaManager.sendCreateSession(DeltaManager.java:487)
at org.apache.catalina.ha.session.DeltaManager.createSession(DeltaManager.java:463)
at org.apache.catalina.ha.session.DeltaManager.createSession(DeltaManager.java:450)
at org.apache.catalina.connector.Request.doGetSession(Request.java:2947)
at org.apache.catalina.connector.Request.getSession(Request.java:2311)
at org.apache.catalina.connector.RequestFacade.getSession(RequestFacade.java:897)
at coldfusion.runtime.AppHelper.setupJ2eeSessionScope(AppHelper.java:974)
at coldfusion.runtime.AppHelper.setupSessionScope(AppHelper.java:1067)
at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:361)
at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:112)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:94)
at coldfusion.filter.BrowserDebugFilter.invoke(BrowserDebugFilter.java:79)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:58)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
at coldfusion.CfmServlet.service(CfmServlet.java:219)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.ha.session.JvmRouteBinderValve.invoke(JvmRouteBinderValve.java:218)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:333)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 59 more
Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:18 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 26 more
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:19 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 26 more
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.transport.nio.ParallelNioSender doLoop
WARNING: Not retrying send for:tcp://{10, 10, 240, 104}:4002; Sender is disconnected.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:19 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
Nov 24, 2014 6:06:19 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Send failed, and sender is disconnected. Not retrying.; Faulty members:tcp://{10, 10, 240, 104}:4002;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:171)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:89)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:54)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:78)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:77)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:93)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:77)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:224)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:182)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:843)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:815)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:539)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:524)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:506)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:419)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:343)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:422)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:199)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.catalina.tribes.transport.nio.NioSender.read(NioSender.java:169)
at org.apache.catalina.tribes.transport.nio.NioSender.process(NioSender.java:119)
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.doLoop(ParallelNioSender.java:142)
... 26 more
Nov 24, 2014 6:06:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]] message. Will verify.
Nov 24, 2014 6:06:23 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member already disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 104}:4002,{10, 10, 240, 104},4002, alive=192591, securePort=-1, UDP Port=-1, id={-96 126 -95 87 -79 5 76 -125 -101 -68 -56 -60 90 -22 -1 7 }, payload={}, command={}, domain={}, ]]
I have tried all the usual stuff, like increasing the timeouts and juggling ports and addresses. It seems that the two instances in the cluster simply cannot communicate with each other.
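One check that might narrow this down is whether the replication receiver ports from the logs (4001 and 4002 on 10.10.240.104) accept connections, and whether the result changes depending on which local address the connection comes from. The snippet below is a sketch in Python under those assumptions. `can_reach` is a hypothetical helper, and the secondary address on em1:1 is not shown in this post, so it has to be filled in by hand.

```python
import socket


def can_reach(host, port, source_ip=None, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    If source_ip is given, the outgoing connection is bound to that local
    address first, which allows comparing the primary and secondary IPs.
    """
    # Port 0 lets the operating system pick any free local port.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection(
            (host, port), timeout=timeout, source_address=source
        ):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the source address is not local.
        return False
```

For example, `can_reach("10.10.240.104", 4001)` and `can_reach("10.10.240.104", 4002)` test the two receivers with the default source address. Passing `source_ip` with the primary address and then with the em1:1 address would show whether the secondary IP makes a difference. A successful connect only proves the port is open, not that the receiver is processing messages.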
My server.xml files have been mildly tweaked for PCI compliance, so we have an SSL redirect. I have tried going back to the stock files, but that doesn't seem to help either.
# cat /opt/coldfusion10/cfusion1/runtime/conf/server.xml
<Server port="8008" shutdown="SHUTDOWN">
<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">
</Listener>
<Listener className="org.apache.catalina.core.JasperListener">
</Listener>
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">
</Listener>
<Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">
</Listener>
<GlobalNamingResources>
<Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">
</Resource>
</GlobalNamingResources>
<Service name="Catalina">
<Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">
</Executor>
<Connector port="8501" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="20000" redirectPort="8446" executor="tomcatThreadPool" maxThreads="50">
</Connector>
<Connector port="8446" sslEnabledProtocols="TLSv1, TLSv1.1, TLSv1.2" protocol="HTTP/1.1" keystorePass="xxxxxxxx" SSLEnabled="true" scheme="https" secure="true" keystoreFile="/home/.keystore" keyAlias="tomcat" maxThreads="150" ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA" clientAuth="false">
</Connector>
<Connector port="8013" protocol="AJP/1.3" redirectPort="8446" tomcatAuthentication="false">
</Connector>
<Engine jvmRoute="cfusion1" name="Catalina" defaultHost="localhost">
<Realm className="org.apache.catalina.realm.LockOutRealm">
<Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">
</Realm>
</Realm>
<Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">
<Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">
</Valve>
</Host>
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6">
<Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">
</Manager>
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Membership port="45564" dropTime="10000" address="228.0.0.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">
</Membership>
<Receiver port="4001" autoBind="100" address="auto" selectorTimeout="10000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">
</Receiver>
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" timeout="30000">
</Transport>
</Sender>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">
</Interceptor>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">
</Interceptor>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">
</Valve>
<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">
</Valve>
<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">
</ClusterListener>
<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">
</ClusterListener>
</Cluster>
</Engine>
</Service>
</Server>
# cat /opt/coldfusion10/cfusion2/runtime/conf/server.xml
<Server port="8009" shutdown="SHUTDOWN">
<Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on">
</Listener>
<Listener className="org.apache.catalina.core.JasperListener">
</Listener>
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener">
</Listener>
<Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener">
</Listener>
<GlobalNamingResources>
<Resource description="User database that can be updated and saved" name="UserDatabase" pathname="conf/tomcat-users.xml" factory="org.apache.catalina.users.MemoryUserDatabaseFactory" type="org.apache.catalina.UserDatabase" auth="Container">
</Resource>
</GlobalNamingResources>
<Service name="Catalina">
<Executor name="tomcatThreadPool" minSpareThreads="4" maxThreads="150" namePrefix="catalina-exec-">
</Executor>
<Connector port="8502" protocol="org.apache.coyote.http11.Http11Protocol" connectionTimeout="20000" redirectPort="8447" executor="tomcatThreadPool" maxThreads="50">
</Connector>
<Connector port="8447" sslEnabledProtocols="TLSv1, TLSv1.1, TLSv1.2" protocol="HTTP/1.1" keystorePass="xxxxxxxx" SSLEnabled="true" scheme="https" secure="true" keystoreFile="/home/.keystore" keyAlias="tomcat" maxThreads="150" ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA" clientAuth="false">
</Connector>
<Connector port="8014" protocol="AJP/1.3" redirectPort="8447" tomcatAuthentication="false">
</Connector>
<Engine jvmRoute="cfusion2" name="Catalina" defaultHost="localhost">
<Realm className="org.apache.catalina.realm.LockOutRealm">
<Realm className="org.apache.catalina.realm.UserDatabaseRealm" resourceName="UserDatabase">
</Realm>
</Realm>
<Host name="localhost" autoDeploy="false" unpackWARs="true" appBase="webapps">
<Valve pattern="%h %l %u %t &quot;%r&quot; %s %b" directory="logs" prefix="localhost_access_log." className="org.apache.catalina.valves.AccessLogValve" suffix=".txt" resolveHosts="false">
</Valve>
</Host>
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="6">
<Manager notifyListenersOnReplication="true" expireSessionsOnShutdown="false" className="org.apache.catalina.ha.session.DeltaManager">
</Manager>
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Membership port="45564" dropTime="10000" address="228.0.0.104" className="org.apache.catalina.tribes.membership.McastService" frequency="500">
</Membership>
<Receiver port="4002" autoBind="100" address="auto" selectorTimeout="10000" maxThreads="6" className="org.apache.catalina.tribes.transport.nio.NioReceiver">
</Receiver>
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender" timeout="30000">
</Transport>
</Sender>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector">
</Interceptor>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor">
</Interceptor>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="">
</Valve>
<Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve">
</Valve>
<ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener">
</ClusterListener>
<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener">
</ClusterListener>
</Cluster>
</Engine>
</Service>
</Server>
Netstat does not show anything else using the same ports.
Any suggestions? Any information is greatly appreciated!
Thanks,
-Tony
