CF2016 cluster members won't start

Question

I have a two-member cluster. If member A is running, member B will not start; if member B is running, member A will not start. The member that fails to start eventually times out (as a Windows service) with this error:

Mar 01, 2018 9:22:16 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(3000 ms.).; Faulty members:tcp://{xxx.xxx.xxx.xxx}:4005;
    at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:102)
    at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:47)
    at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:57)
    at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:82)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
    at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:91)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:92)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:237)
    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:190)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:684)
    at org.apache.catalina.ha.session.DeltaManager.sendSessions(DeltaManager.java:1442)
    at org.apache.catalina.ha.session.DeltaManager.handleGET_ALL_SESSIONS(DeltaManager.java:1359)
    at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1171)
    at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:929)
    at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:77)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:783)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:764)
    at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:300)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:116)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
    at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:276)
    at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:244)
    at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:213)
    at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:101)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Any ideas what the issue could be?

Charlie Arehart · Accepted Answer

This sure sounds like a port conflict of some sort. As for the port 4005 mentioned in the error, I think you'll find that's the tcpListenPort, specified in the server.xml file within this element:

<Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"

Do you see that in the server.xml for each instance? Is it set to the same value in the server.xml on both instances?
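If comparing the two files by eye is tedious, something like the following might help. It is only a rough sketch in Python, not anything that ships with CF or Tomcat: it pulls the listen port off every Receiver element in two server.xml files and reports any value they share. The helper names are made up for illustration, and the attribute names are an assumption: tcpListenPort is the name mentioned above, and the code also looks for an attribute simply called port in case your file uses that instead.

```python
import xml.etree.ElementTree as ET

# Attribute names to look for on each <Receiver>. "tcpListenPort" is the name
# mentioned in this answer; "port" is an assumption about what other Tomcat
# versions may call the same setting.
PORT_ATTRIBUTES = ("tcpListenPort", "port")


def receiver_ports(server_xml_text):
    """Return the listen ports (as strings) found on every <Receiver> element."""
    root = ET.fromstring(server_xml_text)
    ports = []
    for receiver in root.iter("Receiver"):
        for name in PORT_ATTRIBUTES:
            if name in receiver.attrib:
                ports.append(receiver.attrib[name])
                break  # take the first matching attribute only
    return ports


def shared_ports(xml_a, xml_b):
    """Return the set of receiver ports that appear in both files."""
    return set(receiver_ports(xml_a)) & set(receiver_ports(xml_b))
```

To use it, read each instance's server.xml into a string (the paths depend on your install) and pass both strings to shared_ports(). A non-empty result means both instances are configured with the same receiver port.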

If so, are the instances on the same machine? If so, what if you change that to a different port for each instance? Just give it a shot, in the one that is not running now and therefore won't start. Does it then start?
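Before picking a new port, it may be worth confirming that the candidate is actually free on that machine. Here is a minimal Python sketch of one way to check, under the assumption that "free" simply means a TCP listener can be bound to the port at this moment. The function name port_is_free is hypothetical, and the check says nothing about what Tomcat itself will do at startup.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True
```

For example, with one instance running, port_is_free(4005) returning False on that box would be consistent with the running instance already holding the port the second instance wants.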

If so, the next question would of course be what impact this might have. I have not been able to find good enough docs (in CF or Tomcat) to explain that. But the pragmatic question would seem to be simply whether a) the instances now both come up and b) failover and replication work for you.

Either way, do let us know. :-)
