Known Participant
March 1, 2018
Answered

CF2016 cluster members won't start

I have a two-member cluster. If member A is running, member B will not start. If member B is running, member A will not start. The member that fails to start eventually times out (as a Windows service) with this error:

Mar 01, 2018 9:22:16 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(3000 ms.).; Faulty members:tcp://{xxx.xxx.xxx.xxx}:4005;
    at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:102)
    at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:47)
    at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:57)
    at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:82)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
    at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:91)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:92)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:237)
    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:190)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:684)
    at org.apache.catalina.ha.session.DeltaManager.sendSessions(DeltaManager.java:1442)
    at org.apache.catalina.ha.session.DeltaManager.handleGET_ALL_SESSIONS(DeltaManager.java:1359)
    at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1171)
    at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:929)
    at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:77)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:783)
    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:764)
    at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:300)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:116)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
    at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:276)
    at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:244)
    at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:213)
    at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:101)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Any ideas what the issue could be?

    Correct answer by Charlie Arehart (Community Expert), March 1, 2018

    This sure sounds like a port conflict of some sort. So about that port 4005 mentioned in the error, I think you'll find that's the tcpListenPort, specified in the server.xml file within the element:

    <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"

    Do you see that in the server.xml for each instance? Is it set to the same value in the server.xml on both instances?

    If so, are the instances on the same machine? In that case, what if you change that to a different port for each instance? Just give it a shot in the one that is not running now (the one that won't start). Does it then start?
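
One quick way to test the port-conflict theory before editing server.xml is to check whether something on the machine already holds the port while the other member is running. The following is only a minimal sketch in Java, not part of CF or Tomcat: the class name `PortCheck` and the method `isPortFree` are made up for illustration, and the default of 4005 is simply the port from the error above. It tries to bind the port on all local addresses, so it may not catch every case where a listener is bound to one specific address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper (illustrative names, not a CF or Tomcat API).
public class PortCheck {

    // Tries to bind a listening socket to the port on all local addresses.
    // Returns true if the bind succeeds, false if it fails (typically
    // because another process already holds the port).
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 4005 is the port reported in the "Faulty members" error; pass a
        // different port as the first argument to check another value.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 4005;
        System.out.println("Port " + port
                + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

Run it on the machine with one member up and the other stopped. If the port configured for the stopped member is reported as already in use, that points toward the two instances competing for the same port.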

    If so, the next question would of course be what impact this might have. I have not been able to find good enough docs (for CF or Tomcat) to explain that. But the pragmatic test would seem to be simply whether a) both instances now come up and b) failover and replication work for you.

    Either way, do let us know. :-)

    /Charlie (troubleshooter, carehart.org)
    demarcao (Author)
    Known Participant
    March 1, 2018

    It appears that there was a port conflict between the two.  Thanks Charlie!