I have a two-member cluster. If member A is running, member B will not start. If member B is running, member A will not start. The member that fails to start eventually times out (it runs as a Windows service) with this error:
Mar 01, 2018 9:22:16 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
SEVERE: Unable to send message through cluster sender.
org.apache.catalina.tribes.ChannelException: Operation has timed out(3000 ms.).; Faulty members:tcp://{xxx.xxx.xxx.xxx}:4005;
at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:102)
at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:47)
at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:57)
at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:82)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:91)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:92)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:237)
at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:190)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:684)
at org.apache.catalina.ha.session.DeltaManager.sendSessions(DeltaManager.java:1442)
at org.apache.catalina.ha.session.DeltaManager.handleGET_ALL_SESSIONS(DeltaManager.java:1359)
at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1171)
at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:929)
at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:77)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:783)
at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:764)
at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:300)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:116)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)
at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:276)
at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:244)
at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:213)
at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:101)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Any ideas what the issue could be?
This sure sounds like a port conflict of some sort. As for the port 4005 mentioned in the error, I think you'll find that's the tcpListenPort, which is specified in the server.xml file within this element:
<Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"
Do you see that element in the server.xml for each instance? Is the port set to the same value in the server.xml of both instances?
If so, are the instances on the same machine? If they are, what happens if you change it to a different port for each instance? Just give it a shot in the instance that is not running now and therefore won't start. Does it then start?
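If it helps to confirm the conflict before editing server.xml, here is a minimal Java sketch, not anything shipped with CF or Tomcat. The class name PortCheck and its two helper methods are made up for illustration, and the default of 4005 is simply the port from the error above. Run it on the machine hosting the member that won't start, while the other member is up:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of ColdFusion or Tomcat.
public class PortCheck {

    // Returns true if a listening socket can be bound to the port on this machine,
    // i.e. no other process (such as the other cluster member) already holds it.
    public static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // Useful when the members are on different machines, to see whether one
    // member can actually reach the other's receiver port.
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 4005 is the port from the error message; pass another port as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 4005;
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

If it reports the port as already in use while only the other member is running, and both instances are set to that same port, that would point to the conflict described above.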
If so, the next question would of course be what impact the change might have. I have not been able to find good enough docs (for CF or Tomcat) to explain that. But the pragmatic question is simply whether a) both instances now come up and b) failover and replication work for you.
Either way, do let us know. 🙂
It appears that there was a port conflict between the two. Thanks Charlie!