CF2016 cluster members won't start

New Here ,
Mar 01, 2018

I have a two-member cluster. If member A is running, member B will not start; if member B is running, member A will not start. The member that fails to start eventually times out (as a Windows service) with this error:

Mar 01, 2018 9:22:16 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send

SEVERE: Unable to send message through cluster sender.

org.apache.catalina.tribes.ChannelException: Operation has timed out(3000 ms.).; Faulty members:tcp://{xxx.xxx.xxx.xxx}:4005;

    at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:102)

    at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:47)

    at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:57)

    at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:82)

    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)

    at org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor.sendMessage(MessageDispatchInterceptor.java:91)

    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)

    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:92)

    at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:78)

    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:237)

    at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:190)

    at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:684)

    at org.apache.catalina.ha.session.DeltaManager.sendSessions(DeltaManager.java:1442)

    at org.apache.catalina.ha.session.DeltaManager.handleGET_ALL_SESSIONS(DeltaManager.java:1359)

    at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1171)

    at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:929)

    at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:77)

    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:783)

    at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:764)

    at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:300)

    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)

    at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:116)

    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)

    at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:83)

    at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:276)

    at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:244)

    at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:213)

    at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:101)

    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

    at java.lang.Thread.run(Unknown Source)

Any ideas what the issue could be?

Adobe Community Professional
Correct answer by Charlie Arehart | Adobe Community Professional

This sure sounds like a port conflict of some sort. So about that port 4005 mentioned in the error, I think you'll find that's the tcpListenPort, specified in the server.xml file within the element:

<Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"

Do you show that in your server.xml for each instance? Is it defined to be the same value, in the server.xml on both instances?

If so, are the instances on the same machine? If so, what if you change that to a different port for each instance? Just give it a shot, in the one that is not running now and therefore won't start. Does it then start?
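As a rough sketch of that change, assuming the Tomcat 7+ style of cluster configuration (the org.apache.catalina.tribes classes in the stack trace suggest that is what is in use here), the receiver port is normally set with the port attribute of a NioReceiver element rather than tcpListenPort. The value 4006 and the address="auto" setting below are illustrative assumptions, not values taken from this thread:

```xml
<!-- Instance A: server.xml, inside <Cluster> / <Channel> -->
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
          address="auto"
          port="4005"/>

<!-- Instance B on the same machine: same element, different port (4006 is an arbitrary example) -->
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
          address="auto"
          port="4006"/>
```

Check the element that actually appears in your own server.xml before editing; the attribute names can differ between Tomcat versions.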

If so, the next question would of course be what impact this might have. I have not been able to find good enough docs (in CF or Tomcat) to explain that. But the pragmatic question would seem to be simply whether a) both instances now come up and b) failover and replication work for you.
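One way to confirm a suspected conflict before restarting anything is to check whether the receiver port is already bound on the machine. The following is a minimal sketch using only the standard java.net classes; the class name PortCheck and the default of 4005 are assumptions for illustration. It tests binding on all local interfaces, which may not match the exact address the cluster receiver uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortCheck {

    /** Returns true if a listening socket can be bound to the port on all local interfaces. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            // Disable address reuse so an existing listener reliably causes the bind to fail.
            socket.setReuseAddress(false);
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            // Bind failed: typically because another process already holds the port.
            return false;
        }
    }

    public static void main(String[] args) {
        // Port to test; defaults to the 4005 seen in the error message.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 4005;
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```

Run it on the server while one cluster member is up. If it reports the port as already in use, the second member cannot bind the same port on that machine.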

Either way, do let us know. 🙂

/Charlie (server troubleshooter, carehart.org)

demarcao
New Here ,
Mar 01, 2018

It appears that there was a port conflict between the two.  Thanks Charlie!
