
Clustered CF Instances Hang, Difficult to Restart

Participant, Jul 26, 2021

Hello all. It seems we have run into another issue with our newly deployed ColdFusion 2021 servers. We experienced a similar error back with CF8 and with earlier update levels of CF10. The issue seemed to be resolved in later CF10 updates, but with CF2021 it appears to be back.

We have three physical bare-metal servers, each running four clustered ColdFusion instances. All of them sit behind a Fortinet load balancer. Although the CF instances on each box are named the same (cfusion1 through cfusion4), each box's CF cluster uses a different port to avoid multicast confusion between machines. We changed channelSendOptions to 6 in the server.xml files to reduce the number of "Session Already Invalidated" error messages in the coldfusion-error.log files.
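
For reference, channelSendOptions is a bit mask built from the send-option constants on Tomcat's org.apache.catalina.tribes.Channel interface. The Python sketch below decodes a value into flag names. The flag values are my understanding of the Tomcat 9 constants, not something stated in this thread, so please check them against the Javadoc for the Tomcat version bundled with CF 2021 (9.0.41, per the log further down). Under that assumption, 6 is USE_ACK plus SYNCHRONIZED_ACK (synchronous, acknowledged replication), while 8, which Tomcat documents as the SimpleTcpCluster default, is ASYNCHRONOUS.

```python
# Sketch: decode a Tomcat Tribes channelSendOptions bit mask into flag names.
# Flag values are assumed from org.apache.catalina.tribes.Channel (Tomcat 9);
# verify them against the Javadoc for the Tomcat version bundled with your CF.
SEND_OPTION_FLAGS = {
    0x0001: "SEND_OPTIONS_BYTE_MESSAGE",
    0x0002: "SEND_OPTIONS_USE_ACK",
    0x0004: "SEND_OPTIONS_SYNCHRONIZED_ACK",
    0x0008: "SEND_OPTIONS_ASYNCHRONOUS",
    0x0010: "SEND_OPTIONS_SECURE",
    0x0020: "SEND_OPTIONS_UDP",
    0x0040: "SEND_OPTIONS_MULTICAST",
}


def decode_send_options(value: int) -> list:
    """Return the names of all known flags set in a channelSendOptions value."""
    if value < 0:
        raise ValueError("channelSendOptions must be non-negative")
    names = [name for bit, name in sorted(SEND_OPTION_FLAGS.items()) if value & bit]
    # Any bits outside the known flags are reported rather than silently dropped.
    unknown = value & ~sum(SEND_OPTION_FLAGS)
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


if __name__ == "__main__":
    for option in (6, 8):
        print(option, decode_send_options(option))
```

As I understand the Tribes semantics, with 6 a sender waits until each peer has processed and acknowledged a message, so replication is more consistent but slower than with the asynchronous default.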

 

While we don't have much trouble restarting instances when the server has been removed from the load balancer, we do have difficulty restarting instances under even moderate load. The CF instance appears to start fine from the command line, but it never starts taking traffic, and the CFIDE/administrator for that instance will not load. When we try to stop the hung instance, we get an error:

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion start
Starting ColdFusion 2021 server ...
======================================================================
ColdFusion 2021 server has been started.
ColdFusion 2021 will write logs to /opt/ColdFusion2021/cfusion1/bin/../logs/coldfusion-out.log
======================================================================

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion stop
Stopping ColdFusion 2021 server, please wait
Jul 22, 2021 10:37:43 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8007is not active. Stop the server only after it is started.
ColdFusion 2021 server has been stopped

[root@Node1 ~]# /opt/ColdFusion2021/cfusion1/bin/coldfusion start
Starting ColdFusion 2021 server ...
======================================================================
ColdFusion 2021 server has been started.
ColdFusion 2021 will write logs to /opt/ColdFusion2021/cfusion1/bin/../logs/coldfusion-out.log
======================================================================
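
The "Shutdown Port 8007 is not active. Stop the server only after it is started." message suggests the stop command could not reach the instance's shutdown port, which, as the message itself hints, appears to be opened only once startup has finished. A quick probe of the instance's ports before stopping or restarting can show which state it is in. The sketch below assumes port 8007 (shutdown, from the message above) and 8501 (HTTP connector, from the coldfusion-error.log posted later in this thread) for cfusion1; both are per-instance values you would adjust.

```python
# Sketch: check which local TCP ports of a CF instance accept connections.
# Port numbers are assumptions for cfusion1 (8007 shutdown, 8501 HTTP);
# adjust them for each instance.
from __future__ import annotations

import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def instance_status(host: str = "127.0.0.1", ports: dict[str, int] | None = None) -> dict[str, bool]:
    """Probe each named port and report whether it is accepting connections."""
    ports = ports or {"shutdown": 8007, "http": 8501}
    return {name: port_is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in instance_status().items():
        print(f"{name:>8}: {'listening' if up else 'not listening'}")
```

A closed shutdown port together with a still-running JVM process would be consistent with an instance that is stuck in startup rather than one that has stopped.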

 

There doesn't appear to be any useful information in coldfusion-error.log, nor in the logs of its peers.

Do I perhaps need to adjust the Tomcat cluster timeouts? Am I missing some other best practice for clustering CF instances? Any suggestions on how to troubleshoot this further?

Thanks for any advice, 
-Tony

Participant, Jul 26, 2021

Sorry, I should have noted that this is Red Hat Enterprise Linux 8.4, running ColdFusion 2021 Update 1 with the bundled CF JRE.

Community Expert, Jul 29, 2021

Doesn't sound good.

What appears in coldfusion-error.log and server.log? Could you share the contents?

Participant, Jul 29, 2021

Sure. I'll try to pull some info from the restarts we did while under load.

Here is the coldfusion-error.log during one such shutdown where I got the error:

 

# /opt/ColdFusion2021/cfusion1/bin/coldfusion stop
Stopping ColdFusion 2021 server, please wait
Jul 22, 2021 10:37:43 PM com.adobe.coldfusion.launcher.Launcher stopServer
SEVERE: Shutdown Port 8007is not active. Stop the server only after it is started.
ColdFusion 2021 server has been stopped

Jul 22, 2021 10:34:36 PM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler ["http-nio-8501"]
Jul 22, 2021 10:34:36 PM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
Jul 22, 2021 10:34:41 PM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The Apache Tomcat Native library which allows using OpenSSL was not found on the java.library.path: [/opt/ColdFusion2021/cfusion1/bin/../lib:/opt/ColdFusion2021/cfusion1/bin/../lib/_linux64::/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib]
Jul 22, 2021 10:34:41 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-nio-8501"]
Jul 22, 2021 10:34:42 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
Jul 22, 2021 10:34:42 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service [Catalina]
Jul 22, 2021 10:34:42 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet engine: [Apache Tomcat/9.0.41]
Jul 22, 2021 10:34:42 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
INFO: Cluster is about to start
Jul 22, 2021 10:34:42 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:[/10.10.240.111:4001]
Jul 22, 2021 10:34:50 PM org.apache.catalina.tribes.util.UUIDGenerator <clinit>
INFO: Creation of SecureRandom instance for UUID generation using [DRBG] took [8,013] milliseconds.
Jul 22, 2021 10:34:50 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to [500]
Jul 22, 2021 10:34:50 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[4]
Jul 22, 2021 10:34:50 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=14416877, securePort=-1, UDP Port=-1, id={-42 115 82 28 94 81 68 -106 -83 -82 10 -41 115 5 -3 113 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:34:50 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=14383954, securePort=-1, UDP Port=-1, id={31 -48 -24 -94 -77 -103 70 -126 -86 25 -118 -27 9 117 97 -3 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:34:50 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4004,{10, 10, 240, 111},4004, alive=14350371, securePort=-1, UDP Port=-1, id={62 125 69 96 -17 114 72 18 -73 41 -81 6 1 -15 118 38 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:34:51 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:[4]
Jul 22, 2021 10:34:51 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[8]
Jul 22, 2021 10:34:51 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:[104857600] bytes of type: [org.apache.catalina.tribes.io.BufferPool15Impl]
Jul 22, 2021 10:34:52 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:[8]
Jul 22, 2021 10:34:55 PM org.apache.catalina.util.SessionIdGeneratorBase createSecureRandom
WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [2,581] milliseconds.
Jul 22, 2021 10:34:55 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Register manager [localhost#] to cluster element [Engine] with name [Catalina]
Jul 22, 2021 10:34:55 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at [localhost#]
Jul 22, 2021 10:34:55 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#], requesting session state from [org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=14421378, securePort=-1, UDP Port=-1, id={-42 115 82 28 94 81 68 -106 -83 -82 10 -41 115 5 -3 113 }, payload={}, command={}, domain={}]]. This operation will timeout if no session state has been received within [60] seconds.
Jul 22, 2021 10:34:55 PM java.io.ObjectInputFilter$Config lambda$static$0
INFO: Creating serialization filter from !org.mozilla.**;!com.sun.syndication.**;!org.apache.commons.beanutils.**
Jul 22, 2021 10:35:00 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
INFO: Manager [localhost#]; session state sent at [7/22/21 10:34 PM] received in [5,210] ms.
Jul 22, 2021 10:35:20 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=14447694, securePort=-1, UDP Port=-1, id={-42 115 82 28 94 81 68 -106 -83 -82 10 -41 115 5 -3 113 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}]]
Jul 22, 2021 10:35:38 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=1005, securePort=-1, UDP Port=-1, id={-69 96 -18 100 -111 -42 73 90 -67 73 78 -44 -48 23 -31 17 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:36:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=14469480, securePort=-1, UDP Port=-1, id={31 -48 -24 -94 -77 -103 70 -126 -86 25 -118 -27 9 117 97 -3 }, payload={}, command={}, domain={}]] message. Will verify.
Jul 22, 2021 10:36:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=14469480, securePort=-1, UDP Port=-1, id={31 -48 -24 -94 -77 -103 70 -126 -86 25 -118 -27 9 117 97 -3 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:36:27 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4003,{10, 10, 240, 111},4003, alive=1005, securePort=-1, UDP Port=-1, id={99 -46 79 6 30 126 66 -17 -93 -20 98 18 126 -46 109 -122 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:36:55 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberDisappeared
INFO: Received member disappeared:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4004,{10, 10, 240, 111},4004, alive=14475665, securePort=-1, UDP Port=-1, id={62 125 69 96 -17 114 72 18 -73 41 -81 6 1 -15 118 38 }, payload={}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={}]]
Jul 22, 2021 10:37:15 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4004,{10, 10, 240, 111},4004, alive=1005, securePort=-1, UDP Port=-1, id={-95 52 111 -107 -57 105 66 -3 -73 15 -76 125 67 -64 90 125 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:38:48 PM org.apache.catalina.core.AprLifecycleListener lifecycleEvent
INFO: The Apache Tomcat Native library which allows using OpenSSL was not found on the java.library.path: [/opt/ColdFusion2021/cfusion1/bin/../lib:/opt/ColdFusion2021/cfusion1/bin/../lib/_linux64::/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib]
Jul 22, 2021 10:38:48 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-nio-8501"]
Jul 22, 2021 10:38:48 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
Jul 22, 2021 10:38:48 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service [Catalina]
Jul 22, 2021 10:38:48 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet engine: [Apache Tomcat/9.0.41]
Jul 22, 2021 10:38:48 PM org.apache.catalina.ha.tcp.SimpleTcpCluster startInternal
INFO: Cluster is about to start
Jul 22, 2021 10:38:48 PM org.apache.catalina.tribes.transport.ReceiverBase bind
INFO: Receiver Server Socket bound to:[/10.10.240.111:4001]
Jul 22, 2021 10:38:56 PM org.apache.catalina.tribes.util.UUIDGenerator <clinit>
INFO: Creation of SecureRandom instance for UUID generation using [DRBG] took [8,011] milliseconds.
Jul 22, 2021 10:38:56 PM org.apache.catalina.tribes.membership.McastServiceImpl setupSocket
INFO: Setting cluster mcast soTimeout to [500]
Jul 22, 2021 10:38:56 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[4]
Jul 22, 2021 10:38:57 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:[4]
Jul 22, 2021 10:38:57 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Sleeping for [1000] milliseconds to establish cluster membership, start level:[8]
Jul 22, 2021 10:38:58 PM org.apache.catalina.tribes.membership.McastServiceImpl waitForMembers
INFO: Done sleeping, membership established, start level:[8]
Jul 22, 2021 10:39:01 PM org.apache.catalina.util.SessionIdGeneratorBase createSecureRandom
WARNING: Creation of SecureRandom instance for session ID generation using [SHA1PRNG] took [2,567] milliseconds.
Jul 22, 2021 10:39:01 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Register manager [localhost#] to cluster element [Engine] with name [Catalina]
Jul 22, 2021 10:39:01 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at [localhost#]
Jul 22, 2021 10:39:01 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [localhost#]: skipping state transfer. No members active in cluster group.
Jul 22, 2021 10:39:02 PM org.apache.catalina.core.ApplicationContext log
INFO: ColdFusionStartUpServlet: ColdFusion: Starting application services
Jul 22, 2021 10:39:02 PM org.apache.catalina.core.ApplicationContext log
INFO: ColdFusionStartUpServlet: ColdFusion: VM version = 11.0.1+13-LTS
Jul 22, 2021 10:39:07 PM java.io.ObjectInputFilter$Config lambda$static$0
INFO: Creating serialization filter from !org.mozilla.**;!com.sun.syndication.**;!org.apache.commons.beanutils.**
Jul 22, 2021 10:39:08 PM org.apache.catalina.ha.session.JvmRouteBinderValve startInternal
INFO: JvmRouteBinderValve started
Jul 22, 2021 10:39:08 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-nio-8501"]
Jul 22, 2021 10:39:08 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["ajp-nio-127.0.0.1-8012"]
Jul 22, 2021 10:39:08 PM com.adobe.coldfusion.launcher.Launcher run
INFO: Server startup in 19576 ms
Jul 22, 2021 10:39:35 PM org.apache.catalina.tribes.io.BufferPool getBufferPool
INFO: Created a buffer pool with max size:[104857600] bytes of type: [org.apache.catalina.tribes.io.BufferPool15Impl]
Jul 22, 2021 10:39:35 PM org.apache.catalina.ha.tcp.SimpleTcpCluster memberAdded
INFO: Replication member added:[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=1005, securePort=-1, UDP Port=-1, id={10 52 -57 -66 -33 -74 70 -50 -97 117 -73 -20 -104 2 45 111 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:39:44 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=4507, securePort=-1, UDP Port=-1, id={10 52 -57 -66 -33 -74 70 -50 -97 117 -73 -20 -104 2 45 111 }, payload={}, command={}, domain={}]] message. Will verify.
Jul 22, 2021 10:39:44 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 10, 240, 111}:4002,{10, 10, 240, 111},4002, alive=4507, securePort=-1, UDP Port=-1, id={10 52 -57 -66 -33 -74 70 -50 -97 117 -73 -20 -104 2 45 111 }, payload={}, command={}, domain={}]]
Jul 22, 2021 10:39:44 PM org.apache.catalina.ha.tcp.SimpleTcpCluster send

And here's the same timeframe from server.log:

"Information","main","07/22/21","18:34:10",,"Starting logging..."
"Information","main","07/22/21","18:34:10",,"Starting license..."
"Information","main","07/22/21","18:34:12",,"Enterprise Edition enabled"
"Information","main","07/22/21","18:34:12",,"Starting crypto..."
"Information","main","07/22/21","18:34:12",,"Installed JSafe JCE provider: Version 6.21 Crypto-J 6.2.1, EMC Corporation. JsafeJCE Security Provider (implements RSA, DSA, ECDSA, Diffie-Hellman, ECDH, AES, DES, Triple DES, DESX, RC2, RC4, RC5, PBE, MD2, MD5, RIPEMD160, SHA1, SHA224, SHA256, SHA384, SHA512, HMAC-MD5, HMAC-RIPEMD160, HMAC-SHA1, HMAC-SHA224, HMAC-SHA256, HMAC-SHA384, HMAC-SHA512, HMACDRBG, HASHDRBG, CTRDRBG, FIPS186PRNG, SHA1PRNG, MD5PRNG; RFC 3394, RFC 5649 AES Key Wrap; X.509 CertificateFactory; PKCS12, PKCS15 KeyStore; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathValidators; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathBuilders; LDAP, Collection CertStores)"
"Information","main","07/22/21","18:34:12",,"Starting security..."
"Information","main","07/22/21","18:34:12",,"Starting scheduler..."
"Information","main","07/22/21","18:34:12",,"Starting WatchService..."
"Information","main","07/22/21","18:34:12",,"Starting sql..."
"Information","main","07/22/21","18:34:13",,"Starting runtime..."
"Information","main","07/22/21","18:34:14",,"Starting client..."
"Information","main","07/22/21","18:34:14",,"Starting archive..."
"Information","main","07/22/21","18:34:14",,"Starting CloudConfig..."
"Information","main","07/22/21","18:34:14",,"Starting VendorCredential..."
"Information","main","07/22/21","18:34:14",,"Starting rest..."
"Information","main","07/22/21","18:34:14",,"Starting registry..."
"Information","main","07/22/21","18:34:14",,"Package adminapi started..."
"Information","main","07/22/21","18:34:14",,"Package administrator started..."
"Information","main","07/22/21","18:34:14",,"Package redissessionstorage started..."
"Information","main","07/22/21","18:34:14",,"Package debugger started..."
"Information","main","07/22/21","18:34:14",,"Package zip started..."
"Information","main","07/22/21","18:34:14",,"Package image started..."
"Information","main","07/22/21","18:34:14",,"Package caching started..."
"Information","main","07/22/21","18:34:14",,"Package cfmongodb started..."
"Information","main","07/22/21","18:34:14",,"Package mail started..."
"Information","main","07/22/21","18:34:14",,"Package spreadsheet started..."
"Information","main","07/22/21","18:34:15",,"Package axis started..."
"Information","main","07/22/21","18:34:15",,"Package chart started..."
"Information","main","07/22/21","18:34:15",,"Package feed started..."
"Information","main","07/22/21","18:34:15",,"Package print started..."
"Information","main","07/22/21","18:34:15",,"Package search started..."
"Information","main","07/22/21","18:34:15",,"Package document started..."
"Information","main","07/22/21","18:34:15",,"Package presentation started..."
"Information","main","07/22/21","18:34:15",,"Package eventgateways started..."
"Information","main","07/22/21","18:34:15",,"Package dotnet started..."
"Information","main","07/22/21","18:34:15",,"Package pmtagent started..."
"Information","main","07/22/21","18:34:15",,"Package htmltopdf started..."
"Information","main","07/22/21","18:34:15",,"Package awslambda started..."
"Information","main","07/22/21","18:34:15",,"com package will not be deployed as it is not installed."
"Information","main","07/22/21","18:34:15",,"Package saml started..."
"Information","main","07/22/21","18:34:15",,"Package awss3 started..."
"Information","main","07/22/21","18:34:15",,"Package awss3legacy started..."
"Information","main","07/22/21","18:34:15",,"Package azureblob started..."
"Information","main","07/22/21","18:34:15",,"Package pdf started..."
"Information","main","07/22/21","18:34:15",,"Package websocket started..."
"Error","Thread-24","07/22/21","18:34:15",,"Connect to 127.0.0.1:8993 [/127.0.0.1] failed: Connection refused (Connection refused) http://127.0.0.1:8993/PDFgServlet/"
"Information","main","07/22/21","18:34:15",,"WebSocket server listens on port: 8576"
"Information","main","07/22/21","18:34:15",,"Package orm started..."
"Information","main","07/22/21","18:34:15",,"Package ormsearch started..."
"Information","main","07/22/21","18:34:15",,"Package ajax started..."
"Information","main","07/22/21","18:34:15",,"Package derby started..."
"Information","main","07/22/21","18:34:15",,"Package oracle started..."
"Information","main","07/22/21","18:34:15",,"Package mysql started..."
"Information","main","07/22/21","18:34:15",,"Package db2 started..."
"Information","main","07/22/21","18:34:15",,"Package sybase started..."
"Information","main","07/22/21","18:34:15",,"Package postgresql started..."
"Information","main","07/22/21","18:34:15",,"Package sqlserver started..."
"Information","main","07/22/21","18:34:15",,"odbc package will not be deployed as it is not installed."
"Information","main","07/22/21","18:34:15",,"Package scheduler started..."
"Information","main","07/22/21","18:34:15",,"Package ftp started..."
"Information","main","07/22/21","18:34:15",,"Package awssqs started..."
"Information","main","07/22/21","18:34:15",,"Package awssns started..."
"Information","main","07/22/21","18:34:15",,"Package azureservicebus started..."
"Information","main","07/22/21","18:34:15",,"Package awsdynamodb started..."
"Information","main","07/22/21","18:34:15",,"Package report started..."
"Information","main","07/22/21","18:34:15",,"Package exchange started..."
"Information","main","07/22/21","18:34:15",,"Package sharepoint started..."
"Information","main","07/22/21","18:34:15",,"ColdFusion started"
"Information","main","07/22/21","18:34:15",,"ColdFusion: application services are now available"
"Information","Thread-10","07/22/21","18:34:17",,"A same serial number has been found on another ColdFusion server. The server may be out of compliance."
"Information","Thread-27","07/22/21","22:34:36",,"ColdFusion stopped"
"Information","main","07/22/21","22:39:02",,"Starting logging..."
"Information","main","07/22/21","22:39:02",,"Starting license..."
"Information","main","07/22/21","22:39:04",,"Enterprise Edition enabled"
"Information","main","07/22/21","22:39:04",,"Starting crypto..."
"Information","main","07/22/21","22:39:04",,"Installed JSafe JCE provider: Version 6.21 Crypto-J 6.2.1, EMC Corporation. JsafeJCE Security Provider (implements RSA, DSA, ECDSA, Diffie-Hellman, ECDH, AES, DES, Triple DES, DESX, RC2, RC4, RC5, PBE, MD2, MD5, RIPEMD160, SHA1, SHA224, SHA256, SHA384, SHA512, HMAC-MD5, HMAC-RIPEMD160, HMAC-SHA1, HMAC-SHA224, HMAC-SHA256, HMAC-SHA384, HMAC-SHA512, HMACDRBG, HASHDRBG, CTRDRBG, FIPS186PRNG, SHA1PRNG, MD5PRNG; RFC 3394, RFC 5649 AES Key Wrap; X.509 CertificateFactory; PKCS12, PKCS15 KeyStore; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathValidators; X.509V1, PKIX, PKIX-SuiteB, PKIX-SuiteBTLS CertPathBuilders; LDAP, Collection CertStores)"
"Information","main","07/22/21","22:39:04",,"Starting security..."
"Information","main","07/22/21","22:39:04",,"Starting scheduler..."
"Information","main","07/22/21","22:39:04",,"Starting WatchService..."
"Information","main","07/22/21","22:39:04",,"Starting sql..."
"Information","main","07/22/21","22:39:05",,"Starting runtime..."
"Information","main","07/22/21","22:39:06",,"Starting client..."
"Information","main","07/22/21","22:39:06",,"Starting archive..."
"Information","main","07/22/21","22:39:06",,"Starting CloudConfig..."
"Information","main","07/22/21","22:39:06",,"Starting VendorCredential..."
"Information","main","07/22/21","22:39:06",,"Starting rest..."
"Information","main","07/22/21","22:39:06",,"Starting registry..."
"Information","main","07/22/21","22:39:06",,"Package adminapi started..."
"Information","main","07/22/21","22:39:06",,"Package administrator started..."
"Information","main","07/22/21","22:39:06",,"Package redissessionstorage started..."
"Information","main","07/22/21","22:39:06",,"Package debugger started..."
"Information","main","07/22/21","22:39:06",,"Package zip started..."
"Information","main","07/22/21","22:39:06",,"Package image started..."
"Information","main","07/22/21","22:39:06",,"Package caching started..."
"Information","main","07/22/21","22:39:06",,"Package cfmongodb started..."
"Information","main","07/22/21","22:39:06",,"Package mail started..."
"Information","main","07/22/21","22:39:06",,"Package spreadsheet started..."
"Information","main","07/22/21","22:39:07",,"Package axis started..."
"Information","main","07/22/21","22:39:07",,"Package chart started..."
"Information","main","07/22/21","22:39:07",,"Package feed started..."
"Information","main","07/22/21","22:39:07",,"Package print started..."
"Information","main","07/22/21","22:39:07",,"Package search started..."
"Information","main","07/22/21","22:39:07",,"Package document started..."
"Information","main","07/22/21","22:39:07",,"Package presentation started..."
"Information","main","07/22/21","22:39:07",,"Package eventgateways started..."
"Information","main","07/22/21","22:39:07",,"Package dotnet started..."
"Information","main","07/22/21","22:39:07",,"Package pmtagent started..."
"Information","main","07/22/21","22:39:07",,"Package htmltopdf started..."
"Information","main","07/22/21","22:39:07",,"Package awslambda started..."
"Information","main","07/22/21","22:39:07",,"com package will not be deployed as it is not installed."
"Information","main","07/22/21","22:39:07",,"Package saml started..."
"Information","main","07/22/21","22:39:07",,"Package awss3 started..."
"Information","main","07/22/21","22:39:07",,"Package awss3legacy started..."
"Information","main","07/22/21","22:39:07",,"Package azureblob started..."
"Information","main","07/22/21","22:39:07",,"Package pdf started..."
"Information","main","07/22/21","22:39:07",,"Package websocket started..."
"Error","Thread-24","07/22/21","22:39:07",,"Connect to 127.0.0.1:8993 [/127.0.0.1] failed: Connection refused (Connection refused) http://127.0.0.1:8993/PDFgServlet/"
"Information","main","07/22/21","22:39:07",,"WebSocket server listens on port: 8576"
"Information","main","07/22/21","22:39:07",,"Package orm started..."
"Information","main","07/22/21","22:39:07",,"Package ormsearch started..."
"Information","main","07/22/21","22:39:07",,"Package ajax started..."
"Information","main","07/22/21","22:39:07",,"Package derby started..."
"Information","main","07/22/21","22:39:07",,"Package oracle started..."
"Information","main","07/22/21","22:39:07",,"Package mysql started..."
"Information","main","07/22/21","22:39:07",,"Package db2 started..."
"Information","main","07/22/21","22:39:07",,"Package sybase started..."
"Information","main","07/22/21","22:39:07",,"Package postgresql started..."
"Information","main","07/22/21","22:39:07",,"Package sqlserver started..."
"Information","main","07/22/21","22:39:07",,"odbc package will not be deployed as it is not installed."
"Information","main","07/22/21","22:39:07",,"Package scheduler started..."
"Information","main","07/22/21","22:39:07",,"Package ftp started..."
"Information","main","07/22/21","22:39:07",,"Package awssqs started..."
"Information","main","07/22/21","22:39:07",,"Package awssns started..."
"Information","main","07/22/21","22:39:07",,"Package azureservicebus started..."
"Information","main","07/22/21","22:39:07",,"Package awsdynamodb started..."
"Information","main","07/22/21","22:39:07",,"Package report started..."
"Information","main","07/22/21","22:39:07",,"Package exchange started..."
"Information","main","07/22/21","22:39:07",,"Package sharepoint started..."
"Information","main","07/22/21","22:39:07",,"ColdFusion started"
"Information","main","07/22/21","22:39:07",,"ColdFusion: application services are now available"
"Information","Thread-10","07/22/21","22:39:09",,"A same serial number has been found on another ColdFusion server. The server may be out of compliance."
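
server.log is a quoted CSV file; judging from the lines above, its columns are severity, thread, date, time, an application name (empty here) and the message. The Python sketch below assumes that layout and prints only non-Information rows plus the "ColdFusion started" and "ColdFusion stopped" markers, which makes gaps easy to spot. In the excerpt above, for instance, "ColdFusion stopped" at 22:34:36 is followed by nothing until the startup at 22:39:02, even though coldfusion-error.log shows a startup attempt beginning at 10:34:41 PM.

```python
# Sketch: filter ColdFusion server.log down to errors, warnings and start/stop markers.
# Column layout is inferred from the sample lines in this thread:
#   "Severity","Thread","MM/DD/YY","HH:MM:SS",<application>,"Message"
import csv
import sys
from datetime import datetime

MARKERS = ("ColdFusion started", "ColdFusion stopped")


def interesting_entries(lines):
    """Yield (timestamp, severity, message) for non-Information rows and start/stop markers."""
    for row in csv.reader(lines):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        severity, _thread, date, clock, _app, message = row[:6]
        if severity != "Information" or message in MARKERS:
            stamp = datetime.strptime(f"{date} {clock}", "%m/%d/%y %H:%M:%S")
            yield stamp, severity, message


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. .../cfusion1/logs/server.log
        with open(path, newline="", encoding="utf-8", errors="replace") as handle:
            for stamp, severity, message in interesting_entries(handle):
                print(stamp.isoformat(sep=" "), severity, message)
```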

Thanks for any suggestions you might have, 
-Tony

Community Expert, Jul 30, 2021

Tony, I see no errors in those logs. As for whatever you are experiencing, there could be a number of explanations, too many to list here (what to look for, how to interpret it, and what to examine next based on that). So while you may prefer to see if BKBK or anyone else can throw you a lifeline, I will say that I'm confident we could find and resolve this in an online consulting session. Indeed, I'm so confident that you won't pay for the time if it's not valuable. If interested, see carehart.org/consulting. We may not even need an hour (and I bill in just 15-minute increments).

Otherwise, I'll still be watching and will chime in if I have anything more to offer. 


/Charlie (troubleshooter, carehart.org)

Participant, Jul 30, 2021

Hi Charlie - 

 

I have brought up your consulting services with my manager and CTO. I think they want to see how things shake out over the next month or so, to see whether we can get things to a manageable place. But if we cannot find relief researching on our own, you are certainly our first choice for consulting.

That said, I notice that Tomcat is once again using DeltaManager for the SimpleTcpCluster. In the past we found relief by switching from DeltaManager to BackupManager. Does anyone know whether this is still a viable option? It no longer seems to be specified in server.xml.
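
To confirm which session manager a given instance is configured with, it helps to look at the Cluster element in server.xml directly. The Python sketch below reads server.xml with the standard-library ElementTree and reports each Cluster's channelSendOptions and any explicit Manager className. It assumes, based on my reading of Tomcat's SimpleTcpCluster, that DeltaManager is used when no Manager element is present, which would match the DeltaManager lines in the posted log. The BackupManager class in the test sample, org.apache.catalina.ha.session.BackupManager, still exists in Tomcat 9, but whether Adobe supports it under CF 2021 is a separate question.

```python
# Sketch: report the cluster session manager configured in a Tomcat/CF server.xml.
# Assumption: with no <Manager> inside <Cluster>, SimpleTcpCluster falls back to DeltaManager.
import sys
import xml.etree.ElementTree as ET

DEFAULT_MANAGER = "org.apache.catalina.ha.session.DeltaManager (implicit default)"


def cluster_settings(root: ET.Element) -> list:
    """Return one dict per <Cluster> element with its send options and manager class."""
    results = []
    for cluster in root.iter("Cluster"):
        manager = cluster.find("Manager")
        results.append({
            "cluster": cluster.get("className"),
            "channelSendOptions": cluster.get("channelSendOptions", "(not set)"),
            "manager": manager.get("className") if manager is not None else DEFAULT_MANAGER,
        })
    return results


if __name__ == "__main__":
    # Path is an example only, e.g. /opt/ColdFusion2021/cfusion1/runtime/conf/server.xml
    for path in sys.argv[1:]:
        for settings in cluster_settings(ET.parse(path).getroot()):
            print(path, settings)
```

An empty result would mean the file has no active (uncommented) Cluster element at all.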

Community Expert, Jul 30, 2021

My understanding is that BackupManager only copies each session to one other instance, while DeltaManager copies to all other instances. So I'm not sure how well BackupManager actually solves your underlying problem, which is keeping all the instances in sync. But I'm not much of a Tomcat guy.
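
To put rough numbers on that description, the Python sketch below compares how many peers receive each session change and how much session data each member holds under the two managers. It is a deliberately simplified model: it assumes DeltaManager keeps a full copy on every member and BackupManager keeps one primary plus one backup copy with sessions spread evenly, and it ignores any metadata BackupManager may send to the remaining members. The session count and size are made-up inputs.

```python
# Simplified model of session replication fan-out within one Tomcat cluster.
# Assumptions: DeltaManager keeps a copy on every member and sends each change to
# all other members; BackupManager keeps one backup copy per session. Any
# metadata BackupManager may send to the remaining members is ignored.

def replication_model(members: int, sessions: int, avg_session_kb: float) -> dict:
    if members < 2:
        raise ValueError("a replicating cluster needs at least two members")
    total_kb = sessions * avg_session_kb
    return {
        "delta_sends_per_change": members - 1,
        "backup_sends_per_change": 1,
        # Every DeltaManager member holds all sessions.
        "delta_kb_per_member": total_kb,
        # Primary + one backup copy, assuming sessions are spread evenly.
        "backup_kb_per_member": 2 * total_kb / members,
    }


if __name__ == "__main__":
    # Four instances per box, as in this thread; the session figures are made up.
    print(replication_model(members=4, sessions=20_000, avg_session_kb=10))
```

With these hypothetical figures, each change goes to 3 peers instead of 1, and each member holds about 200,000 KB of session data instead of about 100,000 KB.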

 

Dave Watts, Eidolon LLC

Community Expert, Jul 30, 2021

I agree with @Charlie Arehart that there aren't any errors in your logs. It sounds to me like you just have a lot of session replication happening when you start up an instance, and that can take a long time. CF clustering is peer-to-peer, so any time one of your instances changes the session state, that change has to go to every other server in the cluster, and there's a lot of traffic in all directions. I see something similar with any server where a lot of data has to be loaded into memory before it can start doing work. I saw it back in the day with Adobe LiveCycle ES, where restarting the service could take 20+ minutes, and I've seen it with CF-based CMSs that load a lot of data into memory at startup. So my recommendation would be to either get used to it, reduce the amount of user-specific data, or use an alternative to session replication, such as a shared database (maybe client storage, maybe something you roll yourself).

 

Dave Watts, Eidolon LLC

Participant, Jul 30, 2021

Thanks Dave - 

 

I suspect you may be right on the money, and this is purely a matter of replicating a large quantity of session data. I will speak with my dev manager and see if they would be willing to reduce the session timeout or otherwise find a way to reduce the amount of session data. Previously they were reluctant to move to a centralized session option, though I don't recall why.

 

But at least, going on this assumption, I may be able to work around it somewhat.

 

Thanks again,
-Tony

Community Expert, Jul 30, 2021

The bad part about centralizing session data is that talking to a database is going to be orders of magnitude slower than talking to RAM. The good part is that usually it doesn't matter that much, and you avoid the pain of peer-to-peer networking.

 

Dave Watts, Eidolon LLC

Participant, Jul 30, 2021

Going on this premise, it seems that when a server hangs on startup, it never syncs, no matter how long we wait. Is it a matter of hitting a timeout? Looking at the Tomcat documentation for the timeout values in server.xml, nothing immediately jumps out at me.
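One Tomcat setting that may be relevant here (an assumption, not a confirmed cause) is DeltaManager's startup state transfer. When a node joins, it asks a peer for all existing sessions and waits up to `stateTransferTimeout` seconds (60 by default) for them. As far as I understand Tomcat's behavior, a timeout is logged and startup continues, so a hang that never clears may point elsewhere. This is a minimal sketch of making those settings explicit, assuming CF2021's bundled Tomcat supports the standard DeltaManager attributes. The numbers are illustrative, not recommendations.

```xml
<!-- Sketch only: DeltaManager with explicit state-transfer settings.
     stateTransferTimeout: seconds a starting node waits for session state (default 60).
     sendAllSessions="false": send sessions in blocks instead of one large message.
     sendAllSessionsSize: sessions per block; sendAllSessionsWaitTime: ms between blocks.
     The numbers below are illustrative assumptions. -->
<Manager className="org.apache.catalina.ha.session.DeltaManager"
         expireSessionsOnShutdown="false"
         notifyListenersOnReplication="true"
         stateTransferTimeout="120"
         sendAllSessions="false"
         sendAllSessionsSize="500"
         sendAllSessionsWaitTime="1000"/>
```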

 

This also doesn't seem to explain why we didn't have this problem with CF10 after update 14. However, my dev manager just mentioned that sessions did not seem to be replicated correctly on CF10. Maybe Tomcat was never replicating sessions in CF10, and that is why we never experienced the delay. That's strange, though, since we certainly had J2EE sessions enabled.

 

Thanks again for your help and suggestions,
-Tony

Community Expert, Jul 30, 2021

I've seen occasions where session replication didn't happen properly. And of course, session replication is an optional setting, separate from whether you have J2EE sessions enabled. J2EE sessions are just a prerequisite for session replication.

 

Dave Watts, Eidolon LLC

Community Expert, Jul 30, 2021

And I will add, on top of all of Dave's typically awesome advice, that these are indeed some of the things I might have had us pursue. But in forums like this, we all tend to share knobs to tweak. Any given knob may or may not be the right one, or the right one for your situation, and it's not always clear whether a given setting would help or hurt. I would instead have pursued diagnostics to identify the issue, so as to know almost for sure what WAS the issue, and then to consider how best to solve it.

 

And also, when it comes to solving any problem, there can be many options with pros and cons. Some of those have been offered here already. For instance, you mention lowering session timeouts, and that's an option. Sometimes, though, the better solution may be to find and prevent excessive creation of sessions (and there are many possible explanations and solutions for that alone).

Or Dave mentioned a central session store, and you indicated having considered that in the past. There have been Tomcat-based ways to store sessions in a DB, which may have had their warts. But CF2016 added the ability to store CF sessions in a Redis datastore. This works even without CF Enterprise and even without clustering, though it works for that use case as well. It's configured on the Memory Variables page of the Admin. That said, it specifically does NOT work with J2EE sessions, only with "plain CF sessions". Yet those can be made as secure as J2EE sessions and made to work the SAME way, so it's not the gotcha some would think.

Then again, perhaps the problem (of the instance failing under load) is NOT related to session replication at all. Maybe there is an issue with the metaspace or heap, or some other JVM or CF configuration issue, and so on. 

So again there may be more at play here, but the good news is that there will be some solution. And this is where looking into things would more effectively get you to a solution. I realize in an earlier comment you said that your management "want to see how things shake out over the next month or so". OK, of course.

 

But then again if you have CF crashing or hanging even once, it would seem to make good economic sense to spend even an hour of time to look into things, or perhaps a couple. (As I noted, you won't pay for time you don't find valuable.)

 

But sure, we can all let this play out. It will benefit future readers, in addition to saving your org some money. I just fear that it could be a slog to get to the right problem and the right solution this way, though we may well get there.


/Charlie (troubleshooter, carehart.org)

Participant, Jul 30, 2021

Thanks Charlie - 

 

The instances are not randomly crashing; once they are up and running, they seem to be fairly stable. We only seem to get hung instances on restart, or on an instance stop/start when needed for changes. If need be, we pull the whole server from the load balancer and restart without any load. For this reason, we don't have a lot of service-affecting outages; it's more just frustration for us on the back end. But I do largely agree: the consulting fees would probably seem fairly small compared to shelling out for five Enterprise licenses. It could also have more to do with saving face when explaining it to the owner of the company. I don't have a full grasp of the management side of things, at least not behind the curtain. Regardless, I will continue to suggest your services, even if we try to sell it as tuning or optimization.

 

 

Back to the talking out loud about this issue:

 

I was looking back at some long-forgotten notes of mine from the CF8 days, and it seems we had extended the timeout on the HTTP/1.1 connector (port 8501) in the past. 20 seconds seems rather short. In fact, we honestly have no way of knowing if it's only the HTTP (port 8501) connector that is broken, since that's how we are monitoring CF instances. For all we know, the AJP (port 8012) connector may be happily serving pages to Apache, and we only think the instance is hosed because we monitor on port 8501.
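If the "20 seconds" above is the HTTP connector's connectionTimeout="20000" (milliseconds), raising it would look roughly like the sketch below. Port 8501 is from this thread, and 60000 is an assumed example value. Note that, per the Tomcat docs, connectionTimeout is how long the connector waits for the request line after accepting a connection. It therefore may not affect an instance that never finishes starting.

```xml
<!-- Sketch: HTTP/1.1 connector for the cfusion1 instance (server.xml).
     connectionTimeout is in milliseconds; 60000 is an illustrative value. -->
<Connector port="8501"
           protocol="HTTP/1.1"
           connectionTimeout="60000"/>
```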

 

I'll have to do some more reading, but right now that might be our first stab.

 

Thank you to all once again, 
-Tony

Community Expert, Jul 31, 2021

@GuitsBoy, I see that the issue here is similar to a ColdFusion 10 bug you reported:

https://community.adobe.com/t5/coldfusion/cluster-replication-issues-in-cf10-on-rhel6/td-p/6674898 
https://tracker.adobe.com/#/view/CF-3857664 

 

ColdFusion 2021 might still have the bug, or a similar one. So, I would suggest you open a bug ticket for the current issue.

 

That said, the issue seems to be caused by sub-optimal communication between ColdFusion and Tomcat. The error-message "SEVERE: Shutdown Port 8007is not active. Stop the server only after it is started" suggests that Tomcat had already stopped when, at Jul 22, 2021 10:37:43 PM, ColdFusion sent it the command to stop.

 

In addition, we can see that Tomcat had restarted by Jul 22, 2021 10:37:43 PM. However, it is unclear when and how the server-start or server-restart command was issued. That missing information is crucial, and I consider its absence symptomatic of a bug. Also symptomatic of a bug is the disappearance of a cluster member ("Jul 22, 2021 10:36:18 PM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared"). Tomcat did later make up for this and verify that the member was still there.

 

In spite of the communication hitches, the logs look OK. Some solutions others have used in similar situations are:

 

1) Change the value of the channelSendOptions attribute in <Cluster> from the default channelSendOptions="8" (asynchronous replication) to channelSendOptions="6" (synchronous replication).

2) Move the XML element <Manager> from server.xml to context.xml.

3) Report a bug. 🙂
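For reference, suggestion 1 amounts to a one-attribute change on the Cluster element in each instance's server.xml. channelSendOptions is a bit mask of Tomcat's Channel send flags. 8 is the asynchronous flag (the SimpleTcpCluster default). 6 is the use-ACK flag (2) plus the synchronized-ACK flag (4). This is a sketch, with the child elements elided:

```xml
<!-- server.xml, per instance. channelSendOptions is a bit mask of
     Tomcat Channel send flags:
       8 = asynchronous (SimpleTcpCluster default)
       6 = 2 (use ACK) + 4 (synchronized ACK), i.e. synchronous replication -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6">
  <!-- Channel, Valve and ClusterListener children unchanged -->
</Cluster>
```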

 

 

 

Community Expert, Jul 31, 2021

The following might or might not have anything to do with it, but does make sense:

Participant, Jul 31, 2021

Thanks BKBK - 

 

You found an old post I hadn't found when trying to dig up old information, thanks.

I thought it was CF10 update 14 that fixed the issue, but it seems here that update 14 actually broke things instead. Later updates corrected the problem, though, since we were most recently using CF10 update 23 without any tweaks to the stock server.xml.

 

To answer some of your questions, every server restart was done by me. The server did not automatically start up or recover in any way. I would stop the server, get the "stop the server only after it is started" error, then start it again. I would continue this pattern, increasing the time between commands, until the startup was successful.

 

As for the "member disappeared" error, this was likely due to me stopping another peer instance / different member of the CF cluster.  I assumed this message was normal when a member suddenly disappears (is restarted).

 

We are currently running channelSendOptions="6" to keep from getting thousands of "session already invalidated" messages in the logs.

 

What is the idea behind moving the <Manager> element to context.xml?

 

I will report a bug on monday when I am back in the office.  

 

In the past, we had issues with using an external (Red Hat-provided) JRE, so the dev manager prefers we stick to the officially bundled JRE that comes with CF. Your link seems to be quasi-official, since it's distributed by Adobe, but it does stray from the stock install. Will these JRE updates be included in the next hotfix/update 2? Are there any known issues that would suggest Java 11.0.12 might fix this problem?

 

Thank you again for your assistance.

Participant, Dec 20, 2021

It seems this issue has still not been fixed. I installed Update 3 today to fix the log4j vulnerability and had EXTREME difficulty getting all my instances to come back up again. It takes as many as ten tries to successfully restart an instance. I'm fairly confident I had opened a bug on this, but I didn't see it listed. Perhaps it was under a different (work) account. Either way, I re-submitted the bug under this account.

 

Just for my own sanity, can someone look over what modifications need to be made from the stock server.xml config for my particular environment?

 

To sum up, we have three bare-metal RHEL 8 servers (node1 - node3). Each server has ColdFusion Enterprise installed, with four additional instances (cfusion1 - cfusion4).

 

Here's a sample server.xml cluster block:

<Cluster channelSendOptions="6" className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership address="228.0.0.1" port="45501" className="org.apache.catalina.tribes.membership.McastService" dropTime="3000" frequency="500"/>
    <Receiver selectorTimeout="5000" address="auto" autoBind="100" port="4011" className="org.apache.catalina.tribes.transport.nio.NioReceiver" maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"/>
  </Channel>
  <Valve filter="" className="org.apache.catalina.ha.tcp.ReplicationValve"/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>

 

Both the Membership address and the Membership port have been changed to unique values for each cluster / physical server (a total of 3 values).

 

The receiver port has been changed to be unique across all twelve instances.

 

By default, all I should really have to change is the Membership port, making it unique for each of the three clusters. I shouldn't need to change the Membership address or the Receiver port, but I am trying to avoid any potential port conflicts. Still, this doesn't seem to help at all.

 

The only thing that does seem to help is making the Membership port unique for each of the 12 instances. However, I believe this puts each of the 12 instances into its own individual cluster of 1, with no failover or session replication. Still, when we do have the instances isolated, they seem to happily restart every (or almost every) time.
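The layout described above can be made concrete with a hypothetical fragment for one instance on node2. The addresses and port numbers are invented for illustration. All four instances on a box share one Membership address/port pair (one cluster per box), while each instance gets its own Receiver port. Changing the Membership port per instance, by contrast, splits each instance into a cluster of one.

```xml
<!-- Hypothetical values for node2 / cfusion1 (inside <Channel>).
     Same Membership address and port on cfusion1-cfusion4 of node2;
     different from node1 and node3. -->
<Membership className="org.apache.catalina.tribes.membership.McastService"
            address="228.0.0.2" port="45502"
            frequency="500" dropTime="3000"/>
<!-- Receiver port unique per instance; with autoBind="100" Tomcat tries
     successive ports if this one is already taken. -->
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
          address="auto" port="4021" autoBind="100"
          selectorTimeout="5000" maxThreads="6"/>
```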

 

I can't imagine we're the only people running CF with this topology, and the fact that it's still broken after six months is really frustrating.

Does anyone else have any suggestions?  Thank you,

 

 

 

 

Community Expert, Dec 20, 2021

Yes, I have a suggestion. But first, I do have a question. You refer to going to update 3, but can you please clarify whether you had previously been on update 2, or on something earlier?

 

I ask because if you look at the technote for update 2, there is a mention of "known issues", one of which is this:

"When the multicast port is busy in your environment, there will be errors in logs after restarting the instances that are part of a cluster. To resolve the issue, in the ColdFusion Admin, change the multicast port in the Cluster Manager page"

 

I realize it refers to "errors" and you are instead finding that the instance will not start.  We can't know for sure (from this alone) whether this does or does not relate to a problem like you have.

 

And you may say, "OK, but we said we moved to update 3." I heard that. But update 3 ONLY fixed the log4j issue. It did not fix ANY problems (bugs or known issues) in update 2. As such, if you hit this problem in update 2, you will still hit it in update 3. Even if you were on update 1 or earlier, though, it's not clear whether the absence of this note from the update 1 technote means the issue was NOT known, or only that it was not documented.

 

And I will admit that when I saw this in the technote for update 2, I wondered what Adobe meant by "change the multicast port". Change it to what? How would one know whether to change it to a port that IS or IS NOT already in use? But I wanted to point this out, since you asked for "any suggestions".

 

Give it some consideration, try changing it to something else if you will, and let us know how it goes, either way.

 

And if you feel that you need more insight into this, don't ask here, where Adobe may or may not respond. Instead, send an email to cfsup@adobe.com and ask them to elaborate on what this point really means for you. That's a simpler, objective question than asking them for help with your specific problem. You could of course ask them that as well; I just recommend you ask it separately, as help with that may be much harder to get than an answer to this first, simpler question.

 

Hope that helps.


/Charlie (troubleshooter, carehart.org)

Participant, Dec 20, 2021

Thanks Charlie. We certainly were on update 2, as well as a hotfix for the broken CFReports / Jasper issue you also helped me find a workaround for. Incidentally, update 3 blows away any hotfixes or patches, and they need to be reapplied after update 3, which is how I found myself back in this whole mess today.

Since we have three physical servers, each with its own cluster and a separate multicast port, we have already changed the port from the default on at least two of the three boxes. We noticed the issue persists across all three machines. It seems that if that were the fix, it would already be in place, unless the port needs to be changed again AFTER update 2 was applied.

 

Quite honestly, I'm a little unfamiliar with what support Adobe actually offers for their product. I was once told they offer basic installation support, but nothing beyond a plain vanilla install, and no customization. Considering the licenses are nearly 10K a pop, it's a little disappointing they don't even reply to my emails. But perhaps I'm asking too much, or sending them to the wrong place. I'll try the email you suggested.

Thanks again, 
-Tony

Community Expert, Dec 20, 2021

First, yep, I was aware that the updates only include the log4j change and no more, which means you have to reapply the hotfixes, and I had shared word of it. There are just lots of places people go looking for help.

 

As for the multicast ports, I'm not really familiar with that. But I'm pretty sure the different servers should use the same ports, with their firewalls opened so the servers can share them. Since CF session replication (which is failing for you) is in fact Tomcat session replication, consider also looking to Tomcat resources (docs and communities) for more on this.

 

Also, consider switching away from that replication to the "new" Redis session sharing that was added in CF2016. You may find it eliminates all this trouble.

 

Finally, as for your support concerns, first where have you been sending emails? If you mean here, no, this is not a place to expect help from Adobe. Note that it's called the community forums. Sometimes they chime in: most often they don't.

 

As for where to get direct support from them, they do offer free installation support, at cfinstal@adobe.com. Then they also offer paid support, per call or via subscription. I just Googled:

ColdFusion support plans

 

And the first result has all the info, it seems:

 

https://coldfusion.adobe.com/2018/12/adobe-coldfusion-support-policies-and-options-faq/

 

Bottom line, you should be able to get the enterprise-class support you expect for CF as an enterprise product. (Some don't want to pay for support, hence these forums and other community resources.)

 

Then there are 3rd parties, like myself, who specialize in supporting folks using CF. We and other volunteers here try to help as we can. Even we make no guarantees to be responsive to every thread. It's just not possible. But of course when you reach out for paid help, you can nearly always expect a response, and perhaps a more considered one--and usually a more custom one, aided by a screenshare session to better understand details.

 

Anyway, as you can see, I and perhaps others are here for you. Let's keep digging, if you want.


/Charlie (troubleshooter, carehart.org)

Participant, Dec 21, 2021

Thank you again Charlie - 

We are not replicating sessions between physical servers, only between CF instances on the same physical server. Ultimately, it might have been nice to let session data persist across all instances on all physical servers. For now, though, the difficulties outweighed the benefit, so replicating data between only the four instances on each box is fine for our needs.

I have previously suggested moving to a Redis server or other options, but this was shot down because it would be another single point of failure.

 

While you're right that this is an underlying Tomcat issue, we are using CF with a relatively normal and straightforward cluster. We don't have much traffic, and while there is some moderate session data, I don't think it's anything crazy. I really don't understand why this wouldn't work right out of the box. From the reading I've done, having 4 instances replicating should be fine for DeltaManager in Tomcat.

 

As far as support, I will once again suggest either paid Adobe support or your own services to management. However, each time I bring it up, it never goes anywhere, which is why I find myself back here when answers cannot be dug up on the internet.

Thanks again, 
-Tony

Community Expert, Dec 21, 2021

@GuitsBoy, I have reviewed the thread, and I don't see the following in your implementation:

  1.  The element
        <Manager className="org.apache.catalina.ha.session.DeltaManager"
                 expireSessionsOnShutdown="false"
                 notifyListenersOnReplication="true"/>

  2.  The contents of /runtime/conf/context.xml
      (which I expect to include the <Manager> element)

Participant, Dec 21, 2021

Hi BKBK.   I believe the context.xml is unchanged from stock.  Here are the contents:

 

<?xml version='1.0' encoding='utf-8'?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- The contents of this file will be loaded for each web application -->
<Context>

<!-- Default set of monitored resources. If one of these changes, the -->
<!-- web application will be reloaded. -->
<WatchedResource>WEB-INF/web.xml</WatchedResource>

<JarScanner>
<JarScanFilter tldSkip="${tomcat.util.scan.StandardJarScanFilter.jarsToSkip},*.jar"/>
</JarScanner>

<!-- Default cookie processor class in 8.5.x is Rfc6265CookieProcessor
Changing it to LegacyCookieProcessor as previous versions of Tomcat have
been using it. New implementation is breaking cookie implementation -->
<CookieProcessor className="org.apache.tomcat.util.http.LegacyCookieProcessor" />

<!-- Uncomment this to disable session persistence across Tomcat restarts -->
<!-- -->
<!--<Manager pathname="" />-->
<!--<Manager notifyListenersOnReplication="true" className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false"/>-->
<Manager notifyListenersOnReplication="true" className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false"/>

<!-- Uncomment this to enable Comet connection tacking (provides events
on session expiration as well as webapp lifecycle) -->
<!--
<Valve className="org.apache.catalina.valves.CometConnectionManagerValve" />
-->
</Context>

 

Thank you.

 

 

Community Expert, Dec 22, 2021

OK. Thanks for the answer. It is good that the Context element contains a Manager sub-element. However, the Manager element occurs twice. Please delete the commented one.
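After you delete the commented-out duplicate, the session-manager part of context.xml reduces to a single active declaration. A sketch, with the other elements from the file quoted above left as they were:

```xml
<Context>
  <WatchedResource>WEB-INF/web.xml</WatchedResource>
  <!-- JarScanner and CookieProcessor elements unchanged -->

  <!-- Single, active session manager used for cluster replication -->
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
</Context>
```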

 

Moving on. Two more suggestions:

  1. Does /WEB-INF/web.xml contain the following element:
      <distributable />

  The element should be in web.xml.
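For reference, <distributable/> is a direct child of <web-app>. The Servlet spec uses it to mark an application as safe to run in a distributed container, and Tomcat's clustering documentation lists it as a requirement for session replication. This is a minimal sketch. The namespace and version attributes are assumptions, so keep whatever the existing CF web.xml declares.

```xml
<!-- Sketch: WEB-INF/web.xml with the distributable marker.
     Namespace and version are assumed; keep the existing file's values. -->
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="4.0">
  <distributable/>
  <!-- existing servlet, filter and mapping declarations unchanged -->
</web-app>
```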

 

2. The setting channelSendOptions="6" was part of an experiment. Since you continue to have issues, we may conclude the experiment is not yet successful. So I would change the setting back to its default value of 8 (that is, asynchronous replication):

 

 

channelSendOptions="8"

 

 

 
