App instances crashes when under heavy load
Hello Adobe Support,
My problem is that many of our AMS servers have app instance crashes when under heavy load. We are using the biggest possible instance at Amazon AWS so we cant just add bigger instances.
My hope is that you can help us in the right direction on solving this issue based on the log files snippets I have included below.
Another weird thing I see is in the admin console where the bandwidth graphs shows in/out bandwidth dropping to 0 for a couple of seconds. This only happens during peak time.
Here are some logs from the time of an instance crash:
/var/log/messages:
Mar 20 14:01:51 ip-10-230-57-112 Adaptor[25827]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4
Mar 20 14:01:51 ip-10-230-57-112 Adaptor[25827]: Listener started ( _defaultRoot_ ) : 10.230.57.112:19350 (rtmfp-core)/v4
Mar 20 14:01:51 ip-10-230-57-112 Adaptor[25827]: Public rtmfp-core addresses for listener _defaultRoot_ are: 54.216.15.146:19350
Mar 20 14:02:25 ip-10-230-57-112 Adaptor[26142]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4
Mar 20 14:02:25 ip-10-230-57-112 Adaptor[26142]: Listener started ( _defaultRoot_ ) : 10.230.57.112:19350 (rtmfp-core)/v4
Mar 20 14:02:25 ip-10-230-57-112 Adaptor[26142]: Public rtmfp-core addresses for listener _defaultRoot_ are: 54.216.15.146:19350
Mar 20 14:02:46 ip-10-230-57-112 kernel: amscore[25947]: segfault at 0000000000000000 rip 0000000000000000 rsp 00002aaaad682f88 error 14
Mar 20 14:03:00 ip-10-230-57-112 Adaptor[26450]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4
Mar 20 14:03:00 ip-10-230-57-112 Adaptor[26450]: Listener started ( _defaultRoot_ ) : 10.230.57.112:19350 (rtmfp-core)/v4
Mar 20 14:03:00 ip-10-230-57-112 Adaptor[26450]: Public rtmfp-core addresses for listener _defaultRoot_ are: 54.216.15.146:19350
Core Logs:
#Date: 2014-03-20
#Fields: date time x-pid x-status x-ctx x-comment
2014-03-20 14:01:51 25827 (i)2581237 Starting admin app on core (25827). -
2014-03-20 14:01:51 25827 (i)2581238 Core (25827) connecting to admin. -
2014-03-20 14:01:51 25827 (i)2581231 Core (25827) connected to admin. -
2014-03-20 14:01:51 25827 (i)2581234 Core (25827) connection to admin accepted. -
2014-03-20 14:01:51 25827 (i)2631174 Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4 -
2014-03-20 14:01:51 25827 (i)2631174 Listener started ( _defaultRoot_ ) : 10.230.57.112:19350 (rtmfp-core)/v4 -
2014-03-20 14:01:51 25827 (i)2631509 Public rtmfp-core addresses for listener _defaultRoot_ are: 54.216.15.146:19350 -
2014-03-20 14:01:51 25827 (i)2581246 Core (25827) sending register cmd to edge. -
#Date: 2014-03-20
#Fields: date time x-pid x-status x-ctx x-comment
2014-03-20 14:01:51 23351 (i)2581247 Core (23351) disconnected from edge. -
2014-03-20 14:01:51 23351 (i)2581233 Core (23351) disconnecting from admin: 200. -
#Date: 2014-03-20
#Fields: date time x-pid x-status x-ctx x-comment
2014-03-20 14:02:25 26142 (i)2581237 Starting admin app on core (26142). -
2014-03-20 14:02:25 26142 (i)2581238 Core (26142) connecting to admin. -
2014-03-20 14:02:25 26142 (i)2581231 Core (26142) connected to admin. -
2014-03-20 14:02:25 26142 (i)2581234 Core (26142) connection to admin accepted. -
2014-03-20 14:02:25 26142 (i)2631174 Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4 -
2014-03-20 14:02:25 26142 (i)2631174 Listener started ( _defaultRoot_ ) : 10.230.57.112:19350 (rtmfp-core)/v4 -
2014-03-20 14:02:25 26142 (i)2631509 Public rtmfp-core addresses for listener _defaultRoot_ are: 54.216.15.146:19350 -
2014-03-20 14:02:25 26142 (i)2581246 Core (26142) sending register cmd to edge. -
2014-03-20 14:02:04 23351 (e)2581279 Assert failed in tincan/util/TCAsyncIO.cpp line 1948 -
2014-03-20 14:02:26 23351 (e)2581279 Assert failed in tincan/server/TCMsgHandler.cpp line 695 -
2014-03-20 14:02:26 23351 (e)2581279 Assert failed in tincan/server/TCMsgHandler.cpp line 695 -
Master Logs:
2014-03-20 14:01:51 22725 (w)2581257 Core (23351) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2014-03-20 14:01:51 22725 (i)2581221 Core (25827) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/ams/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2014-03-20 14:02:25 22725 (w)2581257 Core (25827) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2014-03-20 14:02:25 22725 (i)2581221 Core (26142) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/ams/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2014-03-20 14:02:49 22725 (i)2581223 Core (25827) is no longer active. -
2014-03-20 14:03:00 22725 (w)2581257 Core (26142) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2014-03-20 14:03:00 22725 (i)2581221 Core (26450) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/ams/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2014-03-20 14:06:23 22725 (w)2581260 Core (23351) did not shutdown as expected. Killing core now. -
2014-03-20 14:06:23 22725 (w)2581434 Master executing fast shutdown of core ID (23351). -
2014-03-20 14:06:23 22725 (w)2581260 Core (26142) did not shutdown as expected. Killing core now. -
2014-03-20 14:06:23 22725 (w)2581434 Master executing fast shutdown of core ID (26142). -
2014-03-20 14:06:28 22725 (e)2581395 Failed to signal process condition: errno(22). -
2014-03-20 14:06:28 22725 (e)2581395 Failed to signal process condition: errno(22). -
2014-03-20 14:06:38 22725 (e)2581395 Failed to signal process condition: errno(22). -
2014-03-20 14:06:38 22725 (e)2581395 Failed to signal process condition: errno(22). -
2014-03-20 14:06:39 22725 (i)2581223 Core (23351) is no longer active. -
2014-03-20 14:06:39 22725 (i)2581223 Core (26142) is no longer active. -
Edge Logs:
2014-03-20 14:01:51 22835 (i)2701054 Virtual host Disconnecting unresponsive core LATEST/fr_FR_ComServ (23351) after 30 seconds is not available. -
2014-03-20 14:01:51 22835 (i)2581250 Edge disconnected from core (23351). -
2014-03-20 14:01:51 22835 (i)2581537 Resending StartCore message to master (vhost: _defaultVHost_, app: , inst: , tag: _1). -
2014-03-20 14:01:51 22835 (i)2581252 Registering core (25827). -
2014-03-20 14:02:25 22835 (i)2701054 Virtual host Disconnecting unresponsive core LATEST/fr_FR_ComServ (25827) after 30 seconds is not available. -
2014-03-20 14:02:25 22835 (i)2581250 Edge disconnected from core (25827). -
2014-03-20 14:02:25 22835 (i)2581537 Resending StartCore message to master (vhost: _defaultVHost_, app: , inst: , tag: _1). -
2014-03-20 14:02:25 22835 (i)2581252 Registering core (26142). -
2014-03-20 14:03:00 22835 (i)2701054 Virtual host Disconnecting unresponsive core LATEST/fr_FR_ComServ (26142) after 30 seconds is not available. -
2014-03-20 14:03:00 22835 (i)2581250 Edge disconnected from core (26142). -
2014-03-20 14:03:00 22835 (i)2581537 Resending StartCore message to master (vhost: _defaultVHost_, app: , inst: , tag: _1). -
2014-03-20 14:03:00 22835 (i)2581252 Registering core (26450). -
