FMS core segfault and "is not responding and is being restarted"
Hello,
We run about 20 FMS servers through Amazon AWS using the Adobe AMI: fms-4.0.1.8011-centos-5.5-ami-1 (ami-6a1deb03)
On some of the servers we get fmscore segfaults, which kills the apps instances and make it impossible to launch new app instances. I am having a hard time finding out whether this could be due to errors in our code or if there is some kind of instability in the FMS. It does seems to happen only on the FMSs that are most heavily used.
Below are extracts from various logs (message, core, master, edge). Can anyone see from those logs what might be wrong?
Thanks in advance!
var/log/message
Jan 7 17:09:15 domU-12-31-39-18-08-75 Stream[30456]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 0
Jan 7 17:09:15 domU-12-31-39-18-08-75 Adaptor[30456]: Bad network data; terminating connection : (Adaptor: _defaultRoot_, VHost: _defaultVHost_, IP: 70.252.9.75, App: V20111115/en_US_ComServ, Protocol: rtmp, Client: 4702424758279553391, Handle: 66705) : C 00 ^ B0 00 00 98 10 00 00 0F a c t o r I d
Jan 7 17:26:53 domU-12-31-39-18-08-75 Adaptor[31491]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4
Jan 7 17:26:53 domU-12-31-39-18-08-75 Adaptor[31491]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4
Jan 7 17:26:53 domU-12-31-39-18-08-75 Adaptor[31491]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352
Jan 7 17:26:59 domU-12-31-39-18-08-75 Adaptor[31769]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4
Jan 7 17:26:59 domU-12-31-39-18-08-75 Adaptor[31769]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19350 (rtmfp-core)/v4
Jan 7 17:26:59 domU-12-31-39-18-08-75 Adaptor[31769]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19350
Jan 7 17:27:20 domU-12-31-39-18-08-75 kernel: fmscore[30943]: segfault at 0000003100000031 rip 0000003100000031 rsp 00002aaaac153f88 error 14
Jan 7 17:27:45 domU-12-31-39-18-08-75 Adaptor[32085]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4
Jan 7 17:27:45 domU-12-31-39-18-08-75 Adaptor[32085]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4
Jan 7 17:27:45 domU-12-31-39-18-08-75 Adaptor[32085]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352
Jan 7 17:28:16 domU-12-31-39-18-08-75 Stream[30456]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 0
Jan 7 17:28:16 domU-12-31-39-18-08-75 Adaptor[30456]: Bad network data; terminating connection : (Adaptor: _defaultRoot_, VHost: _defaultVHost_, IP: 168.103.82.186, App: V20111115/en_US_ComServ, Protocol: rtmp, Client: 4702698373801656687, Handle: 3695052682) : 83 00 C3 E 00 00 17 C O N N E C T - a c t o r I d
Jan 7 17:28:16 domU-12-31-39-18-08-75 kernel: fmscore[31615]: segfault at 00000032000001bb rip 000000000095216e rsp 00002aaaacc19f90 error 4
Jan 7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32435]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4
Jan 7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32435]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4
Jan 7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32435]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352
Jan 7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32752]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4
Jan 7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32752]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19350 (rtmfp-core)/v4
Jan 7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32752]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19350
Jan 7 17:28:52 domU-12-31-39-18-08-75 Stream[32435]: Exception while processing message: Unexpected virtual message from client 4702111234743419248: C3 00 00 00 00 02 00 00 17 00 00
Jan 7 17:28:52 domU-12-31-39-18-08-75 Stream[32085]: Exception while processing message: Unexpected virtual message from client 4702118931173818736: 83 00 00 00 00 02 00 00 01 00 00
Jan 7 17:28:52 domU-12-31-39-18-08-75 Stream[32085]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 2
edge.log
2012-01-07 17:26:53 13212 (i)2701054 Virtual host Disconnecting unresponsive core V20111115/en_US_theSkatePark (30456) after 30 seconds is not available. -
2012-01-07 17:26:53 13212 (i)2581250 Edge disconnected from core (30456). -
2012-01-07 17:26:53 13212 (i)2581252 Registering core (31491). -
2012-01-07 17:26:54 13212 (i)2701054 Virtual host Disconnecting unresponsive core V20111115/en_US_theCafe (30839) after 30 seconds is not available. -
2012-01-07 17:26:54 13212 (i)2581250 Edge disconnected from core (30839). -
2012-01-07 17:26:59 13212 (i)2581252 Registering core (31769). -
2012-01-07 17:27:45 13212 (i)2701054 Virtual host Disconnecting unresponsive core V20111115/en_US_ComServ (31491) after 30 seconds is not available. -
2012-01-07 17:27:45 13212 (i)2581250 Edge disconnected from core (31491). -
2012-01-07 17:27:45 13212 (i)2581252 Registering core (32085). -
2012-01-07 17:28:49 13212 (i)2701054 Virtual host Disconnecting unresponsive core V20111115/en_US_ComServ (32085) after 30 seconds is not available. -
2012-01-07 17:28:49 13212 (i)2581250 Edge disconnected from core (32085). -
2012-01-07 17:28:49 13212 (i)2581252 Registering core (32435). -
2012-01-07 17:28:49 13212 (i)2701054 Virtual host Disconnecting unresponsive core V20111115/en_US_theDressup_OPERATOR (31769) after 30 seconds is not available. -
2012-01-07 17:28:49 13212 (i)2581250 Edge disconnected from core (31769). -
2012-01-07 17:28:49 13212 (i)2581252 Registering core (32752). -
2012-01-07 17:28:53 13212 (i)2581250 Edge disconnected from core (32435). -
2012-01-07 17:28:54 13212 (i)2581252 Registering core (582). -
core.log
#Date: 2012-01-07
#Fields: date time x-pid x-status x-ctx x-comment
2012-01-07 17:26:53 31491 (i)2581237 Starting admin app on core (31491). -
2012-01-07 17:26:53 31491 (i)2581238 Core (31491) connecting to admin. -
2012-01-07 17:26:53 31491 (i)2581231 Core (31491) connected to admin. -
2012-01-07 17:26:53 31491 (i)2581234 Core (31491) connection to admin accepted. -
2012-01-07 17:26:53 31491 (i)2631174 Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4 -
2012-01-07 17:26:53 31491 (i)2631174 Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4 -
2012-01-07 17:26:53 31491 (i)2631509 Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352 -
2012-01-07 17:26:53 31491 (i)2581246 Core (31491) sending register cmd to edge. -
2012-01-07 17:26:54 30839 (i)2581247 Core (30839) disconnected from edge. -
2012-01-07 17:26:54 30839 (i)2581233 Core (30839) disconnecting from admin: 200. -
2012-01-07 17:26:53 30456 (i)2581247 Core (30456) disconnected from edge. -
2012-01-07 17:26:53 30456 (i)2581233 Core (30456) disconnecting from admin: 200. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:26:54 31491 (w)2631025 Ignoring message from client during authentication. -
2012-01-07 17:27:07 30456 (e)2581279 Assert failed in tincan/server/TCMsgHandler.cpp line 694 -
2012-01-07 17:27:07 30456 (e)2581279 Assert failed in tincan/server/TCMsgHandler.cpp line 694 -
2012-01-07 17:27:07 30456 (e)2581279 Assert failed in tincan/server/TCMsgHandler.cpp line 694 -
master.log
2012-01-07 17:26:53 13105 (w)2581257 Core (30456) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2012-01-07 17:26:53 13105 (i)2581221 Core (31491) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:26:54 13105 (w)2581257 Core (30839) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted! -
2012-01-07 17:26:54 13105 (i)2581221 Core (31769) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_0" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0". -
2012-01-07 17:27:22 13105 (i)2581223 Core (30839) is no longer active. -
2012-01-07 17:27:45 13105 (w)2581257 Core (31491) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2012-01-07 17:27:45 13105 (i)2581221 Core (32085) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:28:17 13105 (i)2581223 Core (31491) is no longer active. -
2012-01-07 17:28:49 13105 (w)2581257 Core (32085) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2012-01-07 17:28:49 13105 (i)2581221 Core (32435) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:28:49 13105 (w)2581257 Core (31769) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted! -
2012-01-07 17:28:49 13105 (i)2581221 Core (32752) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_0" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0". -
2012-01-07 17:28:53 13105 (i)2581223 Core (32435) is no longer active. -
2012-01-07 17:28:53 13105 (w)2581256 Core (32435) _defaultRoot_:_defaultVHost_:::_1 experienced 1 failure! -
2012-01-07 17:28:53 13105 (i)2581221 Core (582) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:28:57 13105 (i)2581223 Core (32435) is no longer active. -
2012-01-07 17:29:12 13105 (i)2581223 Core (31769) is no longer active. -
2012-01-07 17:29:17 13105 (i)2581223 Core (32085) is no longer active. -
2012-01-07 17:29:40 13105 (w)2581257 Core (582) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2012-01-07 17:29:40 13105 (i)2581221 Core (826) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:29:40 13105 (w)2581257 Core (26279) _defaultRoot_:_defaultVHost_:::_2 is not responding and is being restarted! -
2012-01-07 17:29:40 13105 (i)2581221 Core (1159) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_2" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_2". -
2012-01-07 17:29:42 13105 (w)2581257 Core (32752) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted! -
2012-01-07 17:29:42 13105 (i)2581221 Core (1443) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_0" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0". -
2012-01-07 17:30:20 13105 (w)2581257 Core (826) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2012-01-07 17:30:20 13105 (i)2581221 Core (1787) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:30:21 13105 (w)2581257 Core (1443) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted! -
2012-01-07 17:30:21 13105 (i)2581221 Core (2151) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_0" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0". -
2012-01-07 17:30:22 13105 (w)2581257 Core (1159) _defaultRoot_:_defaultVHost_:::_2 is not responding and is being restarted! -
2012-01-07 17:30:22 13105 (i)2581221 Core (2463) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_2" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_2". -
2012-01-07 17:30:52 13105 (i)2581223 Core (26279) is no longer active. -
2012-01-07 17:30:52 13105 (i)2581223 Core (32752) is no longer active. -
2012-01-07 17:30:52 13105 (i)2581223 Core (826) is no longer active. -
2012-01-07 17:30:52 13105 (i)2581223 Core (1443) is no longer active. -
2012-01-07 17:31:02 13105 (i)2581223 Core (1159) is no longer active. -
2012-01-07 17:31:52 13105 (w)2581257 Core (1787) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted! -
2012-01-07 17:31:52 13105 (i)2581221 Core (2810) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_1" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1". -
2012-01-07 17:31:53 13105 (w)2581257 Core (2151) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted! -
2012-01-07 17:31:53 13105 (i)2581221 Core (3048) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app -inst -tag "_0" -console -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0". -
2012-01-07 17:31:53 13105 (w)2581260 Core (582) did not shutdown as expected. Killing core now. -
2012-01-07 17:31:53 13105 (w)2581260 Core (30456) did not shutdown as expected. Killing core now. -
2012-01-07 17:31:57 13105 (i)2581223 Core (30456) is no longer active. -
2012-01-07 17:31:57 13105 (i)2581223 Core (582) is no longer active. -
2012-01-07 17:31:57 13105 (i)2581223 Core (2810) is no longer active. -
2012-01-07 17:31:57 13105 (w)2581256 Core (2810) _defaultRoot_:_defaultVHost_:::_1 experienced 1 failure! -
2012-01-07 17:32:22 13105 (i)2581223 Core (2151) is no longer active. -
2012-01-07 17:32:37 13105 (i)2581223 Core (1787) is no longer active. -
