Skip to main content
Participant
January 9, 2012
Question

FMS core segfault and "is not responding and is being restarted"

  • January 9, 2012
  • 3 replies
  • 3346 views

Hello,

We run about 20 FMS servers through Amazon AWS using the Adobe AMI: fms-4.0.1.8011-centos-5.5-ami-1 (ami-6a1deb03)

On some of the servers we get fmscore segfaults, which kills the apps instances and make it impossible to launch new app instances. I am having a hard time finding out whether this could be due to errors in our code or if there is some kind of instability in the FMS. It does seems to happen only on the FMSs that are most heavily used.

Below are extracts from various logs (message, core, master, edge). Can anyone see from those logs what might be wrong?

Thanks in advance!

var/log/message

Jan  7 17:09:15 domU-12-31-39-18-08-75 Stream[30456]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 0

Jan  7 17:09:15 domU-12-31-39-18-08-75 Adaptor[30456]: Bad network data; terminating connection : (Adaptor: _defaultRoot_, VHost: _defaultVHost_, IP: 70.252.9.75, App: V20111115/en_US_ComServ, Protocol: rtmp, Client: 4702424758279553391, Handle: 66705) : C 00 ^ B0 00 00 98 10 00 00 0F a c t o r I d

Jan  7 17:26:53 domU-12-31-39-18-08-75 Adaptor[31491]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4

Jan  7 17:26:53 domU-12-31-39-18-08-75 Adaptor[31491]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4

Jan  7 17:26:53 domU-12-31-39-18-08-75 Adaptor[31491]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352

Jan  7 17:26:59 domU-12-31-39-18-08-75 Adaptor[31769]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4

Jan  7 17:26:59 domU-12-31-39-18-08-75 Adaptor[31769]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19350 (rtmfp-core)/v4

Jan  7 17:26:59 domU-12-31-39-18-08-75 Adaptor[31769]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19350

Jan  7 17:27:20 domU-12-31-39-18-08-75 kernel: fmscore[30943]: segfault at 0000003100000031 rip 0000003100000031 rsp 00002aaaac153f88 error 14

Jan  7 17:27:45 domU-12-31-39-18-08-75 Adaptor[32085]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4

Jan  7 17:27:45 domU-12-31-39-18-08-75 Adaptor[32085]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4

Jan  7 17:27:45 domU-12-31-39-18-08-75 Adaptor[32085]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352

Jan  7 17:28:16 domU-12-31-39-18-08-75 Stream[30456]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 0

Jan  7 17:28:16 domU-12-31-39-18-08-75 Adaptor[30456]: Bad network data; terminating connection : (Adaptor: _defaultRoot_, VHost: _defaultVHost_, IP: 168.103.82.186, App: V20111115/en_US_ComServ, Protocol: rtmp, Client: 4702698373801656687, Handle: 3695052682) : 83 00 C3 E 00 00 17 C O N N E C T - a c t o r I d

Jan  7 17:28:16 domU-12-31-39-18-08-75 kernel: fmscore[31615]: segfault at 00000032000001bb rip 000000000095216e rsp 00002aaaacc19f90 error 4

Jan  7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32435]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4

Jan  7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32435]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4

Jan  7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32435]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352

Jan  7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32752]: Listener started ( _defaultRoot_ ) : 127.0.0.1:19350 (rtmfp-core)/v4

Jan  7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32752]: Listener started ( _defaultRoot_ ) : 10.104.17.135:19350 (rtmfp-core)/v4

Jan  7 17:28:49 domU-12-31-39-18-08-75 Adaptor[32752]: Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19350

Jan  7 17:28:52 domU-12-31-39-18-08-75 Stream[32435]: Exception while processing message: Unexpected virtual message from client 4702111234743419248: C3 00 00 00 00 02 00 00 17 00 00

Jan  7 17:28:52 domU-12-31-39-18-08-75 Stream[32085]: Exception while processing message: Unexpected virtual message from client 4702118931173818736: 83 00 00 00 00 02 00 00 01 00 00

Jan  7 17:28:52 domU-12-31-39-18-08-75 Stream[32085]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 2

edge.log

2012-01-07          17:26:53          13212          (i)2701054          Virtual host Disconnecting unresponsive core V20111115/en_US_theSkatePark (30456) after 30 seconds is not available.          -

2012-01-07          17:26:53          13212          (i)2581250          Edge disconnected from core (30456).          -

2012-01-07          17:26:53          13212          (i)2581252          Registering core (31491).          -

2012-01-07          17:26:54          13212          (i)2701054          Virtual host Disconnecting unresponsive core V20111115/en_US_theCafe (30839) after 30 seconds is not available.          -

2012-01-07          17:26:54          13212          (i)2581250          Edge disconnected from core (30839).          -

2012-01-07          17:26:59          13212          (i)2581252          Registering core (31769).          -

2012-01-07          17:27:45          13212          (i)2701054          Virtual host Disconnecting unresponsive core V20111115/en_US_ComServ (31491) after 30 seconds is not available.          -

2012-01-07          17:27:45          13212          (i)2581250          Edge disconnected from core (31491).          -

2012-01-07          17:27:45          13212          (i)2581252          Registering core (32085).          -

2012-01-07          17:28:49          13212          (i)2701054          Virtual host Disconnecting unresponsive core V20111115/en_US_ComServ (32085) after 30 seconds is not available.          -

2012-01-07          17:28:49          13212          (i)2581250          Edge disconnected from core (32085).          -

2012-01-07          17:28:49          13212          (i)2581252          Registering core (32435).          -

2012-01-07          17:28:49          13212          (i)2701054          Virtual host Disconnecting unresponsive core V20111115/en_US_theDressup_OPERATOR (31769) after 30 seconds is not available.          -

2012-01-07          17:28:49          13212          (i)2581250          Edge disconnected from core (31769).          -

2012-01-07          17:28:49          13212          (i)2581252          Registering core (32752).          -

2012-01-07          17:28:53          13212          (i)2581250          Edge disconnected from core (32435).          -

2012-01-07          17:28:54          13212          (i)2581252          Registering core (582).          -

core.log

#Date: 2012-01-07

#Fields: date          time          x-pid          x-status          x-ctx          x-comment

2012-01-07          17:26:53          31491          (i)2581237          Starting admin app on core (31491).          -

2012-01-07          17:26:53          31491          (i)2581238          Core (31491) connecting to admin.          -

2012-01-07          17:26:53          31491          (i)2581231          Core (31491) connected to admin.          -

2012-01-07          17:26:53          31491          (i)2581234          Core (31491) connection to admin accepted.          -

2012-01-07          17:26:53          31491          (i)2631174          Listener started ( _defaultRoot_ ) : 127.0.0.1:19352 (rtmfp-core)/v4          -

2012-01-07          17:26:53          31491          (i)2631174          Listener started ( _defaultRoot_ ) : 10.104.17.135:19352 (rtmfp-core)/v4          -

2012-01-07          17:26:53          31491          (i)2631509          Public rtmfp-core addresses for listener _defaultRoot_ are: 184.73.54.224:19352          -

2012-01-07          17:26:53          31491          (i)2581246          Core (31491) sending register cmd to edge.          -

2012-01-07          17:26:54          30839          (i)2581247          Core (30839) disconnected from edge.          -

2012-01-07          17:26:54          30839          (i)2581233          Core (30839) disconnecting from admin: 200.          -

2012-01-07          17:26:53          30456          (i)2581247          Core (30456) disconnected from edge.          -

2012-01-07          17:26:53          30456          (i)2581233          Core (30456) disconnecting from admin: 200.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:26:54          31491          (w)2631025          Ignoring message from client during authentication.          -

2012-01-07          17:27:07          30456          (e)2581279          Assert failed in tincan/server/TCMsgHandler.cpp line 694          -

2012-01-07          17:27:07          30456          (e)2581279          Assert failed in tincan/server/TCMsgHandler.cpp line 694          -

2012-01-07          17:27:07          30456          (e)2581279          Assert failed in tincan/server/TCMsgHandler.cpp line 694          -

master.log

2012-01-07          17:26:53          13105          (w)2581257          Core (30456) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted!          -

2012-01-07          17:26:53          13105          (i)2581221          Core (31491) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:26:54          13105          (w)2581257          Core (30839) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted!          -

2012-01-07          17:26:54          13105          (i)2581221          Core (31769) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_0" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0".          -

2012-01-07          17:27:22          13105          (i)2581223          Core (30839) is no longer active.          -

2012-01-07          17:27:45          13105          (w)2581257          Core (31491) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted!          -

2012-01-07          17:27:45          13105          (i)2581221          Core (32085) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:28:17          13105          (i)2581223          Core (31491) is no longer active.          -

2012-01-07          17:28:49          13105          (w)2581257          Core (32085) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted!          -

2012-01-07          17:28:49          13105          (i)2581221          Core (32435) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:28:49          13105          (w)2581257          Core (31769) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted!          -

2012-01-07          17:28:49          13105          (i)2581221          Core (32752) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_0" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0".          -

2012-01-07          17:28:53          13105          (i)2581223          Core (32435) is no longer active.          -

2012-01-07          17:28:53          13105          (w)2581256          Core (32435) _defaultRoot_:_defaultVHost_:::_1 experienced 1 failure!          -

2012-01-07          17:28:53          13105          (i)2581221          Core (582) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:28:57          13105          (i)2581223          Core (32435) is no longer active.          -

2012-01-07          17:29:12          13105          (i)2581223          Core (31769) is no longer active.          -

2012-01-07          17:29:17          13105          (i)2581223          Core (32085) is no longer active.          -

2012-01-07          17:29:40          13105          (w)2581257          Core (582) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted!          -

2012-01-07          17:29:40          13105          (i)2581221          Core (826) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:29:40          13105          (w)2581257          Core (26279) _defaultRoot_:_defaultVHost_:::_2 is not responding and is being restarted!          -

2012-01-07          17:29:40          13105          (i)2581221          Core (1159) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_2" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_2".          -

2012-01-07          17:29:42          13105          (w)2581257          Core (32752) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted!          -

2012-01-07          17:29:42          13105          (i)2581221          Core (1443) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_0" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0".          -

2012-01-07          17:30:20          13105          (w)2581257          Core (826) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted!          -

2012-01-07          17:30:20          13105          (i)2581221          Core (1787) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:30:21          13105          (w)2581257          Core (1443) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted!          -

2012-01-07          17:30:21          13105          (i)2581221          Core (2151) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_0" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0".          -

2012-01-07          17:30:22          13105          (w)2581257          Core (1159) _defaultRoot_:_defaultVHost_:::_2 is not responding and is being restarted!          -

2012-01-07          17:30:22          13105          (i)2581221          Core (2463) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_2" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_2".          -

2012-01-07          17:30:52          13105          (i)2581223          Core (26279) is no longer active.          -

2012-01-07          17:30:52          13105          (i)2581223          Core (32752) is no longer active.          -

2012-01-07          17:30:52          13105          (i)2581223          Core (826) is no longer active.          -

2012-01-07          17:30:52          13105          (i)2581223          Core (1443) is no longer active.          -

2012-01-07          17:31:02          13105          (i)2581223          Core (1159) is no longer active.          -

2012-01-07          17:31:52          13105          (w)2581257          Core (1787) _defaultRoot_:_defaultVHost_:::_1 is not responding and is being restarted!          -

2012-01-07          17:31:52          13105          (i)2581221          Core (2810) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_1" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_1".          -

2012-01-07          17:31:53          13105          (w)2581257          Core (2151) _defaultRoot_:_defaultVHost_:::_0 is not responding and is being restarted!          -

2012-01-07          17:31:53          13105          (i)2581221          Core (3048) started, arguments : -adaptor "_defaultRoot_" -vhost "_defaultVHost_" -app  -inst  -tag "_0" -console  -conf "/opt/adobe/fms/conf/Server.xml" -name "_defaultRoot_:_defaultVHost_:::_0".          -

2012-01-07          17:31:53          13105          (w)2581260          Core (582) did not shutdown as expected. Killing core now.          -

2012-01-07          17:31:53          13105          (w)2581260          Core (30456) did not shutdown as expected. Killing core now.          -

2012-01-07          17:31:57          13105          (i)2581223          Core (30456) is no longer active.          -

2012-01-07          17:31:57          13105          (i)2581223          Core (582) is no longer active.          -

2012-01-07          17:31:57          13105          (i)2581223          Core (2810) is no longer active.          -

2012-01-07          17:31:57          13105          (w)2581256          Core (2810) _defaultRoot_:_defaultVHost_:::_1 experienced 1 failure!          -

2012-01-07          17:32:22          13105          (i)2581223          Core (2151) is no longer active.          -

2012-01-07          17:32:37          13105          (i)2581223          Core (1787) is no longer active.          -

    This topic has been closed for replies.

    3 replies

    No Way11Author
    Participant
    January 9, 2012

    Thanks for confirming.

    The instance that seems to fail is our main chat application. At peak time it has up to 15000 connected clients that read and write to sharedobjects. We don't do any streaming, only sharedobjects writing. I can foresee that we could easily reach more than 100000 possible client connections before the end of 2012.

    This is running on the largest possible Amazon AWS instance. Do you know if we might be reaching some kind of connection limit. We would prefer to have all our players playing in the same instance, but if we are reaching some kind of max number of connections then we have to develop a logic where people have to choose an instance when logging in. That is not optimal if you want to be on the same instance as all your friends.

    Nikhil_Kalyan
    Participating Frequently
    January 10, 2012

    There is no connection limit for the number of connections to FMS. As long as the network bandwidth can sustain the traffic, FMS should be able to maintain all the connections, all these connections can be either on a singular instance or spawned on multiple, it still holds all.

    In your logs, you have : Jan  7 17:09:15 domU-12-31-39-18-08-75 Stream[30456]: Bad network data; terminating connection : chunkstream error:message bytes 128 are more than the specified message length 0

    This comes up when there is a malformed data (in your case, sharedobjects) that is being sent on RTMP. Please make sure you have the sharedobjects properly formatted to see if you are not sending any extra bytes.

    No Way11Author
    Participant
    January 9, 2012

    Ok i turned core dump on based on this KB:

    http://kb2.adobe.com/cps/408/kb408783.html

    I hope this will give something if it crashes again.

    Nikhil_Kalyan
    Participating Frequently
    January 9, 2012

    Hi,

    Yup, that should help, we generally go with the generic ulimit option to catch any process dump. http://stackoverflow.com/questions/2919378/how-to-enable-core-dump-in-my-linux-c-program

    Nikhil_Kalyan
    Participating Frequently
    January 9, 2012

    Hi,

    Thanks for using AWS for FMS.

    Can you please provide the crash dumps when it happens. We would like to investigate this issue. Thank you !

    No Way11Author
    Participant
    January 9, 2012

    Hi,

    Thanks for looking into this!

    Are crash dumps on by default and if so, where can i find them on an Amazon AWS Ccentos 5.5 FMS installation?

    If they are not on, can you tell me how to turn them on for the fmscore?

    Thanks!