Participant
August 23, 2009
Question

FMS 3.5 infinite loop issue - io+diag logs full


Sometimes (I should say often: this occurs about 2 times out of 4), when a client using the dist live webroot sample disconnects abruptly (e.g. by closing the browser window), I start to see many entries per second in the edge logs:

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c6a510 socket 59 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c3eca8 socket 58 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c34ae0 socket 42 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c3e268 socket 60 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c38498 socket 41 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c6ae60 socket 61 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c6a510 socket 59 with 104 (Connection reset by peer)). -

2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c3eca8 socket 58 with 104 (Connection reset by peer)). -

At the same time I see an fmscore process jump to 80% CPU. Syslog also starts working hard, even with local0.* directed to /dev/null.
I let the process run for several hours and then decided to kill it. -TERM didn't work; -KILL did the job. After that it is not possible to start a new live session, even from another IP/computer, and I see the following in the core log, also tens of times per second:
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -
Then it is an fmsedge process that starts to eat all the CPU, and even once it is killed the only way forward is to restart FMS.
This is on a quad-core host with 4 GB RAM, and I'm the only connected user, so no resource issues afaik. The HTTP proxy points to another server, which still answers during the high load (Gigabit Ethernet between the 2 hosts), but HTTP requests to the FMS host get no answer (the browser times out). The Apache server receives no requests from the FMS proxy, btw.
Any tips?
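
One way to quantify a flood like the ones quoted above is to tally entries per second and per message ID, and to check whether the same connections keep recurring. The following is a minimal Python sketch, not an FMS tool: the field layout (timestamp, process ID, severity letter, message ID) is inferred from the log excerpts in this post, and the names `LINE_RE`, `CONN_RE`, `summarize`, and `repeated_conns` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts above:
# "<date> <time> <pid> (<severity>)<message id> <text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<pid>\d+) \((?P<sev>\w)\)(?P<msgid>\d+) (?P<text>.*)$"
)
# Connection address and socket number as they appear in the edge log text.
CONN_RE = re.compile(r"conn (0x[0-9a-f]+) socket (\d+)")


def summarize(lines):
    """Count entries per (timestamp, pid, severity, message id)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["ts"], m["pid"], m["sev"], m["msgid"])] += 1
    return counts


def repeated_conns(lines):
    """Return (conn, socket) pairs that are reported more than once."""
    seen = Counter()
    for line in lines:
        m = CONN_RE.search(line)
        if m:
            seen[(m.group(1), m.group(2))] += 1
    return {key: n for key, n in seen.items() if n > 1}


# Three edge-log lines and one core-log line copied from the post above.
sample = [
    "2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c6a510 socket 59 with 104 (Connection reset by peer)). -",
    "2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c3eca8 socket 58 with 104 (Connection reset by peer)). -",
    "2009-08-23 20:58:11 79286 (w)2631008 Asynchronous I/O operation failed (close: shutdown failed on conn 0x28c6a510 socket 59 with 104 (Connection reset by peer)). -",
    "2009-08-23 20:59:15 79071 (e)2581394 Failed to wait for process condition: errno(22). -",
]

for key, n in summarize(sample).most_common():
    print(n, key)
print(repeated_conns(sample))
```

If the same connection and socket pairs show up again and again within one second, as they do in the excerpt, that suggests the server is retrying the shutdown on the same sockets rather than handling many distinct clients.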

    This topic has been closed for replies.

    1 reply

    Participant
    August 24, 2009

    OK... I discovered I was running 3.5.1, so I upgraded to 3.5.2, and now I can't even start the server. The logs show this:

    Aug 24 18:45:41 Adaptor[9324]: Bind failed in migration thread : pid = 9324 : No such file or directory (2)

    Aug 24 18:45:41 kernel: Aug 24 18:45:41 yauza8 Adaptor[9324]: Bind failed in migration thread : pid = 9324 : No such file or directory (2)

    Aug 24 18:45:41 Adaptor[9324]: Migration thread on core 9324 terminated unexpectedly.

    Aug 24 18:45:41 kernel: Aug 24 18:45:41 yauza8 Adaptor[9324]: Migration thread on core 9324 terminated unexpectedly.

    No fmsedge/fmscore shows up in the process list, and no socket is open (neither 80 nor 1935).
    Btw, in this state I can still launch and use adminserver.
    Is there a way to find out which file that process is trying to open when it reports "No such file or directory (2)"? In case it matters, my install root /opt/adobe/fms is a symlink; can this cause trouble?
    Thanks for any help on this...

    This message was edited by: FreeBSDguru

    Participating Frequently
    August 26, 2009

    It's worth a try to get rid of the symlink. The bind takes place over a Unix domain socket, and the code is explicitly for Linux (sorry if you have FMS running on BSD). The filesystem is what provides this functionality, so if it's not the symlink, it could be a permissions issue. Check the /$FMSDIR/tmp directory.
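
The two checks suggested here (is the install root a symlink, and is the tmp directory usable) can be scripted. This is a rough Python sketch under stated assumptions: `check_install` is a hypothetical helper, the `tmp` subdirectory name comes from the /$FMSDIR/tmp path mentioned above, and the permission result only reflects the user running the script, which may differ from the user FMS runs as.

```python
import os


def check_install(root, tmp_name="tmp"):
    """Report symlink and permission facts about an install root and its tmp dir."""
    tmp_dir = os.path.join(root, tmp_name)
    return {
        # True if the root path itself is a symbolic link.
        "root_is_symlink": os.path.islink(root),
        # Where the root resolves to once all symlinks are followed.
        "root_realpath": os.path.realpath(root),
        "tmp_exists": os.path.isdir(tmp_dir),
        # Creating a socket file needs write and search permission on the directory.
        "tmp_writable": os.access(tmp_dir, os.W_OK | os.X_OK),
    }
```

For example, `check_install("/opt/adobe/fms")` would report on the install root named earlier in the thread. Run it as the account the server uses to get a meaningful permission result.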

    Participant
    August 26, 2009

    Hello. I took the time today to review everything, and it appears that libcap was the issue.

    After moving to libcap-1.10.15 everything is running smoothly.

    You're right, I should have mentioned that I'm on FreeBSD, but at first glance the logs didn't point me in the right direction (truss did!).