Known Participant
March 6, 2010
Question

FMS proxying HTTP does not close sockets correctly?


Hi,

So, after waiting for a while, we decided to upgrade our streamers to FMS 3.5.3 and set them up to proxy HTTP requests to a lighttpd server instead of the "built-in" Apache. With the previous version, FMS 3.5, we had problems with FMS hanging after a while (generally a few hours) and no longer relaying requests, even when it was not serving many requests. With version 3.5.3 we had high hopes that this problem was gone, since FMS had been running for more than a month on a test server. So we did the upgrade, and ran into another problem:

After less than one day of activity, our servers crashed for an unknown reason, almost at the same time. Analysing the logs, I saw out-of-memory messages, but how could two Linux servers stop working within the same 5-minute interval? I supposed there had been a power failure of some sort. I kept analysing because the sysadmins in the server room swore they had seen out-of-memory kernel panics on the consoles. After a bit of investigation, I found the reason: the TCP connections between FMS and the lighttpd server are stacking up and filling the OS network memory! Now, I don't know who the culprit is:

  1. FMS does not manage HTTP proxy sessions correctly?
  2. lighttpd does not handle HTTP proxying from FMS correctly and does not close the sessions properly?
  3. Our Ubuntu Linux network stack is not configured correctly to manage so many connections?
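One way to check hypothesis 3 is to compare the kernel's current TCP memory use with its limits. The sketch below uses hypothetical helper names (nothing here comes from FMS or lighttpd) and assumes a Linux kernel that reports a `mem` field on the `TCP:` line of `/proc/net/sockstat` and the three `tcp_mem` thresholds (min, pressure, max) in `/proc/sys/net/ipv4/tcp_mem`; both are counted in pages.

```python
from pathlib import Path


def parse_sockstat_tcp(text: str) -> dict[str, int]:
    """Return the key/value pairs of the 'TCP:' line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]  # e.g. ['inuse', '1653', 'orphan', '12', ...]
            return {key: int(value) for key, value in zip(fields[::2], fields[1::2])}
    raise ValueError("no 'TCP:' line found")


def tcp_memory_status(sockstat: str, tcp_mem: str) -> str:
    """Classify TCP memory use (pages) against the tcp_mem thresholds (pages)."""
    used = parse_sockstat_tcp(sockstat)["mem"]
    _minimum, pressure, maximum = (int(x) for x in tcp_mem.split())
    if used >= maximum:
        return "at max"          # new TCP buffer allocations may be refused
    if used > pressure:
        return "under pressure"  # kernel moderates socket buffer usage
    return "ok"


def read_live_status() -> str:
    """Hypothetical helper: read both files on the running Linux host."""
    return tcp_memory_status(
        Path("/proc/net/sockstat").read_text(),
        Path("/proc/sys/net/ipv4/tcp_mem").read_text(),
    )
```

Running `read_live_status()` periodically while connections pile up would show whether TCP has entered memory pressure before the crash; the thresholds are whatever the distribution configured.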

Here are the details, any assistance is welcome!

The servers: Ubuntu Linux 8.04 with 8 GB RAM.

FMS 3.5.3 proxying to lighttpd 1.4.19 listening on port 81. FMS and lighttpd are set up on the same server.

The streamer receives many (> 10/s) HTTP requests from iPhones, which are proxied to lighttpd. iPhones request partial content, and lighttpd is well suited to this type of request.

Looking at the state of the TCP stack:


# netstat -st
IcmpMsg:
    InType3: 6
    InType8: 5501
    OutType0: 5501
    OutType3: 6
Tcp:
    101662 active connections openings
    1192680 passive connection openings
    33 failed connection attempts
    177391 connection resets received
    1653 connections established
    73559682 segments received
    70196072 segments send out
    165137 segments retransmited
    0 bad segments received.
    37219 resets sent
UdpLite:
TcpExt:
    22 resets received for embryonic SYN_RECV sockets
    141884 packets pruned from receive queue because of socket buffer overrun
    2052 packets pruned from receive queue
    40293 TCP sockets finished time wait in fast timer
    801 time wait sockets recycled by time stamp
    711318 delayed acks sent
    2401 delayed acks further delayed because of locked socket
    Quick ack mode was activated 535 times
    68554 packets directly queued to recvmsg prequeue.
    11705 bytes directly in process context from backlog
    341070 bytes directly received in process context from prequeue
    5114830 packet headers predicted
    22083 packets header predicted and directly queued to user
    32542120 acknowledgments not containing data payload received
    25289961 predicted acknowledgments
    9191 times recovered from packet loss due to fast retransmit
    6222 times recovered from packet loss by selective acknowledgements
    Detected reordering 7 times using FACK
    Detected reordering 1 times using SACK
    Detected reordering 7 times using reno fast retransmit
    Detected reordering 9 times using time stamp
    15 congestion windows fully recovered without slow start
    42 congestion windows partially recovered using Hoe heuristic
    17 congestion windows recovered without slow start by DSACK
    16517 congestion windows recovered without slow start after partial ack
    51 TCP data loss events
    3916 timeouts after reno fast retransmit
    3372 timeouts after SACK recovery
    10048 timeouts in loss state
    9157 fast retransmits
    6 forward retransmits
    28990 retransmits in slow start
    30176 other TCP timeouts

    7656 classic Reno fast retransmits failed
    95 SACK retransmits failed
    2738020 packets collapsed in receive queue due to low socket buffer
    52 DSACKs sent for old packets
    52 DSACKs received
    3 connections reset due to unexpected data
    74950 connections reset due to early user close
    2846 connections aborted due to timeout
    TCP ran low on memory 10 times
    TCPDSACKIgnoredOld: 41
IpExt:
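The TcpExt counters above are cumulative since boot, so a single dump does not show whether the buffer trouble is still ongoing. A sketch with hypothetical helper names, whose match strings are copied from the net-tools output above, that extracts the memory-related lines from two `netstat -st` snapshots and reports how much each counter grew in between:

```python
import re

# Substrings copied from the TcpExt lines of the netstat -st output above.
MEMORY_HINTS = (
    "pruned from receive queue",
    "collapsed in receive queue",
    "ran low on memory",
)
FIRST_NUMBER = re.compile(r"\d+")


def memory_counters(netstat_s: str) -> dict[str, int]:
    """Map each memory-related line (its number replaced by '#') to its value."""
    counters = {}
    for raw in netstat_s.splitlines():
        line = raw.strip()
        if not any(hint in line for hint in MEMORY_HINTS):
            continue
        match = FIRST_NUMBER.search(line)
        if match:
            # Replace the value so the key stays stable across snapshots.
            key = FIRST_NUMBER.sub("#", line, count=1)
            counters[key] = int(match.group())
    return counters


def growth(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """How much each counter increased between two snapshots."""
    return {key: value - before.get(key, 0) for key, value in after.items()}
```

A steadily growing "TCP ran low on memory # times" entry between snapshots taken a few minutes apart would indicate that the pressure is current rather than historical.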

The last red line can be visualised below; sometimes the server crashes with an out-of-memory kernel panic...

And if you compare it with the previous behaviour, you can tell when we upgraded to FMS 3.5.3...

A few words on our previous configuration with FMS 3.5. To accept HTTP requests, we have a front load balancer that routes HTTP requests to lighttpd and RTMP requests to FMS. But with this configuration, we don't benefit from FMS's ability to fall back to RTMPT for clients behind a proxy, or to proxy plain old HTTP requests... With the previous configuration, we never had so many simultaneously active connections: we went from a maximum of 200 to more than 7000 with FMS 3.5.3 acting as a proxy!

Some figures on what exactly happens:

# netstat -tn | grep :81 | awk '{print $6}' | sort | uniq -c
     20 CLOSE_WAIT
   2245 ESTABLISHED
   1236 FIN_WAIT1
      5 TIME_WAIT
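The shell pipeline above can be reproduced in Python. One caveat of `grep :81` is that it also matches addresses such as port 8134 or 8100; the sketch below (hypothetical function name) matches port 81 exactly. Also note that for loopback connections netstat lists both endpoints, so each FMS-to-lighttpd connection is counted twice, once from each side.

```python
from collections import Counter


def count_states(netstat_tn: str, port: int = 81) -> Counter:
    """Count TCP states of connections with either endpoint on `port`."""
    suffix = f":{port}"
    states = Counter()
    for line in netstat_tn.splitlines():
        fields = line.split()
        # Data rows: proto recv-q send-q local-address foreign-address state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states
```

Feeding it the output of `netstat -tn` gives the same kind of breakdown as the `uniq -c` output above, without the false matches.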


# netstat -tn

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0 275841 127.0.0.1:81            127.0.0.1:45088         FIN_WAIT1
tcp   562347      0 127.0.0.1:55150         127.0.0.1:81            ESTABLISHED
tcp        0 225152 127.0.0.1:81            127.0.0.1:34856         ESTABLISHED
tcp        0 238081 127.0.0.1:81            127.0.0.1:54310         FIN_WAIT1
tcp        0 114433 127.0.0.1:81            127.0.0.1:51810         FIN_WAIT1
tcp   670800      0 127.0.0.1:40746         127.0.0.1:81            ESTABLISHED
tcp   597444      0 127.0.0.1:40654         127.0.0.1:81            ESTABLISHED
tcp   592810      0 127.0.0.1:40669         127.0.0.1:81            ESTABLISHED
tcp        0 103168 127.0.0.1:81            127.0.0.1:54544         ESTABLISHED
tcp        0 201344 127.0.0.1:81            127.0.0.1:40567         ESTABLISHED
tcp        0      0 10.1.74.30:1935         10.1.74.2:60372         ESTABLISHED
tcp   665360      0 127.0.0.1:45282         127.0.0.1:81            ESTABLISHED
tcp        0 210432 127.0.0.1:81            127.0.0.1:40662         ESTABLISHED
tcp        0 207361 127.0.0.1:81            127.0.0.1:44895         FIN_WAIT1
tcp        0 109825 127.0.0.1:81            127.0.0.1:52144         FIN_WAIT1
tcp        0 221825 127.0.0.1:81            127.0.0.1:54844         FIN_WAIT1
tcp        0 103168 127.0.0.1:81            127.0.0.1:40791         ESTABLISHED
tcp   782301      0 127.0.0.1:55074         127.0.0.1:81            ESTABLISHED
tcp        0 219137 127.0.0.1:81            127.0.0.1:45040         FIN_WAIT1
tcp        0 209921 127.0.0.1:81            127.0.0.1:44837         FIN_WAIT1
tcp   574170      0 127.0.0.1:45275         127.0.0.1:81            ESTABLISHED
tcp        0 252033 127.0.0.1:81            127.0.0.1:54907         FIN_WAIT1
tcp        0 254977 127.0.0.1:81            127.0.0.1:54515         FIN_WAIT1
tcp        0 142977 127.0.0.1:81            127.0.0.1:52129         FIN_WAIT1
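In this excerpt the lighttpd ends (local port 81) hold data in Send-Q, while the FMS ends (foreign port 81) hold hundreds of kilobytes in Recv-Q, i.e. data already delivered to FMS's sockets but not yet read. In TCP a FIN is queued behind any unsent data, so a sender whose peer stops reading can sit in FIN_WAIT1 with a non-empty Send-Q. This is one possible reading of the dump, not a confirmed diagnosis. A sketch (hypothetical function name) that totals both sides:

```python
def queued_bytes(netstat_tn: str, port: int = 81) -> dict[str, int]:
    """Sum Recv-Q on the client side and Send-Q on the server side of `port`."""
    suffix = f":{port}"
    totals = {"client_recv_q": 0, "server_send_q": 0, "server_fin_wait1": 0}
    for line in netstat_tn.splitlines():
        fields = line.split()
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        local, foreign, state = fields[3], fields[4], fields[5]
        if foreign.endswith(suffix):    # FMS end: connected *to* lighttpd
            totals["client_recv_q"] += recv_q
        elif local.endswith(suffix):    # lighttpd end: its listening port
            totals["server_send_q"] += send_q
            if state == "FIN_WAIT1":
                totals["server_fin_wait1"] += 1
    return totals
```

If `client_recv_q` keeps growing between runs, the receiving process is falling behind rather than the kernel failing to close sockets.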

In case it helps, the FMS configuration is almost out of the box:

_defaultRoot_/Adaptor.xml:

<HTTPTunnel>
                <Enable>true</Enable>

                <NodeID></NodeID>

                <IdlePostInterval>512</IdlePostInterval>

                <IdleAckInterval>512</IdleAckInterval>

                <IdleTimeout>60</IdleTimeout>

                <MimeType>application/x-fcs</MimeType>

                <WriteBufferSize>16</WriteBufferSize>

                <SetCookie>true</SetCookie>

                <Redirect enable="false" maxbuf="16384">
                        <Host port="80">:8080</Host>
                        <Host port="443">:8443</Host>
                </Redirect>

                <HttpProxy enable="true" maxbuf="16384">
                        <Host port="80">${HTTPPROXY.HOST}</Host>
                </HttpProxy>

                <NeedClose>true</NeedClose>

                <MaxWriteDelay>20</MaxWriteDelay>
                <MinWriteDelay>12</MinWriteDelay>

                <MaxHeaderLineLength>1024</MaxHeaderLineLength>

</HTTPTunnel>

I've tried configuring lighttpd both with and without keep-alive sessions; it has no influence on the behaviour.

If you spot a configuration mistake or have any ideas, I'm ready to test them as soon as possible... Asa?

Thanks


    Participant
    August 10, 2010

    Has this been fixed? What was done?

    We have been having the same issue. We worked on it with Adobe support and thought it might have been fixed with a "patched" version of FMS that was created to solve the problem. However, we still see the issue (recently we had a huge increase in load due to an annual event).

    Over time, when we run netstat, we see Apache connections (127.0.0.1:8134) stuck in the FIN_WAIT_1 state while the corresponding FMS connections (127.0.0.1, ports 1025 to 5000) are ESTABLISHED. It appears as though the connection is closed from the HTTP proxy side while FMS doesn't acknowledge the close.

    This behavior seems related to load: under high load, connections hang more frequently. The only fix seems to be restarting the FMS service, which restarts the embedded Apache as well.

    I'm curious whether you found a fix, or even a workaround, to help you cope with these connections. Even something to close them periodically would be a huge help. So far, we're resorting to an automatic restart of the server every 2 days, which is hardly a proper production solution.

    I would really appreciate any information you could provide with regards to your particular problem and whether or not it's been solved.

    An example (just a snippet) of the netstat monitor we have running each hour on the half-hour:

      Proto  Local Address          Foreign Address        State

      <snip FMS to Apache>

      TCP    127.0.0.1:2847         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2848         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2849         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2850         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2851         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2853         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2854         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2855         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2856         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2857         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2858         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2859         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2860         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2861         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2862         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2863         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2864         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2865         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2866         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2867         127.0.0.1:8134         ESTABLISHED
      TCP    127.0.0.1:2868         127.0.0.1:8134         ESTABLISHED

      </snip FMS to Apache>

      <snip Apache to FMS (corresponding entries)>

      TCP    127.0.0.1:8134         127.0.0.1:2847         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2848         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2849         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2850         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2851         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2853         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2854         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2855         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2856         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2857         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2858         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2859         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2860         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2861         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2862         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2863         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2864         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2865         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2866         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2867         FIN_WAIT_1
      TCP    127.0.0.1:8134         127.0.0.1:2868         FIN_WAIT_1

      </snip Apache to FMS>
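Pairing the two halves of each loopback connection by hand gets tedious with thousands of entries. A sketch (hypothetical function name) that reads Windows-style `netstat -an` lines like those above and returns every connection whose Apache end is in FIN_WAIT_1 while the matching FMS end is still ESTABLISHED:

```python
def stuck_pairs(netstat_an: str) -> list[tuple[str, str]]:
    """Return (fin_wait_end, established_end) address pairs, sorted."""
    state_of = {}
    for line in netstat_an.splitlines():
        fields = line.split()
        # Data rows: TCP local-address foreign-address state
        if len(fields) == 4 and fields[0] == "TCP":
            _, local, foreign, state = fields
            state_of[(local, foreign)] = state
    pairs = []
    for (local, foreign), state in state_of.items():
        # The peer's row has the two addresses swapped.
        if state == "FIN_WAIT_1" and state_of.get((foreign, local)) == "ESTABLISHED":
            pairs.append((local, foreign))
    return sorted(pairs)
```

Counting `len(stuck_pairs(...))` in the hourly monitor would turn the snippet above into a single trend number.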

    This particular log shows all ports between 1024 and 5000 in the same state as above. A restart resets all the connections and the process starts over. From my investigation, Apache seems to be acting properly: it tries to be a 'good citizen' and issues a graceful close that goes unacknowledged by FMS. Unfortunately, Apache doesn't go back to check a connection it considers closed, so the connection remains half-closed.
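If every local port in that range is held by a stuck connection, FMS would likely have no ephemeral port left for new connections to the proxy, which would fit the observation that only a restart helps. A small arithmetic sketch, assuming the Windows default dynamic range of that era (ports 1025 to 5000, i.e. `MaxUserPort` = 5000; check the server's registry), with a hypothetical function name:

```python
# Assumption: Windows default dynamic port range of that era (MaxUserPort = 5000).
EPHEMERAL_LOW, EPHEMERAL_HIGH = 1025, 5000
RANGE_SIZE = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1  # 3976 ports


def ephemeral_usage(ports: set[int]) -> float:
    """Fraction of the assumed ephemeral range occupied by the given local ports."""
    in_range = {p for p in ports if EPHEMERAL_LOW <= p <= EPHEMERAL_HIGH}
    return len(in_range) / RANGE_SIZE
```

Passing it the FMS-side ports of the stuck pairs gives a value close to 1.0 shortly before new proxy connections would start failing.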

    Please, please let me know if you found anything out. Information on the web is scarce and I'm at my wit's end!

    Asa_-_FMS
    Adobe Employee
    August 11, 2010

    Hi dgianetti,

    Like Pierre, if you can contact me offlist we should be able to help you here.

    awhilloc@adobe.com

    Thanks,

    Asa

    Asa_-_FMS
    Adobe Employee
    March 8, 2010

    Hi Pierre,

    Feel free to contact me off-list about this. We'll probably want to take a hang dump of FMS to determine whether we've lost reader threads or something else, and sort out the problem.

    Asa