Known Participant
October 6, 2010
Question

Fun reading about FMS, FMLE, CDNs and automatic stream recovery


Long.  But just thought people might be interested.

Wow, where to start.

How about with a question - what do you do when your encode connectivity to your FMS takes a hit?

ENCODER --------|break|-----------FMS--------CDN----------VIEWERS

You know, like someone walks by your encoder and accidentally kicks the network cable out.  Or your T1 leaving the broadcast location goes down for 20 seconds or so and then pops back up. Or a router not far from you explodes.

Well, if someone kicks the plug out, just plug it back in.  You'll be fine.....  Not.

If the T1 goes down, don't worry.  When it comes back up, streaming will recover gracefully and all of your viewers will be impressed at the resiliency of your system.

Not.

If a router explodes, duck and cover.

Well, after weeks of testing these scenarios with two well-known FMS CDNs, FMLE, a well-known hardware encoder, and live streams of a looping DVD running so long that I literally wanted to set both the player and DVD on fire, I've concluded something: automatic, graceful stream recovery in a real live broadcasting situation to thousands of viewers is like Snuffleupagus on Sesame Street.  Only Big Bird has ever really seen it.

Recovering a live stream automatically, if you have your encoder connected to FMS and a viewer connected directly to that same FMS, is easy.  Works every time.  Pull the plug on your encoder, wait 20 seconds, and plug it back in.  The viewer picks up the stream again.  Piece of cake.

But doing so when configured as I laid out earlier - encoder to FMS, FMS republishes to CDN origin, CDN origin forwards to CDN edges, viewers connect to CDN edges - is almost like understanding banking derivatives.  It's probably not gonna happen.

It could happen if you encode with FMLE.  That's one of my observations after the testing.  FMLE seems to work, even in the complex config laid out above.  Kick the plug out, and all of your thousands of viewers will recover if you just plug back in.  (I use the kicked-out plug as an example.  More likely you'll have a T1 go down or some other temporary connectivity bump at the broadcast location.)

But I use a popular (and expensive) hardware encoder and not FMLE (doh!!!).  And, of course, that doesn't work.  Kick the plug out, put the plug back, and the viewers get a face full of frozen stream - forever.  Unless, of course, they manually refresh their browsers.

Now the encoder seems like the culprit and it's fair to say that it probably is.  But here's what I learned in all of the testing.  These are the things that really make troubleshooting and solving this issue nearly impossible:

1.  Encoder live stream connections to FMS are not standardized; each manufacturer is at liberty to implement them as it sees fit.

2.  How each manufacturer does it is kind of like a state secret, so troubleshooting is kind of like a faith-based initiative.

3.  Republishing streams, like from one FMS onto another FMS, say at a CDN, is also a bit of a state secret.  And, unfortunately, this is where you'll be handling streaming errors like untimely encoder disconnections: in your main.asc at the first FMS in the chain.  I may be missing it, but the workflow for this republishing process is really not documented anywhere that I can see.  There is an Adobe doc on "Publishing from Server to Server" but if you tell your CDN you're using it to republish to their network they'll come to your house and throw garbage on your lawn.  First of all, it contains no mention of "FCPublish" - and CDNs love them some FCPublish.  More importantly, it contains no FCUnpublish or releaseStream - two things seemingly critical for automatic stream recovery in the event of a connectivity problem on the encode side.  So I don't think this document is meant to cover republishing in any sort of robust or comprehensive way.  Just a code tidbit or two.

4.  There is no NetStatusEvent activity generated at the server (in the FMS console) when you pull the encoder plug (I can understand this - the plug is out).  Correspondingly, there is no NetStatusEvent activity received further on down the chain all the way to the viewer's video player.  Again, understandable - FMS is not aware that anything out of the ordinary has occurred yet.  But what this means is I can't take any action at the player to refresh the NetConnection and NetStream.  No event to hang my hat on.  Even creepier is the fact that I don't need to take any such action at the player when FMLE is used to encode.  Data just begins playing magically again when connectivity is reestablished.  Wish I knew how they did that.

5.  The NetStatusEvent activity begins *after* the encoder cable is reinserted.  Makes sense - someone out there is connecting to FMS.  You'd expect a flurry of activity there.  Problem is that the republish code on that first server in the chain has two things to do on encoder reconnect: get rid of the old stale NC and NS objects, and make some sparkly new ones, and those two things appear intermingled when looking at FMS console traces.  Seems like prime ground for race conditions.  Additionally it seems the republish code there kind of has to know that this isn't someone else trying to publish a stream name that's already in use.  It has to know that this is the same guy and that he just got bumped off for a bit - but he's back now.

6.  After plugging back in, the republish code at the first FMS does successfully republish and data does begin flowing out to the viewing players again (I can see this in the FMS console and also when monitoring FMS activity at the end user video players). The video players' buffers fill.  But for some strange reason the players freeze when the hardware encoder is used.  They start playing again if FMLE was used.  Is the data coming from the hardware encoder somehow different?  Weird.
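To make points 3 and 5 a bit more concrete, here is a rough Python model of the bookkeeping the republish code on the first FMS seems to need.  This is not Server-Side ActionScript and not anybody's actual main.asc.  The call names releaseStream, FCPublish and FCUnpublish come from the discussion above, but the order they are issued in here is an assumption, and `publisher_id` is a hypothetical stand-in for whatever would let the server tell "same guy, back again" from "someone else grabbing the name".  The only idea it shows is finishing the whole teardown of the stale pair before any setup call for the new pair goes out, so the two can't interleave.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundLink:
    """Stands in for one NetConnection + NetStream pair toward the CDN origin."""
    stream_name: str
    generation: int
    is_open: bool = True


@dataclass
class Republisher:
    """Toy model of republish bookkeeping on the first FMS (hypothetical)."""
    calls: list = field(default_factory=list)   # ordered log of outbound calls
    links: dict = field(default_factory=dict)   # stream name -> OutboundLink
    owners: dict = field(default_factory=dict)  # stream name -> publisher id
    generation: int = 0

    def _teardown(self, name):
        link = self.links.pop(name, None)
        if link is None or not link.is_open:
            return
        # Every teardown step is logged before any setup step is issued.
        self.calls.append(("FCUnpublish", name, link.generation))
        self.calls.append(("close", name, link.generation))
        link.is_open = False

    def _setup(self, name):
        self.generation += 1
        gen = self.generation
        # Assumed order; the thread does not establish a standard one.
        for call in ("connect", "releaseStream", "FCPublish", "publish"):
            self.calls.append((call, name, gen))
        self.links[name] = OutboundLink(name, gen)

    def on_publish(self, name, publisher_id):
        owner = self.owners.get(name)
        if owner is not None and owner != publisher_id:
            return "rejected: name in use"
        # A leftover link means the encoder dropped without unpublishing.
        result = "reconnected" if name in self.links else "published"
        self._teardown(name)
        self._setup(name)
        self.owners[name] = publisher_id
        return result

    def on_unpublish(self, name):
        self._teardown(name)
        self.owners.pop(name, None)


r = Republisher()
print(r.on_publish("event1", "encoder-A"))  # first connect
print(r.on_publish("event1", "encoder-A"))  # cable back in, no unpublish seen
print(r.on_publish("event1", "encoder-B"))  # different publisher, same name
for call in r.calls:
    print(call)
```

The generation number on each logged call is just there so a trace shows which pair a call belonged to, which is the thing that is hard to read off the intermingled FMS console output.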

So I'm digesting all of this now and trying to devise a way to build resiliency on the encode side.  Tough when you consider all of the diverse environments that have to cooperate to make it all go - encoders, first FMS, CDN origin, CDN edges, player.
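On point 4, with no event to hang a hat on, about the only thing the player can still watch is its own playhead.  Below is a minimal Python sketch of that idea: sample the playhead position periodically and flag a stall when it hasn't moved for a while.  The class name, the 5-second threshold and the sampling scheme are all assumptions for illustration, and I haven't verified that forcing a reconnect on stall fixes the hardware-encoder freeze.  The only hint that it might is that a manual browser refresh does.

```python
from dataclasses import dataclass


@dataclass
class StallWatchdog:
    """Flags a stall when the playhead stops moving for too long."""
    stall_after: float = 5.0        # seconds without movement (assumed value)
    _last_position: float = -1.0
    _last_progress_at: float = 0.0
    _started: bool = False

    def sample(self, now, position):
        """Feed one (clock, playhead) sample; return True if stalled."""
        if not self._started or position != self._last_position:
            # Any change in position counts as progress, including a
            # reset to zero after the player reconnects.
            self._started = True
            self._last_position = position
            self._last_progress_at = now
            return False
        return (now - self._last_progress_at) >= self.stall_after


# Simulated samples: the playhead advances, then sits at 2.0 seconds.
watchdog = StallWatchdog(stall_after=5.0)
for now, position in [(0, 0.0), (1, 1.0), (2, 2.0), (4, 2.0), (6, 2.0), (7, 2.0)]:
    if watchdog.sample(now, position):
        print("stalled at t=%s, would rebuild the connection here" % now)
```

In a real player the samples would come from a timer reading the stream's current time, and the stall branch would do what a browser refresh does today: throw away the NetConnection and NetStream and make new ones.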

If you have any helpful suggestions in this regard, things I've missed, etc., they'd be welcome.


Thanks.


    1 reply

    fmeuser
    Participating Frequently
    October 7, 2010

    You said that FMLE works even in the complex setup explained by you, but still you use the expensive hardware encoder. What exactly is the reason for you to switch to hardware encoder? Is it only for the testing purpose or functionality related?

    Known Participant
    October 8, 2010

    Good question.  It's functionality, portability, and ease of use on remote job sites.

    The hardware encoder does things like multiple formats, streaming to two different locations at once (redundancy), more flexible signal input and output options, etc.  We send it out on jobs and techs can operate it easily onsite.  It's a corporate thing.  I don't really get to make the call on that.

    The point of my post was to give a glimpse of the complexity in troubleshooting a problem like this when the workflow spans multiple environments.  Nobody's in charge.

    I have passed the FMS console traces from the FMLE encode and the hardware encode sessions to the hardware encoder manufacturer and the CDN.  They demonstrate that the hardware encoder's connect workflow differs from FMLE's.  Some methods are not called and some others are called but in a different order.  Who's right?  Is there a standard?
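For anyone wanting to do the same kind of comparison, here is a small Python sketch that takes two ordered lists of method names pulled from console traces and reports which calls are missing, which are extra, and whether the shared calls come in a different order.  The two example sequences are made up for illustration.  They are not the actual FMLE or hardware encoder traces, and neither is claimed to be the "right" order.

```python
def compare_call_traces(reference, candidate):
    """Compare two ordered lists of method names from connect traces."""
    ref_set, cand_set = set(reference), set(candidate)
    # Calls present in one trace but absent from the other.
    missing = [m for m in reference if m not in cand_set]
    extra = [m for m in candidate if m not in ref_set]
    # Restrict both traces to the calls they share, then compare order.
    shared_ref = [m for m in reference if m in cand_set]
    shared_cand = [m for m in candidate if m in ref_set]
    return {
        "missing": missing,
        "extra": extra,
        "reordered": shared_ref != shared_cand,
        "reference_order": shared_ref,
        "candidate_order": shared_cand,
    }


# Hypothetical sequences, for illustration only.
reference_trace = ["connect", "releaseStream", "FCPublish", "createStream", "publish"]
candidate_trace = ["connect", "createStream", "FCPublish", "publish"]

report = compare_call_traces(reference_trace, candidate_trace)
for key, value in report.items():
    print(key, value)
```

A report like this doesn't answer who's right, but it gives the manufacturer and the CDN something more specific to look at than two walls of console output.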