How to determine when FMS is starting to get overloaded?
We have a set of FMS instances deployed on Amazon's EC2. One of the things we want to be able to do is automatically detect when we should start up another FMS instance. To do that, I've been looking for metrics I could measure on the local FMS box to help me identify "transition" points, i.e., when we should add capacity or remove excess capacity.
I ran some load testing to find out where the capacity limits of a particular box are, but ran into a couple of problems:
* Traditional system metrics (CPU, memory, run queue length) did not do a great job of predicting when we'd hit a wall. Load average was really the only thing that seemed to climb much, and it was only at about 4 (on a 4-core box) when things went south.
* When we *did* hit a wall, it was a pretty sharp cliff. We seemed to be doing fine at 70+70 streams (~300 kbps streams in, reflected out) and at 75+75 streams, but when I went to 80+80 streams, BAM! Things just started unravelling, with very little in the way of error logs to indicate what might be happening. All of a sudden, my counters for simultaneous streams, etc. dropped from 80ish to 20ish (I was still publishing 80 to the server).
I tried bumping up the EC2 instance size (under the theory that we were being bandwidth-capped or stream-capped), but didn't really see much difference.
I see two possibilities:
* We actually are being bandwidth- or stream-capped, and going up to a bigger box didn't help.
* There are a number of other metrics on the server I could look at that would have shown a gradual degradation.
Assuming the latter, does anyone have any suggestions for what metrics I might measure on the FMS to decide whether we are starting to get overloaded? For example, I've thought about comparing Stream.time to NetStream.time for streams I'm reflecting out of the server.
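To make that last idea concrete, here is a rough sketch in Python of the kind of comparison I have in mind. It assumes I can somehow sample the two clocks together as pairs of seconds (incoming stream time, reflected stream time). The function names, the sample format, and the 0.5-second threshold are all made up for illustration and are not part of any FMS API.

```python
from typing import List, Tuple

# Each sample is a hypothetical (source_time, reflected_time) pair in seconds,
# taken at the same moment for one stream going in and its reflected copy.
Sample = Tuple[float, float]


def stream_lag(samples: List[Sample]) -> List[float]:
    """Return how far the reflected stream trails the source at each sample."""
    return [source - reflected for source, reflected in samples]


def lag_is_growing(samples: List[Sample], min_growth: float = 0.5) -> bool:
    """Report whether lag grew by at least min_growth seconds over the window.

    min_growth is an arbitrary illustrative threshold, not a tuned value.
    """
    lags = stream_lag(samples)
    if len(lags) < 2:
        return False  # not enough data to see a trend
    return lags[-1] - lags[0] >= min_growth


# A stream whose lag stays constant versus one that falls steadily behind.
healthy = [(10.0, 9.9), (20.0, 19.9), (30.0, 29.9)]
loaded = [(10.0, 9.9), (20.0, 19.4), (30.0, 28.6)]
print(lag_is_growing(healthy), lag_is_growing(loaded))
```

The hope is that a steadily growing lag would show up before the cliff, but I don't know yet whether the two clocks actually drift apart gradually under load or only once things have already fallen over.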
