I would work backward in this case. Take, for instance, a home DSL connection with 1.5 Mbps down. Given that constraint, you need to fit all 5 webcam streams into it. That caps the bitrate of each stream (i.e. its visual quality): 1.5 Mbps split five ways is 300 kbps, and each stream has to come in under that once you account for TCP overhead. This means the webcams need to encode the content to fit that budget.
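To make the arithmetic concrete, here is a rough sketch of that budget calculation. The function name and the 10% overhead figure are my own assumptions for illustration; real TCP/protocol overhead depends on packet sizes and the streaming protocol, so measure it on your own link.

```python
def per_stream_budget_kbps(link_kbps, streams, overhead_fraction=0.10):
    """Return the encoder bitrate ceiling per stream, in kbps.

    overhead_fraction is an assumed share of the link lost to
    TCP/protocol overhead (10% here is a placeholder, not a measurement).
    """
    if streams <= 0:
        raise ValueError("streams must be positive")
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    usable_kbps = link_kbps * (1 - overhead_fraction)
    return usable_kbps / streams


# 1.5 Mbps DSL downlink shared by 5 webcam streams
budget = per_stream_budget_kbps(1500, 5)
print(f"{budget:.0f} kbps per stream")  # prints "270 kbps per stream"
```

With zero overhead the same call gives the 300 kbps ceiling mentioned above; any realistic overhead pushes the per-stream target below that.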
Now by doing so, you have 5 separate streams coming from the webcams, which you want to turn into a "single one". You can multiplex these 5 streams over a single NetConnection (or run them over 5 different NetConnections), but the network footprint will not be any smaller: each stream has already been encoded at its bitrate, so the total is still the sum of the five.
So the moral of the story is: find a good encoder.