
Is there a good method/software for determining the best rotoscoping method & estimating roto time?

Enthusiast, Mar 25, 2019

Different rotoscoping tasks can be easy or complex and time-consuming, and a lot can depend on the contents of the shot and which parts need rotoscoping or otherwise separating from the background.

One thing that could affect the time or complexity of rotoscoping is the amount of complex motion in the shot (though not always: a shot may contain complex motion while the client only wants something rotoscoped that doesn't have that complex motion).

Is there any available software, or a quick method, for calculating the complexity of motion in a shot and giving it as a figure, i.e. a motion vector complexity on some scale (say 0-100), that could be used to estimate the rotoscoping complexity/time of a video or shot?

Motion vector complexity doesn't necessarily reflect roto complexity (it depends on which things need to be separated from the background, and I assume the measurement could also be affected by noise or similar, though maybe some pre-blurring could be used), but on average I think it would be a good indicator of roto complexity.
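
For instance, one way such a figure could be approximated is with dense optical flow. The sketch below (Python with OpenCV) is only an assumed approach; the scaling constant is arbitrary and would have to be calibrated against shots whose roto time is already known.

```python
# Rough sketch: average dense optical-flow magnitude mapped to a 0-100
# "motion complexity" score. max_expected_flow is an arbitrary assumption
# and would need tuning against shots whose roto time is already known.
import cv2
import numpy as np

def motion_complexity(path, max_expected_flow=20.0):
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    if not ok:
        raise ValueError("could not read video: " + path)
    # Pre-blur to reduce the influence of noise, as mentioned above.
    prev_gray = cv2.GaussianBlur(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY), (5, 5), 0)
    per_frame = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (5, 5), 0)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        per_frame.append(mag.mean())
        prev_gray = gray
    cap.release()
    # Map the mean per-frame flow magnitude onto 0-100, clamped at the top.
    return 100.0 * min(float(np.mean(per_frame)) / max_expected_flow, 1.0)
```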

-

Another question: is there a simple way to determine the best (quickest and easiest) way to remove the background from a shot (whether by rotoscoping with masks, roto with the Roto Brush, keying, etc.), or is it just best to try a few? Are there cases where difference mattes could be used, or software that can sometimes automatically/semi-automatically remove the background from particular shots? For example, a shot with horses charging at the camera, a slight camera move at the end, and the horses completely exiting the frame by the end of the shot, where the grass would need to be removed along with the trees and sky in the background. Is there a way of using difference mattes that could work for that? (I tried but couldn't get it to work; there's also some movement in the grass.) Is there any current software that could do that almost automatically (separate the horses and riders from the background)?
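
For the horse shot, one assumed way to try the difference-matte idea is to treat a frame from the end of the shot (after the horses have exited) as a rough clean plate; with the moving grass and the camera move the result would likely still need clean-up or manual roto. A minimal OpenCV sketch:

```python
# Rough sketch of a difference matte against an assumed clean plate
# (e.g. the last frame of the shot, after the horses have exited).
import cv2
import numpy as np

def difference_matte(frame_bgr, clean_plate_bgr, threshold=30):
    # Absolute per-pixel difference between the frame and the clean plate.
    diff = cv2.absdiff(frame_bgr, clean_plate_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, matte = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    # Open/close to knock out speckles from grass movement and sensor noise.
    kernel = np.ones((5, 5), np.uint8)
    matte = cv2.morphologyEx(matte, cv2.MORPH_OPEN, kernel)
    matte = cv2.morphologyEx(matte, cv2.MORPH_CLOSE, kernel)
    return matte  # white = foreground (horses and riders), black = background
```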

LEGEND, Mar 25, 2019

You're not going to like this answer:  experience. 

There's nothing automatic about rotoscoping.

Enthusiast, Mar 25, 2019

But surely, if there were software that gave a complexity level for the motion vectors, that could be used in a calculation to help determine roto time. Or maybe it could be used as an indicator of maximum roto time.

edit: Maybe the motion vector info already stored in an H.264 video is exposed by some software, or could be used in a calculation to give an average motion vector complexity figure for the entire video, which could then feed into a calculation of estimated rotoscoping time.
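
FFmpeg can at least export the decoder's motion vectors and draw them on the frames; turning them into a single average complexity figure would still need a further processing step. A small sketch (the input filename is hypothetical):

```python
# Visualise the H.264 decoder's motion vectors with FFmpeg's codecview
# filter; "input.mp4" is a hypothetical filename.
import subprocess

subprocess.run([
    "ffmpeg",
    "-flags2", "+export_mvs",        # ask the decoder to export motion vectors
    "-i", "input.mp4",
    "-vf", "codecview=mv=pf+bf+bb",  # overlay P/B-frame motion vectors
    "mv_preview.mp4",
], check=True)
```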

There was one program years ago that could remove an object from video just by placing a rough mask on the object (though I think it depended on the shot). Maybe it would need a trained AI program to auto-separate particular objects, and those aren't really available yet. But maybe there could be a standard procedure (if not software) for working out the likely best method for a particular shot.

You're not going to like this answer:  experience.

Part of that could come from knowing what a particular shot contained and how long it took with particular methods. In theory some of that info could be put into an algorithm/calculation to give an approximate roto time, as well as maybe a best method, for a particular shot.

Community Expert, Mar 25, 2019

I have been doing roto for so long that I can tell you in about two minutes if the shot is going to take me 10 minutes or 10 hours. The more experience you gain, the easier it is going to be to just look at a shot and know how long it's going to take.

The most common mistake for someone new to this particular job is to try to do too many things with a single mask. If I'm trying to cut out a person there will usually be eight or 10 masks. One mask for the torso with maybe three or four keyframes, a mask for each arm with maybe seven or eight keyframes, and a mask for the head with maybe 10 or 12 keyframes. I see a lot of newbies putting keyframes every five or six frames on a mask that has 40 vertices and includes an entire person. You would have to have an extraordinarily complicated piece of software to analyze a shot and make an intelligent decision on which parts to include in each mask. A couple of days with a good instructor is all that you would need to be able to look at a shot, figure out how long it's going to take to do the roto, and be accurate with your prediction.

Another thing that helps an awful lot is knowing when to motion stabilize, or even do a modified corner pin track, before you do the roto.

I was teaching a class not too long ago and we needed to separate an actor from the background for about 15 frames while he walked in front of a sign. Almost every student tried to include the entire body for the entire length of the clip. All that was necessary was to motion stabilize the shot so that the sign didn't move, create a mask that covers only the actor's upper torso for 15 or 20 frames, then put the motion back in the shot. The entire job took less than 10 minutes for a few, but some were trying to roto the entire 300 frames in the shot and they never finished.

I personally don’t think that running a clip through some kind of analysis software is going to be anywhere near as efficient as a little experience and just taking a look at the shot.

Enthusiast, Mar 25, 2019

I agree that with many vertices on a mask it can be difficult. Sometimes a mask needs a lot of vertices when the subject is in close-up but really needs fewer when it's in the distance (though for that I assume it's probably best to split the clip into sections, if possible).

I agree that, where possible, more masks with fewer vertices per mask should be simpler to work with than one mask with many vertices covering multiple joints, since that should make the joint rotations a lot easier. Though it can also mean you end up with many masks, and many masks on screen simultaneously. I think other software can hide masks, where After Effects doesn't have that option (an alternative, to show fewer masks, could be to duplicate the footage and put some masks on different copies of the clip). But Adobe could make it easier to quickly organise masks (e.g. parented masks, for simpler organisation and for finding the right mask to change). In theory, rotoing things like people or horses could be done using a joint-type system for the masks. That should make it faster.

I personally don’t think that running a clip through some kind of analysis software is going to be anywhere near as efficient as a little experience and just taking a look at the shot.

If that kind of auto-analysis software isn't the best option, software could still help by giving a good estimate based on a few things you typed in or selected (maybe a spreadsheet calc could be used instead). Sometimes a quote is needed for a video (or maybe multiple videos) before you see it, when you only know a few things (such as the approximate duration and the resolution). Maybe such software (or a spreadsheet calc) could give a series of quotes/expected times based on different assumed complexities of the as-yet-unseen video(s), e.g. "if the video contains x/y/z the quote would be X and the time Y". But such software could also be used to give a fast, more accurate quote/estimated time once you have seen the video, just by selecting or typing a few values (even without the motion vector complexity value, though I assume that would help). It probably isn't a simple calculation, though. Maybe neural network software, or data/statistical analysis based on past work, could determine which factors about the videos affect roto time and complexity, and by how much.
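
As a minimal sketch of that kind of spreadsheet-style calculation, with all the weights as made-up placeholders that would have to be fitted to records of past jobs (which is also where a regression or a simple neural network could come in):

```python
# Placeholder weights only -- they would have to be fitted to records of
# past jobs before the numbers mean anything.
def estimate_roto_hours(duration_s, num_subjects, fps=24,
                        heavy_motion=False, fine_detail=False):
    frames = duration_s * fps
    hours = frames * num_subjects * 0.02   # base: hours per subject per frame
    if heavy_motion:
        hours *= 1.5                       # more keyframes per mask
    if fine_detail:
        hours *= 2.0                       # hair strands, heavy motion blur, etc.
    return hours

# e.g. a 46 s shot with 4 horses + riders and a lot of motion:
print(round(estimate_roto_hours(46, 4, heavy_motion=True), 1), "hours")
```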

But the factors that affect it, and roughly by how much, should be relatively common to all roto artists (i.e. it should be common info that could be used once we can determine that data, so we shouldn't have to have worked on a similar shot before being able to give as accurate a quote/time as possible for it).

Community Expert, Mar 25, 2019

I think you are way over thinking this. Roto work is almost always different for each shot. Each setup requires a different approach. In all my years in the visual effects and movie making business, I've never even considered taking a talking head or a presenter and removing the background from several minutes of video. It's just not practical.

Rotobrush can greatly speed up some types of roto work, but most of it, nearly 90% of the roto work that I do, is done to fix some kind of overlay. I've done some very complicated moving-actor roto. One project had a general mask for most of the face, two masks on a hat, about 5 masks on the hair, and a ton of motion blur and feathering going on, and I think I even had a separate mask for the actor's nose, but that kind of roto job is usually just a few frames. If you need more than three or four seconds of that kind of roto work then you probably need to reshoot, or ask for a couple more days of production time in the budget.

The key to successfully completing any masking or keying job is in planning the shot and executing it well. No client can afford to have you just take a camera, shoot something, and then decide they need to remove the background. I don't think we are anywhere near having software that can look at a scene, decide what needs to be done, and give you a time estimate or set a budget. All the motion prediction in the world isn't going to be able to figure out which motion is the actor and which is the background, and what parts need to go and what parts need to stay. What software can figure out is where edges are (mask tracking can seriously speed up roto, by the way), but it can't figure out what you need to do to make the shot work.

Enthusiast, Mar 25, 2019

I don't think we are anywhere near having software that can look at a scene, decide what needs to be done, and give you a time estimate or set a budget

But a simpler version of that could just ask the user what is in the shot and for details about it, and give an answer based on that. For example, it could ask the number of people in the shot who need rotoscoping, the number of animals (such as horses) that need roto, and how much each of them moves; it could ask whether (or how many) things in the shot change very frequently (such as a flag flapping in the wind, which could require many keyframes); and maybe it could ask whether, or how much, very fine detail in the shot needs to be preserved (e.g. fine hair strands would be complex). In theory, if the right info is asked for and it can be entered quickly enough, it should be able to give faster, more accurate quotes than you otherwise would.

All the motion prediction in the world isn't going to be able to figure out which motion is the actor and which is the background, and what parts need to go and what parts need to stay

Though recently it's been the shots with a lot of motion complexity (e.g. multiple galloping horses and riders, people running, flags moving rapidly in the wind, sword fighting with multiple people), and also long duration (up to 46 seconds at 4K), that have been more complex than the simpler videos. I think these videos would have motion vector complexities that were indicative of their actual complexity. The program could ask whether the overall motion of the video was indicative of the main things that needed rotoscoping (e.g. the foreground subjects), and it could take that answer into account in its calculation (of roto time/complexity and maybe the price to quote).

LEGEND, Mar 26, 2019

Though recently it's been the shots with a lot of motion complexity (e.g. multiple galloping horses and riders, people running, flags moving rapidly in the wind, sword fighting with multiple people), and also long duration (up to 46 seconds at 4K), that have been more complex than the simpler videos. I think these videos would have motion vector complexities that were indicative of their actual complexity. The program could ask whether the overall motion of the video was indicative of the main things that needed rotoscoping (e.g. the foreground subjects), and it could take that answer into account in its calculation (of roto time/complexity and maybe the price to quote).

Not really. Even AI-bolstered approaches to this stuff merely determine segmentation and then try to guess what each segment could be, trying to figure out how it could possibly move and what its typical properties are, based on the gigazillions of samples stored in the learning database. If you head over to the Photoshop forum and check what issues people already have with Sensei-based features (Select and Mask, Content-Aware Fill etc.) even for still images, you can see how limited this still is.

To make matters worse, computers don't "think" like humans, so you wouldn't get an estimate out of just this type of data. You'd have to combine it with additional AI, like perceptively rating what is visually acceptable even if not technically correct, or, even more basic, how people actually do their work. You know, we humans tend to do "stupid" things like simplifying complex curves and chopping off ears and hair if it only looks okay-ish in the context of a given shot; we accept smudging and patching stuff up in physically implausible, illogical ways if it only looks good enough, and a million other "bad" things that an automatic algorithm, even with the most fuzzy logic, likely just wouldn't do.

So unless you can come up with some sort of global community initiative where people would be willing to roto their hair off on standardized sample files for a year just to produce the data for training the AI, this is not going anywhere. So, like it or not, you're back to what the others already told you: your personal experience is your best advisor. After you've done this a bunch of times, you simply know the difference between cutting out a car on a road for an advert and fixing a poorly shot talking head in an interview...

Mylenium
