The Problem With Key Poses
As I’ve previously mentioned, it appears to be very difficult for me to prevent whole and entire books from coming out of my mouth when I try to answer people’s questions. And if you’re going to write a book, you might as well publish it!
Last week I received an email from Iris Chase. With their permission, I’m reproducing the email (lightly edited for brevity) and answering it here, because I’m lazy and only want to write things once.
Hello Raf!
Sorry if this is a bit rambling, your blog just got me really excited!
Seeing your siggraph presentation practically brought tears to my eyes. You're exploring a lot of stuff I've been dreaming of doing for the past couple of years. I want to thank you for all the work you've done and the thoughts you've put out on your blog, which I read in lieu of eating a few days ago (resulting in a splitting headache, haha, worth it).
I understand that you are more interested in mixing 1s and 2s (maybe even 3s?? Blasphemy!) at 24fps but my interest lies in doing things at 60+ (Ideally 120 but I didn't wanna make a "TWOS SPARKLE" type's head explode so we'll just say 60 for now). I'm also interested in the more realistic/subtle (but not quite photoreal, more like a painting).
Obviously I want to avoid linear interpolation, but tuning each frame is impractical at best given those framerates.
You say that maybe ephemeral rigging could be useful for subtlety given further development of the tools, have you put much thought into this?
Sticking with the zBrush analogy, where polygonal modelling is still really good for hardsurface modelling, I can't help but feel that splines are better suited for subtle, moving holds. How on earth else do you intend to say "Tilt head .5mm over the course of 5 seconds, no wait not quite that far" to a computer?
You talked about maybe baking everything into a traditional FK rig to fine-tune, but that seems kind of depressing and causes you to lose so many of the advantages of ephemeral rigging, where the graphs don't intuitively map to what you're seeing. As far as I know one of the worst parts about traditional rigs in the first place is how difficult fine-tuning actually is.
I was thinking about it last night and had the thought of interpolating the ephemeral rigs between key poses. Where the interpolation exists only between two poses, and the spline controls just the attenuation of that particular rig, against those particular joints. Then further refinement could be built up on top of the new motionpaths, but only worrying about how the controls of the ephemeral rig attenuates the result of the previous attenuation, potentially resulting in much more intuitive looking/controllable splines (which would just be a factor to drive the rig over time, no coordinates). I'm thinking out loud a little bit here so please stop me if it's ridiculous, and sorry if it's not totally clear, I intend to spend the rest of the day thinking about it, haha.
Thank you for your time,
~Iris.
P.S. Your cat is adorable!
First of all, thanks for your email, and I’m glad my thoughts have been helpful to you! This is actually something I’ve been thinking about a lot since SIGGRAPH, where I had a chance to talk to a lot of other people working on animation tools. I was initially looking at this purely from the standpoint of cartoony character animation, but I’ve come to believe that interpolationless animation has, if anything, even greater potential value to people editing very dense animation data. That would include things like mocap data and procedurally generated motion, but it would also include any kind of per-frame animation data at a high frame rate. Yes, I do think that 120fps animation would actually be better done interpolationlessly if we had the right tools! Those tools would have to spare you from editing every pose by hand, the way I do with my variable pose rate cartoony animation.
The idea of interpolating the pose rather than individual controls might indeed be better than what we have in some ways, but I don’t think it addresses some of the basic issues inherent to key poses. The assumption behind persistent key frames/poses/whatever is that there are specific frames/poses that are useful for defining motion and also for adjusting it. No matter what, if you have key poses that you’re interpolating between and that define your motion, you’ll have to modify that motion by adjusting those poses. They’ve essentially been baked into the shot.
I think this assumption ultimately descends from traditional drawn animation, where the concept of key drawings/poses originated. In that case, you’d be drawing your key poses and then making little graphs telling your in-betweening crew how they were supposed to “interpolate.” The key drawings would be pretty much set in stone at that point. But, while I do have something of a reputation for pencil envy and will generally stan traditional animation techniques, it’s worth remembering that drawn animation was invented under production pressure, just like CG, and had its own compromises to make.
This “waterfall pipeline” between key animators and inbetweeners has not by any means always been something animators have liked (Bill Tytla apparently greeted a young Milt Kahl, an inbetweener at the time, with “What scenes have you screwed up lately?”). Some drawn animators prefer to do their own inbetweens when they can, or prefer animating straight ahead, without defining key drawings at all (e.g. Shamus Culhane). And that’s with human inbetweeners who are capable of making intelligent decisions about arcs and character silhouettes in a way even very sophisticated algorithms cannot yet do (though maybe machine learning will change that).
The animation we see on screen is a continuous stream of poses. Key poses are how we rough out the motion and begin to visualize it, but they are not inherent to that motion--what they are is cognitive tools, allowing us to conceptualize motion in mentally digestible slices and make judgements about its aesthetic qualities and emotional impact while it is still under construction. When I animate, I try not to become too locked into them. As I begin filling out the character’s movement with breakdowns, I will inevitably begin to see how the key poses should be adjusted to flow better with the movement I’m shaping. By the time I’m finished with the shot, there are no longer any discernible, specific key poses. If I want to rework the motion and decide to return it to a set of key poses, I may well choose a completely different set that most effectively describes the extremes of motion that exist now, as opposed to the ones I was imagining when I began laying down key poses.
To return to the zBrush analogy, the advent of tools like Dynamesh and the ascendency of retopology-based workflows seem to me to essentially be an admission that there isn’t any appropriate set of starting edge loops for a complex organic shape in the general case. You literally can’t make one that won’t be pretty likely to paint you into a corner at some point. Sure, conventional polygonal modeling and subdivs/NURBS are still effective modeling tools for a lot of mechanical shapes, but that’s because those shapes are very “parameterizable”--they have clearly defined topological features like “corners,” but the spaces between those features are simple and can be generated procedurally. Characters generally aren’t like that, either in modeling or in animation, and that’s as true for high frame rates as it is for low ones. They have “specific complexity.”
This view is borne out by some of the discussions I had at SIGGRAPH. One of the topics of discussion among animation tools developers is how to take high-frequency data and turn it into something animators can edit without the issues inherent to animation layers. However, research into turning that data into usable keyframes at places like Pixar seems to have foundered on a fundamental issue: no two animators can agree on where those keyframes should be placed! This is one of the primary reasons why I’m beginning to see interpolationless animation as a general solution to the problem of editing motion, not limited to a low/variable pose rate. While I generally haven’t been that interested in procedural animation (wanting as much artistic control as possible), it’s possible that there actually are ways that machine learning could assist animators significantly that just haven’t been explored because they do not fit into a keyframe-based workflow. If all motion is ultimately just per-frame data, and you have the right tools to manipulate that data, then different forms of motion creation can be utilized together much more freely.
So what would tools for editing dense animation data actually look like? This is all pretty speculative, and the real answer is “I don’t entirely know yet.” We won’t be able to see what techniques are most effective till we have a chance to try them out, just as the details of ephemeral rig manipulation came out of developing and testing different versions of the system, and weren’t conceived of a priori.
However, I do think we can make some educated guesses about what techniques would work well. You can look at the problem, for instance, as one of “control pose placement.” You might start with some poses that you then inbetween (using spline or linear methods). What you’d end up with at that point wouldn’t be keyframes with interpolation; you’d just have a pose on every frame. There would no longer be “key poses.” When you wanted to edit that motion, you’d then choose “edit poses” that would affect the animation as if they were key poses, but they would be chosen on a manipulation-by-manipulation basis, not baked into the scene. You can also think of this as a motion edit with a falloff (like Source Filmmaker) or temporary animation layers (which is kind of like the workflow Richard Lico has developed using standard animation layers and just baking them all the time). If a system like this were integrated with an ephemeral rigging system, you’d have a lot of freedom to manipulate arbitrarily dense animation data in whatever way you need to.
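To make the “motion edit with a falloff” idea a bit more concrete, here’s a very rough Python sketch. To be clear, this is not how Source Filmmaker, the ephemeral rig system, or any existing tool actually does it. It assumes a single animation channel stored as one value per frame, and the names (`falloff_weight`, `apply_edit_pose`), the cosine falloff shape, and the `radius` parameter are all made up for illustration.

```python
import math

def falloff_weight(distance, radius):
    """Cosine falloff: 1.0 at the edit frame, easing to 0.0 at `radius` frames away."""
    if distance == 0:
        return 1.0  # the edit frame itself always takes the full change
    if distance >= radius:
        return 0.0  # frames outside the falloff window are untouched
    return 0.5 * (1.0 + math.cos(math.pi * distance / radius))

def apply_edit_pose(channel, edit_frame, new_value, radius):
    """Return a new per-frame channel in which `edit_frame` hits `new_value`
    and nearby frames are pulled along by the temporal falloff.
    Nothing is stored as a key: the result is still just a value on every frame."""
    delta = new_value - channel[edit_frame]
    return [
        value + delta * falloff_weight(abs(frame - edit_frame), radius)
        for frame, value in enumerate(channel)
    ]

# Dense data: a hypothetical head-tilt angle (degrees) baked on every frame.
head_tilt = [0.0] * 11

# The "edit pose" lives only for this one manipulation: frame 5, 4-frame falloff.
edited = apply_edit_pose(head_tilt, edit_frame=5, new_value=2.0, radius=4)
```

The point of the sketch is that the frame you grab and the width of the falloff are chosen fresh for each manipulation. The next edit could land on a completely different frame with a completely different radius, and nothing about the previous one would be left behind in the data.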
We’re pretty far away from having such a system now (though I have some hopes that Mark3 of the ephemeral system could eventually grow into one), but I do really believe that it would be the right approach for essentially any form of character or creature animation if the tools were developed far enough.
Regardless, good luck with your own approach! I’m happy to see more people thinking about how to improve animation tools, regardless of whether they end up going down the same roads I am.
As an aside, since you have expressed interest in felines and the adorableness thereof, I present to you these rare images of felis preatorium, the elusive Hallway Panther, photographed in her natural habitat. Travelers in hallways are advised to guard their heels and pants legs carefully against attack by this fierce ambush predator.