Coding Human Lip Motions with a Learned 3D Model

Proceedings of the Int'l Workshop on Very Low Bitrate Video Coding (VLBV '98). Urbana, Illinois.

The lips are a critical element of spoken communication and expression, and accurately tracking and synthesizing their motion from arbitrary head poses is essential for high-quality video coding. Our approach builds and trains 3D models of lip motion to compensate for the limited information available during tracking. We use physical models as a prior and combine them with statistical models, showing how the two can be integrated naturally into both a synthesis method and a MAP estimation framework for tracking. Because the resulting description has only a small number of parameters, it is also well suited to coding. We show how our methods accurately recover 3D lip shape from raw 2D video data and resynthesize that shape from the same compact parameter set.
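As a minimal sketch of the MAP formulation mentioned above (the symbols $\theta$ and $I$ are illustrative choices, not notation taken from the paper): the tracker seeks the lip-shape parameters $\theta$ that maximize the posterior given the observed 2D image data $I$,

$$
\hat{\theta} \;=\; \arg\max_{\theta}\; p(\theta \mid I) \;=\; \arg\max_{\theta}\; p(I \mid \theta)\, p(\theta),
$$

where $p(I \mid \theta)$ measures how well the projected 3D lip model explains the image observations, and $p(\theta)$ is the prior over the low-dimensional shape parameters supplied by the combined physical and statistical models. The prior is what compensates for the limited information in any single 2D view, and $\hat{\theta}$ is the compact description that is then transmitted for coding and used for resynthesis.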