Learning Visual Behavior for Gesture Analysis (Master’s Thesis)

Master’s Thesis, Massachusetts Institute of Technology

Techniques for computing a representation of human gesture from a number of example image sequences are presented. We define gesture to be the class of human motions that are intended to communicate, and visual behavior to be the sequence of visual events that makes up a complete gesture. Two main techniques are discussed. The first computes a representation that summarizes configuration-space trajectories for use in gesture recognition: a prototype is derived from a set of training gestures, and the prototype is then used to define the gesture as a sequence of states. The states capture both the repeatability and the variability evidenced in the training set of example trajectories. The technique is illustrated with a wide range of gesture-related sensory data. The second technique incorporates multiple models into the hidden Markov model (HMM) framework, so that models representing the instantaneous visual input are trained concurrently with the temporal model. We exploit two constraints that allow the technique to be applied to view-based gesture recognition: gestures are modal in the space of possible human motion, and gestures are viewpoint-dependent. We show the recovery of the visual behavior of a number of simple gestures from a small number of low-resolution example image sequences.

We consider a number of applications of the techniques and present work currently in progress to train multiple gestures concurrently for a higher level of gesture understanding. Several directions for future work are presented, including more sophisticated methods of selecting and combining the models appropriate for a gesture. Lastly, the two techniques are compared.
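As a rough illustration of the first technique, the sketch below time-normalizes a set of example configuration-space trajectories, averages them into a prototype, and summarizes the prototype as a sequence of states, each carrying the mean and variance observed across the training set. The abstract does not give these algorithmic details; the linear resampling, the equal-length segmentation, and the helper names resample and prototype_and_states are assumptions made purely for illustration.

    import numpy as np

    def resample(trajectory, n_samples):
        """Linearly resample a (T, d) configuration-space trajectory to
        n_samples points on a common normalized time axis.
        NOTE: one assumed way to time-normalize; not from the thesis."""
        t_old = np.linspace(0.0, 1.0, len(trajectory))
        t_new = np.linspace(0.0, 1.0, n_samples)
        return np.stack([np.interp(t_new, t_old, trajectory[:, j])
                         for j in range(trajectory.shape[1])], axis=1)

    def prototype_and_states(examples, n_samples=50, n_states=5):
        """Average time-normalized example trajectories into a prototype,
        then summarize it as a sequence of states holding the local mean
        and variance across the training set."""
        aligned = np.stack([resample(ex, n_samples) for ex in examples])
        prototype = aligned.mean(axis=0)          # (n_samples, d) mean trajectory
        states = []
        # One state per equal-length segment of the prototype (an assumption).
        for seg in np.array_split(np.arange(n_samples), n_states):
            pts = aligned[:, seg, :].reshape(-1, aligned.shape[2])
            states.append({"mean": pts.mean(axis=0), "var": pts.var(axis=0)})
        return prototype, states

In this spirit, a new trajectory could be recognized by checking that it passes through the states in order while staying within each state's observed variance, matching the abstract's notion of states that capture both repeatability and variability.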

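For the second technique, the abstract describes training models of the instantaneous visual input concurrently with the temporal model inside the HMM framework. The sketch below is a standard analogue rather than the thesis's specific multiple-model formulation: a Gaussian-emission HMM whose transition matrix (temporal model) and per-state output densities (instantaneous models) are re-estimated together by Baum-Welch, here via the hmmlearn library. The feature dimensionality, state count, and stand-in data are assumptions.

    import numpy as np
    from hmmlearn import hmm

    # Each training sequence is a (T_i, d) array of per-frame image features,
    # e.g. coefficients of a low-dimensional view model (stand-in data here).
    sequences = [np.random.randn(40, 8) for _ in range(12)]

    X = np.concatenate(sequences)          # observations stacked along time
    lengths = [len(s) for s in sequences]  # sequence boundaries for hmmlearn

    # Baum-Welch (EM) updates the state-transition matrix and the Gaussian
    # output densities in the same iteration, i.e. the temporal model and
    # the models of instantaneous visual input are trained concurrently.
    model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)

    score = model.score(X, lengths)        # log-likelihood of the training data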