State-based Recognition of Gesture

in Motion-Based Recognition (Computational Imaging and Vision)

Published by Springer | 1997

ISBN: 978-94-015-8935-2

A gesture is a motion that has a special status in a domain or context. Recent interest in gesture recognition has been spurred by its broad range of applicability in more natural user interface designs. However, the recognition of gestures, especially natural gestures, is difficult because gestures exhibit human variability. We present a technique for quantifying this variability for the purposes of representing and recognizing gesture.

We make the assumption that the useful constraints of the domain or context of a gesture recognition task are captured implicitly by a number of examples of each gesture. Specifically, we require that by observing an adequate set of examples one can (1) determine the important aspects of the gesture by noting components of the motion that are reliably repeated; and (2) learn which aspects are loosely constrained by measuring high variability. Training therefore consists of summarizing a set of motion trajectories that are smooth in time by representing the variance of the motion at local regions in the space of measurements. These local variances can be translated into a natural symbolic description of the movement that represents a gesture as a sequence of measurement or configuration states. Recognition is then performed by determining whether a new trajectory is consistent with the required sequence of states.
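The training and recognition steps just described can be sketched in a few lines. The sketch below is a simplified stand-in for the chapter's actual method: it resamples example trajectories to a common length, splits them into contiguous segments, and summarizes each segment as a state (mean and per-dimension variance); recognition then checks that a new trajectory passes through the states in order. The function names (`learn_states`, `recognize`) and the segmentation-by-time-slices scheme are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def learn_states(examples, n_states):
    """Summarize an ensemble of example trajectories as a sequence of
    states, each a (mean, std) pair over a local region of the motion.
    Illustrative sketch: states are contiguous time slices, not the
    chapter's topology-preserving construction."""
    T = 20 * n_states  # common resampled length
    resampled = []
    for ex in examples:
        ex = np.asarray(ex, dtype=float)
        idx = np.linspace(0, len(ex) - 1, T)
        # Linearly resample each measurement dimension to length T.
        resampled.append(np.stack(
            [np.interp(idx, np.arange(len(ex)), ex[:, d])
             for d in range(ex.shape[1])], axis=1))
    data = np.stack(resampled)  # shape: (n_examples, T, dims)
    states = []
    for seg in np.array_split(np.arange(T), n_states):
        # Pool the segment's points across all examples; the spread across
        # examples captures the variability of that part of the gesture.
        pts = data[:, seg, :].reshape(-1, data.shape[2])
        states.append((pts.mean(axis=0), pts.std(axis=0) + 1e-6))
    return states

def recognize(trajectory, states, n_sigma=3.0):
    """Accept the trajectory if it visits every state in order, where
    'visiting' means landing within n_sigma std of the state mean."""
    current = 0  # index of the next state the trajectory must reach
    for p in np.asarray(trajectory, dtype=float):
        mean, std = states[current]
        if np.all(np.abs(p - mean) <= n_sigma * std):
            current += 1
            if current == len(states):
                return True
    return False
```

Note that states with high variance (loosely constrained portions of the gesture) accept a wide range of measurements, while reliably repeated portions yield tight states, mirroring the two requirements above.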

In this chapter we apply the configuration state representation to a range of gesture-related sensory data: the two-dimensional movements of a mouse input device, the movement of the hand measured by a magnetic spatial position and orientation sensor, and the changing eigenvector projection coefficients computed from an image sequence. The successful application of the technique to all these domains demonstrates the general utility of the approach.

We begin by describing related work on gesture recognition. We then motivate our particular choice of state-based representation and present a technique for computing it from generic sensor data. This computation requires the development of a novel technique for collapsing an ensemble of time-varying data while preserving the qualitative, topological structure of the trajectories. We develop and test methods for using the state-based representation to concurrently segment and recognize a stream of gesture data. Finally, we consider the relationship between the configuration states proposed here and Hidden Markov Models (HMMs).
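To illustrate what concurrent segmentation and recognition of a stream can look like under a state-sequence model, here is a hypothetical sketch: several gesture models (each a list of `(mean, std)` states) track their own progress through an unsegmented sample stream, and a gesture is reported, and its tracker reset, whenever its final state is reached. The `StreamRecognizer` class and its reset-on-completion policy are assumptions for illustration, not the method developed in the chapter.

```python
import numpy as np

class StreamRecognizer:
    """Track several state-sequence gesture models over a continuous,
    unsegmented stream of measurements. Each model advances independently;
    reaching its last state both recognizes the gesture and implicitly
    segments it out of the stream. Illustrative sketch only."""

    def __init__(self, models, n_sigma=3.0):
        self.models = models                      # {name: [(mean, std), ...]}
        self.progress = {name: 0 for name in models}
        self.n_sigma = n_sigma

    def feed(self, sample):
        """Consume one measurement; return the gestures completed by it."""
        sample = np.asarray(sample, dtype=float)
        completed = []
        for name, states in self.models.items():
            i = self.progress[name]
            mean, std = states[i]
            if np.all(np.abs(sample - mean) <= self.n_sigma * std):
                i += 1
                if i == len(states):
                    completed.append(name)  # gesture recognized and segmented
                    i = 0                   # reset to watch for the next one
            self.progress[name] = i
        return completed
```

Because every model keeps its own state index, no prior segmentation of the stream is needed: each incoming sample simply advances whichever models it is consistent with.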