Robust access to large structured data using voice form-filling

Interspeech 2005

Organized by ISCA

A method for accurate and scalable form-filling by voice is presented. A form consists of a number of fields. Accurate speech recognition is achieved by applying task-specific inter-field constraints. These constraints are typically specified by providing a database of valid form entries, such as an employee directory containing the name, location, and telephone number of each employee. Scalability to very large vocabularies and large numbers of fields, together with the ability to accept a variety of user responses, is achieved by a two-pass recognition scheme. An index-based retrieval method is used in the first pass to produce a shortlist of form entries. These are rescored in the second pass to obtain the final result. Experiments on a simple corporate directory access application demonstrate that the new approach compares favorably, in terms of computing needs, with a traditional one-pass speech recognition system. Experiments on a national street address recognition application demonstrate that the new approach scales very well to large tasks.
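The two-pass idea can be illustrated with a small text-only sketch. It is not the authors' system: the toy directory, the function names (`build_index`, `first_pass`, `second_pass`, `recognize`), and the shortlist size `k` are all hypothetical, and the second pass uses plain string similarity as a stand-in for acoustic rescoring, which the abstract does not detail. The input is assumed to be a word hypothesis already produced by a recognizer.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Hypothetical toy database of valid form entries (not from the paper).
DIRECTORY = [
    {"name": "john smith", "location": "murray hill", "phone": "5550101"},
    {"name": "joan smythe", "location": "florham park", "phone": "5550102"},
    {"name": "john smart", "location": "florham park", "phone": "5550103"},
    {"name": "mary jones", "location": "murray hill", "phone": "5550104"},
]


def entry_text(entry):
    """Concatenate the spoken fields of one form entry."""
    return f"{entry['name']} {entry['location']}"


def build_index(entries):
    """Inverted index: word -> set of entry positions containing that word."""
    index = defaultdict(set)
    for i, entry in enumerate(entries):
        for word in entry_text(entry).split():
            index[word].add(i)
    return index


def first_pass(index, words, k=3):
    """Index-based retrieval: shortlist the k entries sharing the most words."""
    votes = Counter()
    for word in words:
        for i in index.get(word, ()):
            votes[i] += 1
    return [i for i, _ in votes.most_common(k)]


def second_pass(entries, shortlist, utterance):
    """Rescore only the shortlist (assumption: string similarity as the score)."""
    if not shortlist:
        return None
    return max(
        shortlist,
        key=lambda i: SequenceMatcher(None, utterance, entry_text(entries[i])).ratio(),
    )


def recognize(entries, index, utterance, k=3):
    """Run both passes and return the best matching entry, or None."""
    shortlist = first_pass(index, utterance.split(), k)
    best = second_pass(entries, shortlist, utterance)
    return None if best is None else entries[best]


if __name__ == "__main__":
    idx = build_index(DIRECTORY)
    print(recognize(DIRECTORY, idx, "john smith murray hill"))
```

Because every returned result is a row of the database, the fields of the output are jointly valid by construction, which is one simple way to read the inter-field constraints. The cost of the second pass depends on `k` rather than on the size of the database, which is the property that makes such a scheme attractive for large tasks.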