Boosting-Based Multimodal Speaker Detection for Distributed Meetings

Cha Zhang; Pei Yin; Yong Rui; Ross Cutler; P. Viola

2006 Multimedia Signal Processing | September 2006

Published by IEEE

Speaker detection is an important task in distributed meeting applications. This paper discusses a number of challenges we encountered while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based multimodal speaker detection (BMSD) algorithm. Instead of performing sound source localization (SSL) and multi-person detection (MPD) separately and then fusing their individual results, the proposed algorithm uses boosting to select features simultaneously from a combined pool of audio and visual features. The result is a highly accurate and very efficient speaker detector. The algorithm reduces the error rate of the SSL-only approach by 47%, and that of the SSL and MPD fusion approach by 27%.
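
The idea of letting boosting choose from one combined audio-visual pool can be illustrated with a minimal sketch. It is not the BMSD implementation: the feature names, the toy 0/1 data, and the use of discrete AdaBoost with decision stumps are all assumptions made for illustration, since the abstract does not specify the actual features or the boosting variant.

```python
import math

# Hypothetical combined feature pool: two audio cues and two visual cues.
FEATURES = ["audio_ssl_peak", "audio_energy", "visual_motion", "visual_brightness"]

# Toy data (assumed, not from the paper): one row per candidate region.
# Label +1 = active speaker, -1 = not. A region is a speaker only when both
# audio_ssl_peak and visual_motion fire; the other two cues are uninformative.
X = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1], [1, 0, 1, 0],  # sound and motion
    [1, 1, 0, 0], [1, 0, 0, 1], [1, 0, 0, 1],                # sound, no motion
    [0, 1, 1, 0], [0, 0, 1, 1],                              # motion, no sound
    [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],                # neither
]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1]


def stump_predict(row, j, thr, pol):
    """Decision stump: vote `pol` if feature j reaches the threshold, else -pol."""
    return pol if row[j] >= thr else -pol


def best_stump(X, y, w):
    """Search every feature in the combined pool for the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost: each round adds the best stump, then reweights samples."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


ensemble = adaboost(X, y, rounds=3)
for alpha, j, thr, pol in ensemble:
    print(f"{FEATURES[j]:>18}  thr={thr}  pol={pol:+d}  alpha={alpha:.3f}")
```

On this toy data the first two rounds select `visual_motion` and then `audio_ssl_peak`, and the third adds a constant stump (threshold at the minimum value) that acts as a bias. The ensemble therefore draws on both modalities in a single training pass, with no separate fusion step.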