Small group speaker identification with common password phrases

Speech Communication, Vol. 31, pp. 131-140

Text-dependent speaker identification performance is investigated for small groups of speakers in which each speaker in a group is assigned the same sentence-long password utterance. Password utterances are modeled by whole-phrase hidden Markov models (HMMs). Several model construction conditions are studied. Baseline maximum likelihood estimate (MLE) models are constructed from three same-session training utterances. Minimum classification error (MCE) models are constructed using the training utterances of all speakers in a group. In addition, models are constructed using additional test utterances from speakers in the group or additional utterances from speakers outside the group. Results show that error rates approximately double from 5-speaker groups to 10-speaker groups. MCE models provide about 25% improvement in closed- and open-set identification error rates, but less improvement, about 10%, in imposter accept rates. The greatest improvements are obtained, for both MLE and MCE models, when customer test utterances augment the training utterances. For MCE models, closed-set identification error rates are approximately 0.4% and 0.6% for 5- and 10-speaker groups, respectively, while imposter accept rates are approximately 4% and 10%, respectively, when customer reject rates are 5%.
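The error measures quoted above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes that each test utterance has already been scored against every group member's password model, and it takes those per-speaker log-likelihood scores as given. The function names, the score dictionaries, and the single global rejection threshold are hypothetical choices made for illustration, and the exact error definitions used in the paper may differ.

```python
from typing import Dict, List, Optional, Tuple

# Assumed input: speaker label -> log-likelihood of one utterance under
# that speaker's password model (HMM scoring itself is not shown here).
Scores = Dict[str, float]


def identify_closed_set(scores: Scores) -> str:
    """Closed-set decision: pick the group member with the highest score."""
    return max(scores, key=lambda speaker: scores[speaker])


def identify_open_set(scores: Scores, threshold: float) -> Optional[str]:
    """Open-set decision: best-scoring member, or None if below threshold."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


def error_rates(
    customer_trials: List[Tuple[str, Scores]],
    imposter_trials: List[Scores],
    threshold: float,
) -> Tuple[float, float, float]:
    """Return (closed-set error, customer reject, imposter accept) rates.

    These definitions are illustrative assumptions:
    - closed-set error: customer utterance assigned to the wrong member
    - customer reject: customer utterance rejected by the threshold
    - imposter accept: out-of-group utterance accepted as some member
    """
    wrong = sum(
        1 for true, s in customer_trials if identify_closed_set(s) != true
    )
    rejected = sum(
        1 for _, s in customer_trials if identify_open_set(s, threshold) is None
    )
    accepted = sum(
        1 for s in imposter_trials if identify_open_set(s, threshold) is not None
    )
    return (
        wrong / len(customer_trials),
        rejected / len(customer_trials),
        accepted / len(imposter_trials),
    )


if __name__ == "__main__":
    # Made-up scores for a two-speaker group, for demonstration only.
    customers = [
        ("a", {"a": -10.0, "b": -12.0}),
        ("b", {"a": -11.0, "b": -15.0}),
        ("a", {"a": -30.0, "b": -31.0}),
        ("b", {"a": -14.0, "b": -9.0}),
    ]
    imposters = [{"a": -25.0, "b": -28.0}, {"a": -12.0, "b": -40.0}]
    print(error_rates(customers, imposters, threshold=-20.0))
```

In a sketch like this, an operating point such as the 5% customer reject rate would be reached by adjusting `threshold` on held-out customer scores and then reading off the imposter accept rate at that same threshold.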