Audio Advances Help Xbox One Determine Signal from Noise

Published October 16, 2013

Posted by Rob Knies

You might remember Ivan Tashev as the researcher behind the audio technology that helped to make Kinect for Xbox 360 such a marketplace sensation a couple of years ago. Now, with Xbox One headed for a Nov. 22 debut, Tashev, a principal software architect for Microsoft Research, is about to see his efforts with the Xbox team result in a new set of Kinect audio enhancements. This collaboration will extend the capability to control the console via voice command.

“Everything is on a higher level,” Tashev says, “with tougher requirements, with way, way stronger criteria for release and for how the audio, the acoustics, and the entire audio system should work. We had a lot of work to do, and we improved the system a lot.”

Tashev’s contributions are featured in the debut of the new Microsoft Research Luminaries video series on Channel 9, in which he describes what’s new with the Kinect’s audio performance.

But if you don’t have time for the video, here’s the key change: new algorithms in the acoustic echo-cancellation block, the stage that removes the console’s own loudspeaker output from the microphone signal.

“There are improvements in the way we prepare and train the audio pipeline to meet the higher-quality criteria and higher speech-recognition criteria,” Tashev says. “That means software and hardware designed here at Microsoft Research and transferred to the Xbox audio team.”

That team, by the way, has been augmented, enabling even smoother collaboration.

“The Xbox team has added specialists in acoustical design and digital signal processing [DSP],” Tashev observes. “They now have three DSP engineers with Ph.D.s in signal processing. Just speaking the same signal-processing languages meant we were able to collaborate really effectively from the onset.”

In fact, Tashev is essentially an extension of the Xbox audio team, participating in Xbox 360 postmortem meetings and subsequent performance analyses, trying to determine which algorithms were underperforming and where the collaborators could find new optimizations.

“Our close collaboration with Ivan Tashev has been instrumental in building an Xbox team of audio experts and expanding the core audio-pipeline technology,” says Thomas Soemo, principal program manager lead for the Xbox platform, “which have both been essential to driving the new consumer scenarios in Xbox One.”

In a way, Tashev and his Xbox audio colleagues are victims of their own success.

“When the audio and speech recognition worked so well with Xbox 360, people got excited,” Tashev says. “With Xbox One, speech is integrated heavily into the platform, making it even more crucial to get right. We are transitioning from a nice, cool feature to an integral, inseparable component of our gaming platform.

“That’s why the bar for reliability rose so much. We’re delivering a tremendously improved audio pipeline and speech recognizer.”

The goal: making voice control better than ever.

“The bottom line,” Tashev says, “was the ability to deliver hands-free sound capturing, with people speaking from a certain distance—one, two, three, four meters—and being able to clean up the signal from noise and reverberation, removing the sound from our own loudspeakers, and delivering it nice and clean to the speech recognizer. At the time of Xbox 360, this was science fiction. Today, it’s a critical part of our human-machine interface.”
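To make the "removing the sound from our own loudspeakers" step concrete, here is a minimal sketch of acoustic echo cancellation using a textbook normalized least-mean-squares (NLMS) adaptive filter. This is not the Kinect algorithm, which the article does not describe. The function names, the four-tap `room` echo path, and the step size are all illustrative assumptions, and the demo contains no talker, only loudspeaker echo.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    far_end: samples sent to the loudspeaker (the reference signal).
    mic:     samples captured by the microphone (echo, plus speech if any).
    Returns the residual signal and the learned filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the input, scaled by its energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


def demo(n=4000, seed=0):
    """Simulate a loudspeaker echo and measure how much the filter removes."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    room = [0.6, 0.3, -0.2, 0.1]  # hypothetical echo path, an assumption
    mic = [
        sum(room[k] * far_end[i - k] for k in range(len(room)) if i - k >= 0)
        for i in range(n)
    ]
    residual, weights = nlms_echo_cancel(far_end, mic)

    def power(signal):
        return sum(v * v for v in signal) / len(signal)

    # Compare power over the last 500 samples, after the filter has adapted.
    return power(mic[-500:]), power(residual[-500:]), weights


if __name__ == "__main__":
    mic_power, residual_power, weights = demo()
    print("mic power:", mic_power)
    print("residual power:", residual_power)
    print("learned echo path:", [round(w, 3) for w in weights[:4]])
```

In this idealized setting the learned weights approach the simulated echo path and the residual power falls far below the microphone power. A real pipeline also has to handle people speaking while the loudspeakers are playing, background noise, and reverberation, which this sketch ignores.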

Turning science fiction into non-fiction is what drives Tashev.

“During all of my time at Microsoft Research, I absolutely have been convinced that, sooner or later, this time would come,” he says, “that we would be able to talk to our computers without wearing headphones, without holding a microphone. The way to achieve this is using speech-enhancement technologies, using multiple microphones, using microphone-array technology, using sophisticated speech-enhancement and audio-processing algorithms.

“This was my motivation, to see this vision become reality. The time came, the device appeared, and Kinect for Xbox 360 was a breakthrough. Now, with Xbox One, we are excited to see our audio capabilities deliver a whole new level of enjoyment for our customers.”

Research Groups

Audio and Acoustics Research Group
