{"id":305930,"date":"2011-04-14T11:00:06","date_gmt":"2011-04-14T18:00:06","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=305930"},"modified":"2016-10-15T13:41:54","modified_gmt":"2016-10-15T20:41:54","slug":"kinect-audio-preparedness-pays-off","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/kinect-audio-preparedness-pays-off\/","title":{"rendered":"Kinect Audio: Preparedness Pays Off"},"content":{"rendered":"
By Rob Knies, Senior Editor, Microsoft Research<\/em><\/p>\n It always helps to be prepared. Just ask Ivan Tashev<\/a>.<\/p>\n A principal software architect in the Speech<\/a> group at Microsoft Research Redmond<\/a>, Tashev played an integral role in developing the audio technology that enabled Kinect for Xbox 360<\/a> to become the fastest-selling consumer-electronics device ever, with eight million units sold in its first 60 days on the market.<\/p>\n Kinect represents part of Microsoft\u2019s deep investment in natural user interfaces, which make computing intuitive to use and able to do far more for users. On April 13, Scott Guthrie<\/a>, Microsoft corporate vice president of the .NET Developer Platform, announced features of the impending Kinect for Windows non-commercial software-development kit<\/a> during MIX11<\/a>, a three-day, web-focused conference in Las Vegas. Tashev himself will be speaking that day about his work in a talk entitled \u201cAudio for Kinect: From Idea to \u2018Xbox, Play!\u2019\u201d<\/p>\n Such prominence isn\u2019t earned easily. In the case of the audio functionality for Kinect, it took a combination of preparation and patience to do the trick.<\/p>\n \u201cI spent pretty much my entire career in Microsoft Research,\u201d Tashev says, \u201cknowing that, sooner or later, people would be talking to their computers. I was absolutely sure they would not want to wear a headset. So, from my first day with Microsoft Research, I\u2019ve been working on the problem of hands-free sound capturing from a certain distance in normal conditions and having enough clean sound in the output, good enough for telecommunications and for speech recognition.<\/p>\n \u201cI didn\u2019t know which product would be interested in this. 
These technologies were designed in Microsoft Research, and, in our experiments, they worked on a small set of data, well enough that we wrote a scientific publication.\u201d<\/p>\n Enter Alex Kipman, general manager of Xbox Incubation within Microsoft\u2019s Interactive Entertainment Business. He was driving the development of Kinect, the revolutionary product that enables controller-free command of an Xbox. He encountered Tashev in 2008 during Microsoft Research\u2019s annual TechFest<\/a> showcase, and several months later, Kipman decided to follow up.<\/p>\n \u201cWe came to Microsoft Research,\u201d he recalls, \u201cand asked: \u2018Can you help us make a system that can do speech recognition without having to push a button to talk? We\u2019re all about no buttons, so you can\u2019t have a push-to-talk system.\u2019<\/p>\n \u201cAnd we said: \u2018The system needs to be listening to us 100 percent of the time. You can leave this on for days, and it still needs to work.\u2019<\/p>\n \u201cWe said: \u2018We want a system that can do speech recognition four meters at a distance. You\u2019re not going to have a captive audience a few feet in front of a microphone. People can be anywhere about four meters\u2019 distance, and they should still be able to talk and be recognized.\u2019<\/p>\n \u201cAnd then we said: \u2018Our environment is all about people having fun. If we do our jobs correctly, every single person is going to be having fun, so there\u2019s a lot of noise from the loudspeakers, and the system still needs to pick out the signal when that person to whom you\u2019ve been listening all day says, \u201cXbox, play movie.\u201d\u2019\u201d<\/p>\n Many people might have been daunted by such a formidable laundry list, but not Tashev.<\/p>\n