This research explores audio devices that contain one or more microphones and/or one or more loudspeakers, and that operate with audible sound or ultrasound. The goal is to optimize the design and placement of microphones and loudspeakers by considering the characteristics of the acoustical and geometrical environment, along with the perceived soundscape. This research is useful for products such as Microsoft HoloLens and Kinect for Xbox.
Microphone enclosure acoustical design
Placing a transducer into a device changes its acoustical parameters, which means device enclosures are a subject of acoustical design. We have worked on the acoustical design of various transducers for research purposes and have assisted Microsoft engineering teams with the design of their products, including the acoustical design optimization for the microphones in Kinect for Xbox 360. By optimizing the placement of the microphones, we achieved a directivity pattern better than that provided by open-air microphones.
Our acoustics measurement tool: the anechoic chamber
For our work on acoustical design and spatial audio, we have built a specialized setup in the Microsoft Research anechoic chamber. An arc with sixteen loudspeakers and sixteen microphones can be rotated around the device under test, letting us quickly measure 3D directivity or radiation patterns. We have also used this setup to measure the 3D directivity patterns of human hearing, also known as Head-Related Transfer Functions (HRTFs).
Microphone arrays
A single microphone placed in front of a human speaker captures not only speech, but also noise and reverberation. These distortion effects can be reduced through beamforming: positioning several microphones close together to form a microphone array, then processing the signals from these microphones to achieve higher directivity. Over the years, we have designed and built several prototypes of linear and circular microphone arrays. The experience we gained and the algorithms we designed have been used in several Microsoft products: the RoundTable device, microphone array support in Windows, Kinect for Xbox, and Microsoft HoloLens.
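To illustrate how combining microphone signals increases directivity, here is a minimal sketch of a delay-and-sum beamformer response for a linear array. Delay-and-sum is a textbook technique and is not necessarily what the prototypes or products above use. The array geometry, the frequency, and the function name `array_response` are assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def array_response(mic_positions, freq_hz, steer_deg, arrival_deg):
    """Normalized delay-and-sum response of a linear array to a plane wave.

    mic_positions: microphone coordinates along the array axis, in metres.
    Angles are measured from broadside (0 degrees is perpendicular to the array).
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber in rad/m
    steer = math.sin(math.radians(steer_deg))
    arrival = math.sin(math.radians(arrival_deg))
    total = 0j
    for x in mic_positions:
        # Phase of the incoming wave at this microphone, minus the
        # compensating phase the beamformer applies for the steering angle.
        total += cmath.exp(1j * k * x * (arrival - steer))
    return abs(total) / len(mic_positions)


# Hypothetical 4-element array with 4 cm spacing, steered to broadside.
mics = [i * 0.04 for i in range(4)]
on_target = array_response(mics, 2000.0, 0.0, 0.0)
off_target = array_response(mics, 2000.0, 0.0, 60.0)
print(f"gain toward the talker: {on_target:.3f}")
print(f"gain 60 degrees off axis: {off_target:.3f}")
```

Sound from the steering direction adds coherently across the microphones, while sound from other directions adds with mismatched phases and is attenuated.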
Loudspeaker arrays
In a similar way, multiple loudspeakers can be used to form a beam of sound toward a certain direction, reaching one listener and reducing the signal level for the others. We can even create multiple beams simultaneously, sending different audio streams to different listeners in the same room. We have built several prototypes of linear loudspeaker arrays, mainly targeting media room scenarios. One of the latest developments in this area is cross-talk cancellation algorithms. This technology delivers a separate signal to each ear of the listener, such that a loudspeaker array can act as virtual headphones. By combining cross-talk cancellation with spatial audio reproduction and tracking the position and orientation of the user’s head, sound sources behind, above, and below the listener can be reproduced virtually.
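The idea behind cross-talk cancellation can be sketched at a single frequency with two loudspeakers and two ears. If H is the matrix of acoustic transfer functions from loudspeakers to ears, pre-filtering the signals with the inverse of H delivers each signal to one ear only. This is a simplified illustration, not the algorithm referred to above: the matrix values are hypothetical, and a practical system would work across all frequencies, use measured or modeled transfer functions, and regularize the inversion.

```python
def invert_2x2(h):
    """Invert a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical transfer matrix at one frequency.
# Rows are ears (left, right); columns are loudspeakers (left, right).
# The off-diagonal terms are the cross-talk paths.
H = [[1.0 + 0.0j, 0.6 - 0.2j],
     [0.6 - 0.2j, 1.0 + 0.0j]]

C = invert_2x2(H)                # cancellation filters at this frequency
ear_signals = matmul_2x2(H, C)   # what reaches the ears after pre-filtering

for row in ear_signals:
    print([f"{abs(v):.3f}" for v in row])
```

The product of H and C is the identity matrix up to rounding, meaning the left input reaches only the left ear and the right input only the right ear.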
Ultrasound probing devices
Similar to how bats and dolphins use echolocation, we are researching the use of beamforming in the ultrasound band to construct images of objects. We can focus the sound by using a loudspeaker array facing a given direction, listen with the microphone array in the same direction, and capture the reflections from objects in this direction. By scanning the space, we can construct an image of the objects in front of this ultrasound sensing device. The short wavelength of ultrasound allows detection of even small objects. This low-power ultrasound probing device can generate three types of images: reflection, distance, and Doppler, which makes it suitable for body position retrieval or gesture recognition applications.
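The physical relations behind the distance and Doppler images can be sketched as follows, assuming a co-located emitter and receiver and a target moving much slower than sound. The 40 kHz carrier, the example delay, and the function names are hypothetical values chosen for illustration, not parameters of the device described above.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def wavelength(freq_hz):
    """Wavelength in metres of a sound wave at the given frequency."""
    return SPEED_OF_SOUND / freq_hz


def echo_distance(round_trip_s):
    """Distance to a reflector from the round-trip echo delay.

    The pulse travels to the object and back, hence the division by two.
    """
    return SPEED_OF_SOUND * round_trip_s / 2.0


def radial_velocity(emitted_hz, received_hz):
    """Approximate speed of the reflector along the beam (positive = approaching).

    For a reflection, the Doppler shift is about 2 * v * f / c when v << c.
    """
    return SPEED_OF_SOUND * (received_hz - emitted_hz) / (2.0 * emitted_hz)


carrier_hz = 40_000.0  # hypothetical ultrasound carrier
print(f"wavelength: {wavelength(carrier_hz) * 1000:.2f} mm")
print(f"distance for a 10 ms echo: {echo_distance(0.010):.3f} m")
print(f"velocity for a +100 Hz shift: {radial_velocity(carrier_hz, 40_100.0):.3f} m/s")
```

At this carrier frequency the wavelength is under a centimetre, which is consistent with the ability to resolve small objects.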
Devices for augmented and virtual reality (AR/VR)
With recent advancements in augmented and virtual reality, it has become more important to capture audio as a three-dimensional sound field. For these scenarios, we have built a variety of cylindrical and spherical microphone arrays with 16 or 64 microphone elements. By processing these signals, we can achieve a device-independent representation of the sound field. This representation allows further rendering of the audio on an arbitrary system of loudspeakers or on headphones, using spatial audio. In some cases, our spherical microphone arrays have even integrated wide-angle cameras, which turns them into devices that can broadcast AR/VR audio and video in real time.
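The text above does not say which device-independent representation is used. One widely used option is a spherical-harmonic (ambisonic) decomposition, and the sketch below assumes that choice purely for illustration. It encodes a single sound source arriving from a known direction into four first-order channels, using the SN3D normalization and ACN channel order. Deriving these channels from real microphone-array signals requires array-specific processing that is not shown here.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one audio sample from a given direction into first-order ambisonics.

    Azimuth is measured counterclockwise from the front (90 degrees = left),
    elevation upward from the horizontal plane.
    Returns the channels in ACN order: [W, Y, Z, X].
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)    # left-right component
    z = sample * math.sin(el)                   # up-down component
    x = sample * math.cos(az) * math.cos(el)    # front-back component
    return [w, y, z, x]


front = encode_first_order(1.0, 0.0, 0.0)
left = encode_first_order(1.0, 90.0, 0.0)
above = encode_first_order(1.0, 0.0, 90.0)
print("front:", [round(v, 3) for v in front])
print("left: ", [round(v, 3) for v in left])
print("above:", [round(v, 3) for v in above])
```

Because the channels describe the sound field rather than any particular playback layout, a separate decoding step can map them to a given loudspeaker system or to headphones.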