{"id":573651,"date":"2019-03-20T07:58:36","date_gmt":"2019-03-20T14:58:36","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=573651"},"modified":"2020-04-23T15:02:40","modified_gmt":"2020-04-23T22:02:40","slug":"project-triton-and-the-physics-of-sound-with-dr-nikunj-raghuvanshi","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/podcast\/project-triton-and-the-physics-of-sound-with-dr-nikunj-raghuvanshi\/","title":{"rendered":"Project Triton and the physics of sound with Dr. Nikunj Raghuvanshi"},"content":{"rendered":"
If you\u2019ve ever played video games, you know that for the most part, they look a lot better than they sound. That\u2019s largely due to the fact that audible sound waves are much longer \u2013 and a lot more crafty \u2013 than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher<\/a> in the Interactive Media Group at Microsoft Research<\/a>, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels \u2013 in rooms, around corners, behind walls, outdoors \u2013 and he\u2019s using computational physics to do it.<\/p>\n Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War<\/em>, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton<\/a>, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes.<\/p>\n Nikunj Raghuvanshi: In a game scene, you will have multiple rooms, you\u2019ll have caves, you\u2019ll have courtyards, you\u2019ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it\u2019s making its way around geometry. And the question becomes how do you compute even the direct sound? Even the initial sound\u2019s loudness and direction, which are important? How do you find those? Quickly? 
Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly.<\/p>\n Host: You\u2019re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I\u2019m your host, Gretchen Huizinga.<\/strong><\/p>\n Host: If you\u2019ve ever played video games, you know that for the most part, they look a lot better than they sound. That\u2019s largely due to the fact that audible sound waves are much longer \u2013 and a lot more crafty \u2013 than visual light waves, and therefore, much more difficult to replicate in simulated environments. But Dr. Nikunj Raghuvanshi, a Senior Researcher in the Interactive Media Group at Microsoft Research, is working to change that by bringing the quality of game audio up to speed with the quality of game video. He wants you to hear how sound really travels \u2013 in rooms, around corners, behind walls, outdoors \u2013 and he\u2019s using computational physics to do it.<\/strong><\/p>\n Today, Dr. Raghuvanshi talks about the unique challenges of simulating realistic sound on a budget (both money and CPU), explains how classic ideas in concert hall acoustics need a fresh take for complex games like Gears of War<\/em>, reveals the computational secret sauce you need to deliver the right sound at the right time, and tells us about Project Triton, an acoustic system that models how real sound waves behave in 3-D game environments to make us believe with our ears as well as our eyes. That and much more on this episode of the Microsoft Research Podcast.<\/strong><\/p>\n Host: Nikunj Raghuvanshi, welcome to the podcast.<\/strong><\/p>\n Nikunj Raghuvanshi: I\u2019m glad to be here!<\/p>\n Host: You are a senior researcher in MSR\u2019s Interactive Media Group, and you situate your research at the intersection of computational acoustics and graphics. 
Specifically, you call it \u201cfast computational physics for interactive audio\/visual applications.\u201d<\/strong><\/p>\n Nikunj Raghuvanshi: Yep, that\u2019s a mouthful, right?<\/p>\n Host: It is a mouthful. So, unpack that! How would you describe what you do and why you do it? What gets you up in the morning?<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah, so my passion is physics. I really like the mixture of computers and physics. So, the way I got into this was, many, many years ago, I picked up this book on C++ and it was describing graphics and stuff. And I didn\u2019t understand half of it, and there was a color plate in there. It took me two days to realize that those are not photographs, they were generated by a machine, and I was like, somebody took a photo of a world that doesn\u2019t exist. So, that is what excites me. I was like, this is amazing. This is as close to magic as you can get. And then the idea was I used to build these little simulations and I was like the exciting thing is you just code up these laws of physics into a machine and you see all this behavior emerge out of it. And you didn\u2019t tell the world to do this or that. It\u2019s just basic Newtonian physics. So, that is computational physics. And when you try to do this for games, the challenge is you have to be super-fast. You have 1\/60th of a second to render the next frame to produce the next buffer of audio. Right? So, that\u2019s the fast portion. How do you take all these laws and compute the results fast enough that it can happen at 1\/60th of a second, repeatedly? So, that\u2019s where the computer science enters the physics part of it. So, that\u2019s the sort of mixture of things where I like to work in.<\/p>\n Host: You\u2019ve said that light and sound, or video and audio, work together to make gaming, augmented reality, virtual reality, believable. Why are the visual components so much more advanced than the audio? 
Is it because the audio is the poor relation in this equation, or is it that much harder to do?<\/strong><\/p>\n Nikunj Raghuvanshi: It is kind of both. Humans are visual dominant creatures, right? Because visuals are what is on our conscious mind and when you describe the world, our language is so visual, right? Even for sound, sometimes we use visual metaphors to describe things. So, that is part of it. And part of it is also that for sound, the physics is in many ways tougher because you have much longer wavelengths and you need to model wave diffraction, wave scattering and all these things to produce a believable simulation. And so, that is the physical aspect of it. And also, there\u2019s a perceptual aspect. Our brain has evolved in a world where both audio\/visual cues exist, and our brain is very clever. It goes for the physical aspects of both that give us separate information, unique information. So, visuals give you line-of-sight, high resolution, right? But audio is lower resolution directionally, but it goes around corners. It goes around rooms. That\u2019s why if you put on your headphones and just listen to music at the loud volume, you are a danger to everybody on the street because you have no awareness.<\/p>\n Host: Right.<\/strong><\/p>\n Nikunj Raghuvanshi: So, audio is the awareness part of it.<\/p>\n Host: That is fascinating because you\u2019re right. What you can see is what is in front of you, but you could hear things that aren\u2019t in front of you.<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah.<\/p>\n Host: You can\u2019t see behind you, but you can hear behind you.<\/strong><\/p>\n Nikunj Raghuvanshi: Absolutely, you can hear behind yourself and you can hear around stuff, around corners. 
You can hear stuff you don\u2019t see, and that\u2019s important for anticipating stuff.<\/p>\n Host: Right.<\/strong><\/p>\n Nikunj Raghuvanshi: People coming towards you and things like that.<\/p>\n Host: So, there\u2019s all kinds of people here that are working on 3D sound and head-related transfer functions and all that.<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah, Ivan\u2019s group.<\/p>\n Host: Yeah! How is your work interacting with that?<\/strong><\/p>\n Nikunj Raghuvanshi: So, that work is about, if I tell you the spatial sound field around your head, how does it translate into a personal experience in your two ears? So, the HRTF modeling is about that aspect. My work with John Snyder is about, how does the sound propagate in the world, right?<\/p>\n Host: Interesting.<\/strong><\/p>\n Nikunj Raghuvanshi: So, if there is a sound down a hallway, what happens during the time it gets from there up to your head? That\u2019s our work.<\/p>\n Host: I want you to give us a snapshot of the current state-of-the-art in computational acoustics and there\u2019s apparently two main approaches in the field. What are they, and what\u2019s the case for each and where do you land in this spectrum?<\/strong><\/p>\n Nikunj Raghuvanshi: So, there\u2019s a lot of work in room acoustics where people are thinking about, okay, what makes a concert hall sound great? Can you simulate a concert hall before you build it, so you know how it\u2019s going to sound? And, based on the constraints on those areas, people have used a lot of ray tracing approaches which borrow on a lot of literature in graphics. And for graphics, ray tracing is the main technique, and it works really well, because the idea is you\u2019re using a short wavelength approximation. So, light wavelengths are submicron and if they hit something, they get blocked. But the analogy I like to use is sound is very different, the wavelengths are much bigger. 
So, you can hold your thumb out in front of you and blot out the sun, but you are going to have a hard time blocking out the sound of thunder with a thumb held out in front of your ear because the waves will just wrap around. And, that\u2019s what motivates our approach which is to actually go back to the physical laws and say, instead of doing the short wavelength approximation for sound, we revisit and say, maybe sound needs the more fundamental wave equation to be solved, to actually model these diffraction effects for us. The usual thinking is that, you know, in games, you are thinking about we want a certain set of perceptual cues. We want walls to occlude sound, we want a small room to reverberate less. We want a large hall to reverberate more. And the thought is, why are we solving this expensive partial differential equation again? Can\u2019t we just find some shortcut to jump straight to the answer instead of going through this long-winded route of physics? And our answer has been that you really have to do all the hard work because there\u2019s a ton of information that\u2019s folded in and what seems easy to us as humans isn\u2019t quite so easy for a computer and there\u2019s no neat trick to get you straight to the perceptual answer you care about.<\/p>\n (music plays)<\/p>\n Host: Much of the work in audio and acoustic research is focused on indoor sound where the sound source is within the line of sight and the audience and the listener can see what they were listening to\u2026<\/strong><\/p>\n Nikunj Raghuvanshi: Um-hum.<\/p>\n Host: \u2026and you mentioned that the concert hall has a rich literature in this field. 
So, what\u2019s the gap in the literature when we move from the concert hall to the computer, specifically in virtual environments?<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah, so games and virtual reality, the key demand they have is the scene is not one room, and with time it has become much more difficult. 
So, a concert hall is terrible if you can\u2019t see the people who are playing the sound, right? So, it allows for a certain set of assumptions that work extremely nicely. The direct sound, which is the initial sound, which is perceptually very critical, just goes in a straight line from source to listener. You know the distance so you can just use a simple formula and you know exactly how loud the initial sound is at the person. But in a game scene, you will have multiple rooms, you\u2019ll have caves, you\u2019ll have courtyards, you\u2019ll have all sorts of complex geometry and then people love to blow off roofs and poke holes into geometry all over the place. And within that, now sound is streaming all around the space and it\u2019s making its way around geometry. And the question becomes, how do you compute even the direct sound? Even the initial sound\u2019s loudness and direction, which are important? How do you find those? Quickly? Because you are on the clock and you have like 60, 100 sources moving around, and you have to compute all of that very quickly. So, that\u2019s the challenge.<\/p>\n Host: All right. So, let\u2019s talk about how you\u2019re addressing it. A recent paper that you\u2019ve published made some waves, sound waves probably. No pun intended\u2026 It\u2019s called Parametric Directional Coding for Pre-computed Sound Propagation. Another mouthful. But it\u2019s a great paper and the technology is so cool. Talk about this\u2026 research this that you\u2019re doing.<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah. So, our main idea is, actually, to look at the literature in lighting again and see the kind of path they\u2019d followed to kind of deliver this computational challenge of how you do these extensive simulations and still hit that stringent CPU budget in real time. And one of the key ideas is you precompute. You cheat. You just look at the scene and just compute everything you need to compute beforehand, right? 
Instead of trying to do it on the fly during the game. So, it does introduce the limitation that the scene has to be static. But then you can do these very nice physical computations and you can ensure that the whole thing is robust, it is accurate, it doesn\u2019t suffer from all the sort of corner cases that approximations tend to suffer from, and you have your result. You basically have a giant look-up table. If somebody tells you that the source is over there and the listener is over here, tell me what the loudness of the sound would be. We just say okay, we have this giant table, we\u2019ll just go look it up for you. And that is the main way we bring the CPU usage into control. But it generates a knock-on challenge that now we have this huge table, there\u2019s this huge amount of data that we\u2019ve stored and it\u2019s 6-dimensional. The source can move in 3-dimensions and the listener can move in 3-dimensions. So, we have the giant table which is terabytes or even more of data.<\/p>\n Host: Yeah.<\/strong><\/p>\n Nikunj Raghuvanshi: And the game\u2019s typical budget is like 100 megabytes. So, the key challenge we\u2019re facing is, how do we fit everything in that? How do we take this data and extract out something salient that people listen to and use that? So, you start with full computation, you start as close to nature as possible and then we\u2019re saying okay, now what would a person hear out of this? Right? Now, let\u2019s do that activity of, instead of doing a shortcut, now let\u2019s think about okay, a person hears the direction sound comes from. If there is a doorway, the sound should come from the doorway. So, we pick out these perceptual parameters that are salient for human perception and then we store those. 
That\u2019s the crucial way you kind of bring down this enormous data set to a sort of memory budget that\u2019s feasible.<\/p>\n Host: So, that\u2019s the paper.<\/strong><\/p>\n Nikunj Raghuvanshi: Um-hum.<\/p>\n Host: And how has it played out in practice, or in project, as it were?<\/strong><\/p>\n Nikunj Raghuvanshi: So, a little bit of history on this is, we had a paper at SIGGRAPH 2010, me and John Snyder and some academic collaborators, and at that point, we were trying to think of just physical accuracy. So, we took the physical data and we were trying to stay as close to physical reality as possible and we were rendering that. And around 2012, we got to talking with Gears of War<\/em>, the studio, and we were going through what the budgets will be, how things would be. And we were like we need\u2026 this needs to\u2026 this is gigabytes, it needs to go to megabytes\u2026<\/p>\n Host: Really?<\/strong><\/p>\n Nikunj Raghuvanshi: \u2026very quickly. And that\u2019s when we were like, okay, let\u2019s simplify. Like, what\u2019s the four like most basic things that you really want from an acoustic system? Like walls should occlude sound and things like that. So, we kind of re-winded and came to it from this perceptual viewpoint that I was just describing. Let\u2019s keep only what\u2019s necessary. And that\u2019s how we were able to ship this in 2016 in Gears of War<\/em> 4 by just re-winding and doing this process.<\/p>\n Host: How is that playing in to, you know\u2026 Project Triton is the big project that we\u2019re talking about. How would you describe what that\u2019s about and where it\u2019s going? Is it everything you\u2019ve just described or is there\u2026 other aspects to it?<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah. Project Triton is this idea that you should precompute the wave physics, instead of starting with approximations. Approximate later. That\u2019s one idea of Project Triton. 
And the second is, if you want to make it feasible for real games and real virtual reality and augmented reality, switch to perceptual parameters. Extract that out of this physical simulation and then you have something feasible. And the path we are on now, which brings me back to the recent paper you mentioned\u2026<\/p>\n Host: Right.<\/strong><\/p>\n Nikunj Raghuvanshi: \u2026is, in Gears of War<\/em>, we shipped some set of parameters. We were like, these make a big difference. But one thing we lacked was if the sound is, say, in a different room and you are separated by a doorway, you would hear the right loudness of the sound, but its direction would be wrong. Its direction would be straight through the wall, going from source to listener.<\/p>\n Host: Interesting.<\/strong><\/p>\n Nikunj Raghuvanshi: And that\u2019s an important spatial cue. It helps you orient yourself when sounds funnel through doorways.<\/p>\n Host: Right.<\/strong><\/p>\n Nikunj Raghuvanshi: Right? And it\u2019s a cue that sound designers really look for and try to hand-tune to get good ambiances going. So, in the recent 2018 paper, that\u2019s what we fixed. We call this portaling. It\u2019s a made-up word for this effect of sounds going around doorways, but that\u2019s what we\u2019re modeling now.<\/p>\n Host: Is this new stuff? I mean, people have tackled these problems for a long time.<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah.<\/p>\n Host: Are you people the first ones to come up with this, the portaling and\u2026?<\/strong><\/p>\n Nikunj Raghuvanshi: I mean, the basic ideas have been around. People know that, perceptually, this is important, and there are approaches to try to tackle this, but I\u2019d say, because we\u2019re using wave physics, this problem becomes much easier because you just have the waves diffract around the edge. 
With ray tracing you face the difficult problem that you have to trace out the rays \u201cintelligently\u201d somehow to hit an edge, which is like hitting a bullseye, right?<\/p>\n Host: Right.<\/strong><\/p>\n Nikunj Raghuvanshi: So, the ray can wrap around the edge. So, it becomes really difficult. Most practical ray tracing systems don\u2019t try to deal with this edge diffraction effect because of that. Although there are academic approaches to it, in practice it becomes difficult. But as I worked on this over the years, I\u2019ve kind of realized, these are the real advantages of this. Disadvantages are pretty clear: it\u2019s slow, right? So, you have to precompute. But we\u2019re realizing, over time, that going to physics has these advantages.<\/p>\n Host: Well, but the precompute part is innovative in terms of a thought process on how you would accomplish the speed-up\u2026<\/strong><\/p>\n Nikunj Raghuvanshi: There have been papers on precomputed acoustics, academically before, but this realization that mixing precomputation and extracting these perceptual parameters? That is a recipe that makes a lot of practical sense. Because a third thing that I haven\u2019t mentioned yet is going to the perceptual domain, now the sound designer can make sense of the numbers coming out of this whole system. Because it\u2019s loudness. It\u2019s reverberation time, how long the sound is reverberating. And these numbers that are super-intuitive to sound designers, they already deal with them. So, now what you are telling them is, hey, you used to start with a blank world, which had nothing, right? Like the world before the act of creation, there\u2019s nothing. It\u2019s just empty space and you are trying to make things reverberate this way or that, now you don\u2019t need to do that. 
Now physics will execute first, on the actual scene with the actual materials, and then you can say I don\u2019t like what physics did here or there, let me tweak it, let me modify what the real result is and make it meet the artistic goals I have for my game.<\/p>\n (music plays)<\/p>\n Host: We\u2019ve talked about indoor audio modeling, but let\u2019s talk about the outdoors for now and the computational challenges to making natural outdoor sounds, sound convincing.<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah.<\/p>\n Host: How have people hacked it in the past and how does your work in ambient sound propagation move us forward here?<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah, we\u2019ve hacked it in the past! Okay. This is something we realized on Gears of War<\/em> because the parameters we use were borrowed, again, from the concert hall literature and, because they\u2019re parameters informed by concert halls, things sound like halls and rooms. Back in the days of Doom, this tech would have been great because it was all indoors and rooms, but in Gears of War<\/em>, we have these open spaces and it doesn\u2019t sound quite right. Outdoors sounds like a huge hall and you know, how do we do wind ambiances and rain that\u2019s outdoors? And so, we came up with a solution for them at that time which we called \u201coutdoorness.\u201d It\u2019s, again, an invented word.<\/p>\n Host: Outdoorness.<\/strong><\/p>\n Nikunj Raghuvanshi: Outdoorness.<\/p>\n Host: I\u2019m going to use that. I like it.<\/strong><\/p>\n Nikunj Raghuvanshi: Because the idea it\u2019s trying to convey is, it\u2019s not a binary indoor\/outdoor. When you are crossing a doorway or a threshold, you expect a smooth transition. 
You expect that, I\u2019m not hearing rain inside, I\u2019m feeling nice and dry and comfortable and now I\u2019m walking into the rain\u2026<\/p>\n Host: Yeah.<\/strong><\/p>\n Nikunj Raghuvanshi: \u2026and you want the smooth transition on it. 
So, we built a sort of custom tech to do that outdoor transition. But it got us thinking about, what\u2019s the right way to do this? How do you produce the right sort of spatial impression of, there\u2019s rain outside, it\u2019s coming through a doorway, the doorway is to my left, and as you walk, it spreads all around you. You are standing in the middle of rain now and it\u2019s all around you. So, we wanted to create that experience. So, the ambient sound propagation work was an intern project and now we finished it up with our collaborators at Cornell. And that was about, how do you model extended sound sources? So, again, going back to concert halls, usually people have dealt with point-like sources which might have a directivity pattern. But rain is like a million little drops. If you try to model each and every drop, that\u2019s not going to get you anywhere. So, that\u2019s what the paper is about, how to treat it as one aggregate that somebody gave us? And we produce an aggregate sort of energy distribution of that thing along with its directional characteristics and just encode that.<\/p>\n Host: And just encode it.<\/strong><\/p>\n Nikunj Raghuvanshi: And just encode it.<\/p>\n Host: How is it working?<\/strong><\/p>\n Nikunj Raghuvanshi: It works nice. It sounds good. To my ears it sounds great.<\/p>\n Host: Well you know, and you\u2019re the picky one, I would imagine.<\/strong><\/p>\n Nikunj Raghuvanshi: Yeah. I\u2019m the picky one and also when you are doing iterations for a paper, you also completely lose objectivity at some point. So, you\u2019re always looking for others to get some feedback.<\/p>\n Host: Here, listen to this.<\/strong><\/p>\n Nikunj Raghuvanshi: Well, reviewers give their feedback, so, yeah.<\/p>\n Host: Sure. Okay. 
Well, kind of riffing on that, there\u2019s another project going on that I\u2019d love for you to talk as much as you can about called Project Acoustics<\/a> and kind of the future of where we\u2019re going with this. Talk about that.<\/strong><\/p>\n Nikunj Raghuvanshi: That\u2019s really exciting. So, up to now, Project Triton was an internal tech which we managed to propagate from research into actual Microsoft product, internally.<\/p>\n<\/a><\/p>\n
Episode 68, March 20, 2019<\/h3>\n
\nFinal Transcript<\/h3>\n