Ideas: Solving network management puzzles with Behnaz Arzani

Published June 13, 2024

By Gretchen Huizinga, Executive Producer and Host of the Microsoft Research Podcast; Behnaz Arzani, Principal Researcher

Microsoft Research Podcast | Ideas | Behnaz Arzani

Behind every emerging technology is a great idea propelling it forward. In the new Microsoft Research Podcast series, Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Principal Researcher Behnaz Arzani. Arzani has always been attracted to hard problems, and there’s no shortage of them in her field of choice—network management—where her contributions to heuristic analysis and incident diagnostics are helping the networks people use today run more smoothly. But the criteria she uses to determine whether a challenge deserves her time have evolved. These days, a problem must appeal across several dimensions: Does it answer a hard technical question? Would the solution be useful to people? And … would she enjoy solving it?

Learn more:

Solving Max-Min Fair Resource Allocations Quickly on Large Graphs
Publication, February 2024
Finding Adversarial Inputs for Heuristics using Multi-level Optimization
Publication, February 2024
MetaOpt: Examining, explaining, and improving heuristic performance
Microsoft Research blog, January 2024
A Holistic View of AI-driven Network Incident Management
Publication, October 2023
Behnaz Arzani: Painting, storytelling, and other hobbies
Microsoft Research bio page

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

BEHNAZ ARZANI: I guess the thing I’m seeing is that we are freed up to dream more—in a way. Maybe that’s me being too … I’m a little bit of a romantic, so this is that coming out a little bit, but it’s, like, because of all this, we have the time to think bigger, to dream bigger, to look at problems where maybe five years ago, we wouldn’t even dare to think about.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Dr. Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest today is Behnaz Arzani. Behnaz is a principal researcher at Microsoft Research, and she’s passionate about the systems and networks that provide the backbone to nearly all our technologies today. Like many in her field, you may not know her, but you know her work: when your networks function flawlessly, you can thank people like Behnaz Arzani. Behnaz, it’s been a while. I am so excited to catch up with you today. Welcome to Ideas!

BEHNAZ ARZANI: Thank you. And I’m also excited to be here.

HUIZINGA: So since the show is about ideas and leans more philosophical, I like to start with a little personal story and try to tease out anything that might have been an inflection point in your life, a sort of aha moment, or a pivotal event, or an animating “what if,” we could call it. What captured your imagination and got you inspired to do what you’re doing today?

ARZANI: I think that it was a little bit of an accident and a little bit of just chance, I guess, but for me, this happened because I don’t like being told what to do! [LAUGHTER] I really hate being told what to do. And so, I got into research by accident, mostly because it felt like a job where that wouldn’t happen. I could pick what I wanted to do. So, you know, a lot of people come talking about how they were the most curious kids and they all—I wasn’t that. I was a nerd, but I wasn’t the most curious kid. But then I found that I’m attracted to puzzles and hard puzzles and things that I don’t know how to answer, and so that gravitated me more towards what I’m doing today. Things that are basically difficult to solve … I think are difficult to solve.

HUIZINGA: So that’s your inspiring moment? “I’m a bit of a rebel, and …”

ARZANI: Yup!

HUIZINGA: … I like puzzles … ”?

ARZANI: Yup! [LAUGHTER] Which is not really a moment. Yeah, I can’t point to a moment. It’s just been a journey, and it’s just, like, been something that has gradually happened to me, and I love where I am …

HUIZINGA: Yeah …

ARZANI: … but I can’t really pinpoint to like this, like this inspiring awe-drop—no.

HUIZINGA: OK. So let me ask you this: is there nobody in this building that tells you what to do? [LAUGHS]

ARZANI: There are people who have tried, [LAUGHS] but …

HUIZINGA: Oh my gosh!

ARZANI: No, it doesn’t work. And I think if you ask them, they will tell you it hasn’t worked.

HUIZINGA: OK. The other side question is, have you encountered a puzzle that has confounded you?

ARZANI: Have I encountered a puzzle? Yes. Incident management. [LAUGHTER]

HUIZINGA: And we’ll get there in the next couple of questions. Before we do, though, I want to know about who might have influenced you earlier. I mean, it’s interesting. Usually if you don’t have a what, there might not be a who attached to it …

ARZANI: No. But I have a who. I have multiple “whos” actually.

HUIZINGA: OK! Wonderful. So tell us a little bit about the influential people in your life.

ARZANI: I think the first and foremost is my mom. I have a necklace I’m holding right now. This is something my dad gave my mom on their wedding day. On one side of it is a picture of my mom and dad; on the other side is both their names on it. And I have it on every day. To my mom’s chagrin. [LAUGHTER] She is like, why? But it’s, like, it helps me stay grounded. And my mom is a person that … she had me while she was an undergrad. She got her master’s. She got into three different PhD programs in her lifetime. Every time, she gave it up for my sake and for my brother’s sake. But she’s a woman that taught me you can do anything you set your mind to and that you should always be eager to learn. She was a chemistry teacher, and even though she was a chemistry teacher, she kept reading new books. She came to the US to visit me in 2017, went to a Philadelphia high school, and asked, can I see your chemistry books? I want to see what you’re teaching your kids. [LAUGHTER] So that’s how dedicated she is to what she does. She loves what she does. And I could see it on her face on a daily basis. And at some point in my life a couple of years ago, I was talking to my mom about something, and she said, tell yourself, “I’m stronger than my mom.”

HUIZINGA: Oh my gosh.

ARZANI: And that has been, like, the most amazing thing to have in the back of my head because I view my mom as one of the strongest people I’ve ever met, and she’s my inspiration for everything I do.

HUIZINGA: Tell yourself you’re stronger than your mom. … Did you?

ARZANI: I’m not stronger than my mom, I don’t think … [LAUGHS]

HUIZINGA: [LAUGHS] You got to change that narrative!

ARZANI: But, yes, I think it’s just this thing of, like, “What would Mom do?” is a great thing to ask yourself, I think.

HUIZINGA: I love that. Well, and so I would imagine, though, that post-, you know, getting out of the house, you’ve had instructors, you’ve had professors, you’ve had other researchers. I mean, anyone else that’s … ?

ARZANI: Many! And in different stages of your life, different people step into that role, I feel like. One of the first people for me was Jen Rexford, and she is just an amazing human being. She’s an amazing researcher, hands down. Her work is awesome, but also, she’s an amazing human being, as well. And that just makes it better.

HUIZINGA: Yeah.

ARZANI: And then another person is Mohammad Alizadeh, who’s at MIT. And actually, let’s see, I’m going to keep going …

HUIZINGA: Good.

ARZANI: … a little with people—Mark Handley. When I was a PhD student, I would read their papers, and I’d be like, wow! And, I want to be like you!

HUIZINGA: So linking that back to your love of puzzles, were these people that you admired good problem solvers or … ?

ARZANI: Oh, yeah! I think Jen is one of those who … a lot of her work is also practical, like, you know, straddles a line between both solving the puzzle and being practical and being creative and working with theorists and working with PL people. So she’s also collaborative, which is, kind of, my style of work, as well. Mohammad is more of a theorist, and I love … like more the theoretical aspect of problems that I solve. And so, like, just the fact that he was able to look at those problems and think about those problems in those ways. And then Mark Handley’s intuition about problems—yeah, I can’t even speak to that!

HUIZINGA: That’s so fascinating because you’ve identified three really key things for a researcher. And each one is embodied in a person. I love that. And because I know who you are, I know we’re going to get to each of those things probably in the course of all these questions that I’ll ask you. [LAUGHTER] So we just spent a little time talking about what got you here and who influenced you along the way. But your life isn’t static. And at each stage of accomplishment, you get a chance to reflect and, sort of, think about what you got right, what you got wrong, and where you want to go next. So I wonder if you could take a minute to talk about the evolution of your values as a researcher, collaborator, and colleague and then a sort of “how it started/how it’s going” thing.

ARZANI: Hmm … For me, I think what I’ve learned is to be more mindful—about all of it. But I think if I talk about the evolution, when you’re a PhD student, especially if you’re a PhD student from a place that’s not MIT, that’s not Berkeley, which is where I was from,[1] my main focus was proving myself. I mean, for women, always, we have to prove ourselves. But, like, I think if you’re not from one of those schools, it’s even more so. At least that’s how I felt. That might not be the reality, but that’s how you feel. And so you’re always running to show this about yourself. And so you don’t stop to think how you’re showing up as a person, as a researcher, as a collaborator. You’re not even, like, necessarily reflecting on, are these the problems that I enjoy solving? It’s more of, will solving this problem help me establish myself in this world that requires proving yourself and is so critical and all of that stuff? I think now I stop more. I think more, is this a problem that I would enjoy solving? I think that’s the most important thing. Would other people find it useful? Is it solving a hard technical question? And then, in collaborations, I’m being more mindful that I show up in a way that basically allows me to be a good person the way I want to be in my collaboration. So as researchers, we have to be critical because that’s how science evolves. Not all work is perfect. Not all ideas are the best ideas. That’s just fundamental truth. Because we iterate on each other’s ideas until we find the perfect solution to something. But you can do all of these things in a way that’s kind, in a way that’s mindful, in a way that respects other people and what they bring to the table. And I think what I’ve learned is to be more mindful about those things.

HUIZINGA: How would you define mindful? That’s an interesting word. It has a lot of baggage around it, you know, in terms of how people do mindfulness training. Is that what you’re talking about, or is it more, sort of, intentional?

ARZANI: I think it’s both. So I think one of the things I said—I think when I got into this booth even—was, I’m going to take a breath before I answer each question. And I think that’s part of it, is just taking a breath to make sure you’re present is part of it. But I think there is more to it than that, which is I don’t think we even think about it. I think if I … when you asked me about the evolution of how I evolved, I never thought about it.

HUIZINGA: No.

ARZANI: I was just, like, running to get things done, running to solve the question, running to, you know, find the next big thing, and then you’re not paying attention to how you’re impacting the world in the process.

HUIZINGA: Right.

ARZANI: And once you start paying attention, then you’re like, oh, I could do this better. I can do that better. If I say this to this person in that way, that allows them to do so much more, that encourages them to do so much more.

HUIZINGA: Yeah, yeah.

ARZANI: So …

HUIZINGA: You know, when you started out, you said, is this a problem I would enjoy solving? And then you said, is this a problem that somebody else needs to have solved? Which is sort of like “do I like it?”—it goes back to Behnaz at the beginning: don’t tell me what to do; I want to do what I want to do. Versus—or and is this useful to the world? And I feel like those two threads are really key to you.

ARZANI: Yes. Basically, I feel like that defines me as a researcher, pretty much. [LAUGHS] Which is, you know, I was one of the, you know, early people … I wouldn’t say first. I’m not the first, I don’t think, but I was one of the early people who was talking about using machine learning in networking. And after a while, I stopped because I wasn’t finding it fun anymore, even though there was so much hype about, you know, let’s do machine learning in networking. And it’s not because there’s not a lot of technical stuff left to do. You can do a lot of other things there. There’s room to innovate. It’s just that I got bored.

HUIZINGA: I was just going to say, it’s still cool, but Behnaz is bored! [LAUGHTER] OK, well, let’s start to talk a little bit about some of the things that you’re doing. And I like this idea of a researcher, even a person, having a North Star goal. It sounds like you’ve got them in a lot of areas of your life, and you’ve said your North Star goal, your research goal, is to make the life of a network operator as painless as possible. So I want to know who this person is. Walk us through a day in the life of a network operator and tell us what prompted you to want to help them.

ARZANI: OK, so it’s been years since I actually, like, sat right next to one of them for a long extended period of time because now we’re in different buildings, but back when I was an intern, I was actually, like, kind of, like right in the middle of a bunch of, you know, actual network operators. And what I observed … and see, this was not, like, I’ve never lived that experience, so I’m talking about somebody else’s experience, so bear that in mind …

HUIZINGA: Sure, but at least you saw it …

ARZANI: Yeah. What they do is, there’s a lot of, “OK, we design the network, configure it.” A lot of it goes into building new systems to manage it. Building new systems to basically make it better, more efficient, all of that. And then they also have to be on call so that when any of those things break, they’re the ones who have to look at their monitoring systems and figure out what happened and try to fix it. So they do all of this in their day-to-day lives.

HUIZINGA: That’s tough …

ARZANI: Yeah.

HUIZINGA: OK. So I know you have a story about what prompted you, at the very beginning, to want to help this person. And it had some personal implications. [LAUGHS]

ARZANI: Yeah! So my internship mentor, who’s an amazing person, I thought—and this is, again, my perception as an intern—the day after he was on call, he was so tired, I felt. And so grumpy … grumpier than normal! [LAUGHTER] And, like, my main motivation initially for working in this space was just, like, make his life better!

HUIZINGA: Make him not grumpy.

ARZANI: Yeah. Pretty much. [LAUGHS]

HUIZINGA: Did you have success at that point in your life? Or was this just, like, setting a North Star goal that I’m going to go for that?

ARZANI: I mean, I had done a lot of work in monitoring space, but back then—again, going back to the talk we were having about how to be mindful about problems you pick—back then it was just like, oh, this was a problem to solve, and we’ll go solve it, and then what’s the next thing? So there was not an overarching vision, if you will. It was just, like, going after the next, after the next. I think that’s a point where, like, it all came together of like, oh, all of the stuff that I’m doing can help me achieve this bigger thing.

HUIZINGA: Right. OK, Behnaz, I want to drop anchor, to use a seafaring analogy, for a second and contextualize the language that these operators use. Give us a “networking for neophytes” overview of the tools they rely on and the terminology they use in their day-to-day work so we’re not lost when we start to unpack the problems, projects, and papers that are central to your work.

ARZANI: OK. So I’m going to focus on my pieces of this just because of the context of this question. But a lot of operators … just because a lot of the problems that we work on these days to be able to manage our network, the optimal form of these problems tend to be really, really hard. So a lot of the times, we use algorithms and solutions that are approximate forms of those optimal solutions in order to just solve those problems faster. And a lot of these heuristics, some of them focus on our wide area network, which we call a WAN. Our WANs, basically what they do is they move traffic between datacenters in a way that basically fits the capacity of our network. And, yeah, I think for my work, my current work, to understand it, that’s, I think, enough networking terminology.

HUIZINGA: OK. Well, so you’ve used the term heuristic and optimal. Not with an “s” on the end of it. Or you do say “optimals,” but it’s a noun …

ARZANI: Well, so for each problem definition, usually, there’s one way to formulate an optimal solution. There might be multiple optima that you find, but the algorithm that finds the optimum usually is one. But there might be many, I guess. The ones that I’ve worked on generally have been one.

HUIZINGA: Yeah, yeah. And so in terms of how things work on a network, can you give us just a little picture of how something moves from A to B that might be a problem?

ARZANI: So, for example, we have these datacenters that generate terabytes of traffic and—terabytes per second of traffic—that wants to move from point A to point B, right. And we only have finite network capacity, and these, what we call, “demands” between these datacenters—and you didn’t see me do the air quotes, but I did the air quotes—so they go from point A to point B, and so in order to fit this demand in the pipes that we have—and these pipes are basically links in our network—we have to figure out how to send them. And there’s variations in them. So, like, it might be the case that at a certain time of the day, East US would want to send more traffic to West US, and then suddenly, it flips. And that’s why we solve this problem every five minutes! Now assume one of these links suddenly goes down. What do I do? I have to re-solve this problem because maybe the path that I initially picked for traffic to go through goes exactly through that failed link. And now that it’s disappeared, all of that traffic is going to fall on the floor. So I have to re-solve that problem really quickly to be able to re-move my traffic and move it to somewhere else so that I can still route it and my customers aren’t impacted. What we’re talking about here is a controller, essentially, that the network operators built. And this controller solves this optimization problem that figures out how traffic should move. When a link has failed, then the same controller kicks in and reroutes traffic. The people who built that controller are the network operators.
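
The rerouting idea described here can be illustrated with a toy sketch. It assumes a hypothetical four-site topology and picks a fewest-hop path with breadth-first search; the site names, the `shortest_path` helper, and the topology are all invented for illustration. A production controller solves a capacity-aware optimization problem rather than a simple path search, so this only shows why a link failure forces the path to be recomputed.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Return a fewest-hop path from src to dst over undirected links, or None."""
    # Build an adjacency map from the set of (a, b) link pairs.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        # Sorting makes the choice among equal-length paths deterministic.
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical topology: two disjoint routes between east-us and west-us.
links = {
    ("east-us", "central-us"),
    ("central-us", "west-us"),
    ("east-us", "south-us"),
    ("south-us", "west-us"),
}

before = shortest_path(links, "east-us", "west-us")

# One link goes down, so the path has to be recomputed without it.
failed = ("central-us", "west-us")
after = shortest_path(links - {failed}, "east-us", "west-us")

print(before)
print(after)
```

In this sketch the traffic initially goes through `central-us`; once that link is removed, the recomputed path goes through `south-us` instead.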

HUIZINGA: And so who does the problem-solving or the troubleshooting on the fly?

ARZANI: So hopefully—and this, most of the times, is the case—is we have monitoring systems in place that the operators have built that, like, kind of, signal to this controller that, oh, OK, this link is down; you need to do something.

[MUSIC BREAK]

HUIZINGA: Much of your recent work represents an effort to reify the idea of automated network management and to try to understand the performance of deployed algorithms. So talk about the main topics of interest here in this space and how your work has evolved in an era of generative AI and large language models.

ARZANI: So if you think about it, what generative AI is going to enable, and I’m using the term “going to enable” a little bit deliberately because I don’t think it has yet. We still have to build on top of what we have to get that to work. And maybe I’ll reconsider my stance on ML now that, you know, we have these tools. Haven’t yet but might. But essentially, what they enable us to do is take automated action on our networks. But if we’re allowing AI to do this, we need to be mindful of the risks because AI in my, at least in my head of how I view it, is a probabilistic machine, which, what that means is that there is some probability, maybe a teeny tiny probability, it might get things wrong. And the thing that you don’t want is when it gets things wrong, it gets things catastrophically wrong. And so you need to put guardrails in place, ensure safety, figure out, like, for each action be able to evaluate that action and the risks it imposes long term on your network and whether you’re able to tolerate that risk. And I think there is a whole room of innovation there to basically just figure out the interaction between the AI and the network and where … and actually strategic places to put AI, even.

HUIZINGA: Right.

ARZANI: The thing that for me has evolved is I used to think we just want to take the human out of the equation of network management. The way I think about it now is there is a place for the human in the network management operation because sometimes the human has context and that context matters. And so I think what the, like, for example, we have this paper in HotNets 2023 where we talk about how to put an LLM in the incident management loop, and then there, we carefully talk about, OK, these are the places a human needs to be involved, at least given where LLMs are right now, to be able to ensure that everything happens in a safe way.

HUIZINGA: So go back to this “automated network management” thing. This sounds to me like you’re in a space where it could be, but it isn’t ready yet …

ARZANI: Yeah.

HUIZINGA: … and without, sort of, asking you to read a crystal ball about it, do you feel like this is something that could be eventually?

ARZANI: I hope so. This is the best thing about research. You get to be like, yeah!

HUIZINGA: Yeah, why not?

ARZANI: Why not? And, you know, maybe somebody will prove me wrong, but until they do, that’s what I’m working towards!

HUIZINGA: Well, right now it’s an animating “what if?”

ARZANI: Yeah.

HUIZINGA: Right?

ARZANI: Yeah.

HUIZINGA: This is a problem Behnaz is interested in right now. Let’s go!

ARZANI: Yeah. Pretty much. [LAUGHTER]

HUIZINGA: OK. Behnaz, the systems and networks that we’ve come to depend on are actually incredibly complex. But for most of us, most of the time, they just work. There’s only drama when they don’t work, right? But there’s a lot going on behind the scenes. So I want you to talk a little bit about how the cycle of configuring, managing, reconfiguring, etc., helps keep the drama at bay.

ARZANI: Well … you reminded me of something! So when I was preparing my job … I’m going to tell this story really, really quickly. But when I was preparing my job talk, somebody showed me a tweet. In 2014, I think, people started calling 911 when Facebook was down! Because of a networking problem! [LAUGHS] Yeah. So that’s a thing. But, yeah, so network availability matters, and we don’t notice it until it’s actually down. But that aside, back to your question. So I think what operators do is they build systems in a way that tries to avoid that drama as much as possible. So, for example, they try to build systems that these systems configure the network. And one of my dear friends, Ryan Beckett, works on intent-driven networking that essentially tries to ensure that what the operators intend with their configurations matches what they actually push into the network. They also monitor the network to ensure that as soon as something bad happens, automation gets notified. And there’s automation also that tries to fix these problems when they happen as much as possible. There’s a couple of problems that happen in the middle of this. One of them is our networks continuously change, and what we use in our networks changes. And there’s so many different pieces and components of this, and sometimes what happens is, for example, a team decides to switch from one protocol to a different protocol, and by doing that, it impacts another team’s systems and monitoring and what expectations they had for their systems, and then suddenly it causes things to go bad …

HUIZINGA: Right.

ARZANI: And they have to develop new solutions taking into account the changes that happened. And so one of the things that we need to account for in this whole process is how evolution is happening. And like evolution-friendly, I guess, systems, maybe, is how you should be calling it.

HUIZINGA: Right.

ARZANI: But that’s one. The other part of it that goes into play is, most of the time you expect a particular traffic characteristic, and then suddenly, you have one fluke event that, kind of, throws all of your assumptions out the window, so …

HUIZINGA: Right. So it’s a never-ending job …

ARZANI: Pretty much.

HUIZINGA: It’s about now that I ask all my guests what could possibly go wrong if, in fact, you got everything right. And so for you, I’d like to earth this question in the broader context of automation and the concerns inherent in designing machines to do our work for us. So at an earlier point in your career—we talked about this already—you said you believed you could automate everything. Cool. Now you’re not so much on that. Talk about what changed your thinking and how you’re thinking now.

ARZANI: OK, so the shallow answer to that question—there’s a shallow answer, and there’s a deeper answer—the shallow answer to that question is I watched way too many movies where robots took over the world. And honestly speaking, there’s a scenario that you can imagine where automation starts to get things wrong and then keeps getting things wrong, and wrong, not by the definition of automation. Maybe they’re doing things perfectly by the objectives and metrics that you used to design them …

HUIZINGA: Sure.

ARZANI: … but they’re screwing things up in terms of what you actually want them to do.

HUIZINGA: Interesting.

ARZANI: And if everything is automated and you don’t leave yourself an intervention plan, how are you going to take control back?

HUIZINGA: Right. So this goes back to the humans-in-the-loop/humans-out-of-the-loop. And if I remember in our last podcast, we were talking about humans out of the loop.

ARZANI: Yeah.

HUIZINGA: And you’ve already talked a bit about what the optimal place for a human to be is. Is the human always going to have to be in the loop, in your opinion?

ARZANI: I think it’s a scenario where you always give yourself a way to interrupt. Like, always put a back door somewhere. When we notice things go bad, we have a way that’s foolproof that allows us to shut everything down and take control back to ourselves. Maybe that’s where we go.

HUIZINGA: How do you approach the idea of corner cases?

ARZANI: That’s essentially what my research right now is, actually! And I love it, which is essentially figuring out, in a foolproof way, all the corner cases.

HUIZINGA: Yeah?

ARZANI: Can you build a tool that will tell you what the corner cases are? Now, granted, what we focus on is performance corner cases. Nikolaj Bjørner, in RiSE—so RiSE is Research in Software Engineering—is working on, how do you do verification corner cases? But all of them, kind of, have a hand-in-hand type of, you know, Holy Grail goal, which is …

HUIZINGA: Sure.

ARZANI: … how do you find all the corner cases?

HUIZINGA: Right. And that, kind of, is the essence of this “What could possibly go wrong?” question, is looking in every corner …

ARZANI: Correct.

HUIZINGA: … for anything that could go wrong. So many people in the research community have observed that the speed of innovation in generative AI has shrunk the traditional research-to-product timeline, and some people have even said everyone’s an applied researcher now. Or everyone’s a PM. [LAUGHS] Depends on who you are! But you have an interesting take on this, Behnaz, and it reminds me of a line from the movie Nanny McPhee: “When you need me but do not want me, then I will stay. When you want me but no longer need me, I have to go.” So let’s talk a little bit about your perspective on this idea-to-ideation pipeline. How and where are researchers in your orbit operating these days, and how does that impact what we might call “planned obsolescence” in research?

ARZANI: I guess the thing I’m seeing is that we are freed up to dream more—in a way. Maybe that’s me being too … I’m a little bit of a romantic, so this is that coming out a little bit, but it’s, like, because of all this, we have the time to think bigger, to dream bigger, to look at problems where maybe five years ago, we wouldn’t even dare to think about. We have amazingly, amazingly smart, competent people in our product teams. Some of them are actually researchers. So there’s, for example, the Azure systems research group that has a lot of people that are focused on problems in our production systems. And then you have equivalents of those spread out in the networking sphere, as well. And so a lot of complex problems that maybe like 10 years ago Microsoft Research would look at nowadays they can handle themselves. They don’t need us. And that’s part of what has allowed us to now go and be like, OK, I’m going to think about other things. Maybe things that, you know, aren’t relevant to you today, but maybe in five years, you’ll come in and thank me for thinking about this!

HUIZINGA: OK. Shifting gears here! In a recent conversation, I heard a colleague refer to you as an “idea machine.” To me, that’s one of the greatest compliments you could get. But it got me wondering, so I’ll ask you: how does your brain work, Behnaz, and how do you get ideas?

ARZANI: Well, this has been, to my chagrin, one of the realities of life about my brain apparently. So I never thought of this as a strength. I always thought about it as a weakness. But nowadays, I’m like, oh, OK, I’m just going to embrace this now! So I have a random brain. It’s completely ran—so, like, it actually happens, like, you’re talking, and then suddenly, I say something that seems to other people like it came out of left field. I know how I got there. It’s essentially kind of like a Markov chain. [LAUGHTER] So a Markov chain is essentially a number of states, and there’s a certain probability you can go from one state to the other state. And, actually, one of the things I found out about myself is I think through talking for this exact reason. Because people see this random Markov chain by what they say, and it suddenly goes into different places, and that’s how ideas come about. Most of my ideas have actually come through when I’ve been talking to someone.
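
The Markov chain described here, a set of states with a probability of moving from each state to the next, can be sketched in a few lines. The topic names and transition probabilities below are made up for illustration, and `walk` is a hypothetical helper, not something from the conversation.

```python
import random

# Hypothetical states and transition probabilities; each row sums to 1.
transitions = {
    "networking": {"networking": 0.5, "puzzles": 0.3, "painting": 0.2},
    "puzzles": {"networking": 0.4, "puzzles": 0.4, "painting": 0.2},
    "painting": {"networking": 0.6, "puzzles": 0.1, "painting": 0.3},
}


def walk(transitions, start, steps, rng):
    """Simulate the chain: the next state depends only on the current one."""
    state = start
    visited = [state]
    for _ in range(steps):
        options = transitions[state]
        # Draw the next state according to the current state's probabilities.
        state = rng.choices(list(options), weights=list(options.values()))[0]
        visited.append(state)
    return visited


rng = random.Random(0)  # fixed seed so the walk is repeatable
trail = walk(transitions, "networking", 10, rng)
print(" -> ".join(trail))
```

Each printed trail is one possible train of thought; changing the seed gives a different one, while the probabilities stay the same.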

HUIZINGA: Really?

ARZANI: Yeah.

HUIZINGA: Them talking or you talking?

ARZANI: Both.

HUIZINGA: Really?

ARZANI: So it’s, like, basically, I think the thing that has recently … like, I’ve just noticed more—again, being more mindful does that to you—it’s like I’m talking to someone. I’m like, I have an idea. And it’s usually they said something, or I was saying something that triggered that thought coming up. Which doesn’t happen when … I’m not one of those people that you can put in a room for three days—somebody actually once told me this— [LAUGHTER] like, I’m not one of those people you can put in a room for three days and I come out with these brilliant ideas. It’s like you put me in a room with five other people, then I come out with interesting ideas.

HUIZINGA: Right. … It’s the interaction.

ARZANI: Yeah.

HUIZINGA: I want to link this idea of the ideas that you get to the conversations you have and maybe go back to linking it to the work you’ve recently done. Talk about some of the projects, how they came from idea to paper to product even …

ARZANI: Mm-hm. So like one of the works that we were doing was this work on, like, max-min fair resource allocation that recently got published in NSDI and is actually in production. So the way that came out is I was working with a bunch of other researchers on risk estimation, actually, for incident management of all things, which was, how do you figure out if you want to mitigate a particular problem in a certain way, how much risk it induces as a problem. And so one of the people who was originally … one of the original researchers who built our wide-area traffic engineering controller, which we were talking about earlier, he said, “You’re solving the max-min fair problem.” We’re like, really? And then this caused a whole, like, one-year collaboration where we all sat and evolved this initial algorithm we had into a … So initially it was not a multipath problem. It had a lot of things that didn’t fully solve the problem of max-min fair resource allocation, but it evolved into that. Then we deployed it, and it improved the SWAN solver by a factor of three in terms of how fast it solved the problem and didn’t have any performance impact, or at least very little. And so, yeah, that’s how it got born.

HUIZINGA: OK. So for those of us who don’t know, what is max-min fair resource allocation, and why is it such a problem?

ARZANI: Well, so remember I said that in our wide area network, we route traffic from one place to the other in a way that meets capacity. So one of the objectives we try to meet is we try to be fair in a very specific metric. So max-min is just the metric of fairness we use. And that basically means you cannot improve what you allocated to one piece of traffic in a way that would hurt anybody who has gotten less. So there’s a little bit of a, like, … it’s a mind bend to wrap your head a little bit around the max-min fair definition. But the reason making it faster is important is if something fails, we need to quickly recompute what the paths are and how we route traffic. So the faster we can solve this problem, the better we can adapt to failures.
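
To make the definition concrete, here is a minimal sketch of max-min fairness in its simplest textbook setting: several flows sharing one link, solved by "water-filling." It is not the algorithm from the NSDI paper or the SWAN solver, which deal with many paths across a wide-area network; the function name `max_min_fair` and the numbers are assumptions for illustration.

```python
def max_min_fair(capacity, demands):
    """Water-filling on a single shared link.

    Returns one allocation per demand, in the same order as `demands`.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Flows not yet finalized, smallest demand first.
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active and remaining > 1e-12:
        share = remaining / len(active)  # equal split of what is left
        i = active[0]
        if demands[i] <= share:
            # The smallest remaining demand fits inside an equal share:
            # satisfy it fully and let the others split the rest.
            alloc[i] = demands[i]
            remaining -= demands[i]
            active.pop(0)
        else:
            # Nobody left can be fully satisfied: everyone gets the equal share.
            for j in active:
                alloc[j] = share
            remaining = 0.0
            active = []
    return alloc


# Four flows competing for 10 units of capacity.
print(max_min_fair(10, [2, 2.6, 4, 5]))
```

With these numbers the two small flows get everything they asked for (2 and 2.6), and the two larger flows split the remainder at roughly 2.7 each. Neither large flow can be given more without taking from a flow that already has the same amount or less, which is the property Arzani describes.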

HUIZINGA: So talk a little bit about some of the work that started as an idea and you didn’t even maybe know that it was going to end up in production.

ARZANI: There was this person from Azure Networking who came and gave a talk in our group. And he’s a person I’ve known for years, so I was like, hey, do you want to jump on a meeting and talk? So he came into that meeting, and I was like, OK, what are some of the things you’re curious about these days? You want to answer these days? And it was like, yeah, we have this heuristic we’re using in our traffic engineering solution, and essentially what it does is to make the optimization problem we solve smaller. If a piece of traffic is smaller than a particular, like, arbitrary threshold, we just send it on a shortest path and don’t worry about it. And then we optimize everything else. And I just want to know, like, what is the optimality gap of this heuristic? How bad can this heuristic be? And then I had worked on Stackelberg games before, in my PhD. It never went anywhere, but it was an idea I played around with, and it just immediately clicked in my head that this is the same problem. So Stackelberg games are a leader-follower game where in this scenario a leader has an objective function that they’re trying to maximize, and they control one or multiple of the inputs that their followers get to operate over. The followers, on the other hand, don’t get to control anything about this input. They have their own objective that they’re trying to maximize or minimize, but they have other variables in their control, as well. And what their objective is, is going to control the leader’s payoff. And so this game is happening where the leader has more control in this game because it’s, kind of, like the followers are operating subject to whatever the leader says, right. But the leader is impacted by what the followers do. And so this dynamic is what they call a Stackelberg game. And the way we map the MetaOpt problem to this is the leader in our problem wants to maximize the difference between the optimal and the heuristic. It controls the inputs to both the optimal and the heuristic. And now these optimal and heuristic algorithms are the followers in that game. They don’t get to control the inputs, but they have other variables they control, and they have objectives that they want to maximize or minimize.
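
The leader-follower framing can be illustrated with a toy example. The sketch below is not MetaOpt and does not use its API: it replaces the optimization machinery with an exhaustive search over a tiny, invented network of two parallel paths. The capacities, the threshold, and all function names are assumptions chosen only to show how a "leader" hunts for the input that makes the heuristic look worst.

```python
from itertools import product

SHORT_CAP = 10  # hypothetical capacity of the shortest path
LONG_CAP = 10   # hypothetical capacity of the alternate path
THRESHOLD = 5   # demands below this are pinned to the shortest path


def optimal_carried(demands):
    """Follower 1: the optimal algorithm.

    With splittable traffic on two parallel paths, the best possible outcome
    is to carry everything up to the combined capacity.
    """
    return min(sum(demands), SHORT_CAP + LONG_CAP)


def heuristic_carried(demands):
    """Follower 2: the thresholding heuristic.

    Small demands go on the shortest path without optimization (any excess is
    dropped); the remaining demands are then optimized over what is left.
    """
    small = sum(d for d in demands if d < THRESHOLD)
    large = sum(d for d in demands if d >= THRESHOLD)
    pinned = min(small, SHORT_CAP)
    leftover = (SHORT_CAP - pinned) + LONG_CAP
    return pinned + min(large, leftover)


def gap(demands):
    """Leader's payoff: how much traffic the heuristic loses versus optimal."""
    return optimal_carried(demands) - heuristic_carried(demands)


def worst_case(num_demands=4, max_demand=8):
    """Leader: try every integer demand vector and keep the one with the largest gap."""
    candidates = product(range(max_demand + 1), repeat=num_demands)
    return max(candidates, key=gap)


worst = worst_case()
print(worst, gap(worst))
```

With these invented numbers, the search settles on four demands of 4 units each: all of them fall just under the threshold, get pinned to the 10-unit shortest path, and 6 units are dropped that the optimal algorithm would have carried on the other path. Brute force only works here because the example is tiny; it stands in for the optimization that a real tool would need at realistic scale.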

HUIZINGA: Right.

ARZANI: And so that’s how the Stackelberg-game dynamic comes about. And then we got other researchers in the team involved, and then we started talking, and then it just evolved into this beast right now that is a tool, MetaOpt, that we released, I think, a couple of months ago. And another piece that was really cool was people from ETH Zürich came to us and were like, oh, you guys analyzed our heuristic! We have a better one! Can you analyze this one? And that was a whole fun thing we did where we analyzed their heuristics for them. And, then, yeah …

HUIZINGA: Yeah. So all these things that you’re mentioning, are they findable as papers? Were they presented …

ARZANI: Yes.

HUIZINGA: … at conferences, and where are they in anybody’s usability scenario?

ARZANI: So the MetaOpt tool that I just mentioned, that one is in … it’s an open-source tool. You can go online and search for MetaOpt. You’ll find the tool. We’re here to support anything you need; if you run into issues, we’ll help you fix it.

HUIZINGA: Great. You can probably find all of these papers under publications …

ARZANI: Yes.

HUIZINGA: … on your bio page on the website, Microsoft Research website.

ARZANI: Correct.

HUIZINGA: Cool. If anyone wants to do that. So, Behnaz, the idea of having ideas is cool to me, but of course, part of the research problem is identifying which ones you should go after [LAUGHS] and which ones you shouldn’t. So, ironically, you’ve said you’re not that good at that part of it, but you’re working at getting better.

ARZANI: Yes.

HUIZINGA: So first of all, why do you say that you’re not very good at it? And second of all, what are you doing about it?

ARZANI: So I, as I said, get attracted to puzzles, to hard problems. So most of the problems that I go after are problems I have no idea how to solve. And that tends to be a risk.

HUIZINGA: Yeah.

ARZANI: Where I think people who are better at selecting problems are those who actually have an idea of whether they’ll be able to solve this problem or not. And I never actually asked myself that question before this year. [LAUGHTER] So now I’m trying to get a better sense of, how do I figure out if a problem is solvable or not before I try to solve it? And also, just what makes a good research problem? So what I’m doing is, I’m going back to the era that I thought had the best networking papers, and I’m just trying to dissect what makes those papers good, just to understand better for myself, to be like, OK, what do I want to replicate? Replicate, not in terms of techniques, but in terms of philosophy.

HUIZINGA: So what you’re looking at is how people solve problems through the work that they did in this arena. So what are you finding? Have you gotten any nuggets of …

ARZANI: So a couple. So one of my favorite papers is Van Jacobson’s TCP paper. The intuition is amazing to me. It’s almost like he has a vision of what’s happening, is the best I can describe it. And another example of this is also early-on papers by people like Ratul Mahajan, Srikanth Kandula, those guys, where you see that they start with a smaller example that, kind of, shows how this problem is going to happen and how they’re going to solve it. I mean, I did this in my work all the time, too, but it was never conscious. It’s more of like that goes to that mindfulness thing that I said before, too. It’s like you might be doing some of these already, but you don’t notice what you’re doing. It’s more, kind of, like noticing, oh, this is what they did. And I do this, too. And this might be a good habit to keep but cultivate into a habit as opposed to an unconscious thing that you’re just doing.

HUIZINGA: Right. You know, this whole idea of going back to what’s been done before, I think that’s a lesson about looking at history, as well, and to say, you know, what can we learn from that? What are we trying to reinvent …

ARZANI: Yeah.

HUIZINGA: … that maybe doesn’t need to be reinvented? Has it helped you to get more targeted on the kinds of problems that you say, “I’m not going to work on that. I am going to work on that”?

ARZANI: To be very, very, very fair, I haven’t done this for a long time yet! This has been …

HUIZINGA: A new thing.

ARZANI: I started this this month, yeah.

HUIZINGA: Oh my goodness!

ARZANI: So we’ll see how far I get and how useful it ends up being! [LAUGHS]

[MUSIC BREAK]

HUIZINGA: One of my favorite things to talk about on this show is what my colleague Kristina calls “outrageous” lines of research. And so I’ve been asking all my guests about their most outrageous ideas and how they turned out. So sometimes these ideas never got off the ground. Sometimes they turned out great. And other times, they’ve failed spectacularly. Do you have a story for the “Microsoft Research Outrageous Ideas” file?

ARZANI: I had this question of, if language has grammar, and grammar is what LLMs are learning, which, to my understanding of what people who are experts in this field say, this maybe isn’t that, but if it is the case that grammar is what allows these LLMs to learn how language works, then in networking, we have the equivalent of that, and the equivalent of that is essentially network protocols. And everything that happens in a network, you can define it as an event that happens in a network. You can think of those, like, the events are words in a language. And so, is it going to be the case, and this is a question which is, if you take an event abstraction and encode everything that happens in a network in that event abstraction, can you build an equivalent of an LLM for networks? Now what you would use it for—this is another reason I’ve never worked on this problem—I have no idea! [LAUGHTER] But what this would allow you to do is build the equivalent of an LLM for networking, where actually you just translate that network’s events into, like, this event abstraction, and then the two understand each other. So like a universal language of networking, maybe. It could be cool. Never tried it. Probably a dumb idea! But it’s an idea.

HUIZINGA: What would it take to try it?

ARZANI: Um … I feel like bravery is, I think, one because with any risky idea, there’s a probability that you will fail.

HUIZINGA: As a researcher here at Microsoft Research, when you have this idea, um … and you say, well, I’m not brave enough … even if you were brave enough, who would you have to convince that they should let you do it?

ARZANI: I don’t think anybody!

HUIZINGA: Really?

ARZANI: That’s the whole … that’s the whole point of me being here! I don’t like being told what to do! [LAUGHS]

HUIZINGA: Back to the beginning!

ARZANI: Yeah. The only thing is that, maybe, like, people would be like, what have you been doing in the past six months? And I wouldn’t have … that’s the risk. That’s where bravery comes in.

HUIZINGA: Sure.

ARZANI: The bravery is more of there is a possibility that I have to devote three years of my life into this, to figuring out how to make that work, and I might not be able to.

HUIZINGA: Yes …

ARZANI: And there’s other things. So it’s a tradeoff also of where you put your time.

HUIZINGA: Sure.

ARZANI: So there. Yeah.

HUIZINGA: And if, but … part of it would be explaining it in a way to convince people: if it worked, it would be amazing!

ARZANI: And that’s the other problem with this idea. I don’t know what you would use it for. If I knew what you would use it for, maybe then it would make it worth it.

HUIZINGA: All right. Sounds like you need to spend some more time …

ARZANI: Yeah.

HUIZINGA: … ruminating on it. Um, yeah. The whole cliché of the solution in search of a problem.

ARZANI: Yeah.

HUIZINGA: [LAUGHS] As we close, I want to talk a little bit about some fun things. And so, aside from your research life, I was intrigued by the fact, on your bio page, that you have a rich artistic life, as well, and that includes painting, music, writing, along with some big ideas about the value of storytelling. So I’ll take a second to plug the bio page. People, go look at it because she’s got paintings and cool things that you can link to. As we close, I wonder if you could use this time to share your thoughts on this particular creative pursuit of storytelling and how it can enhance our relationships with our colleagues and ultimately make us better researchers and better people?

ARZANI: I think it’s not an understatement to say I had a life-changing experience through storytelling. The first time I encountered it, it was the most horrific thing I had ever seen! I had gone on Meetup—this was during COVID—to just, like, find places to meet people, build connections and all that, and I saw this event called “Storytelling Workshop,” and I was like, good! I’m good at making up stories, and, you know, that’s what I thought it was. Turns out it’s, you go and tell personal stories about your life that only involve you, that make you deeply vulnerable. And, by the way, I’m Iranian. We don’t do vulnerability. It’s just not a thing. So it was the most scary thing I’ve ever done in my life. But you go on stage and basically talk about your life. And the thing it taught me by both telling my own stories and listening to other people’s stories is that it showed me that you can connect to people through stories, first of all. The best ideas come when you’re actually in it together. Like one of the things that now I say that I didn’t used to say, we, we’re all human. And being human essentially means we have good things about ourselves and bad things about ourselves. And as researchers, we have our strengths as researchers, and we have our weaknesses as researchers. And so when we collaborate with other people, we bring all of that. And collaboration is a sacred thing that we do where we’re basically trusting each other with bringing all of that to the table and being that vulnerable. And so our job as collaborators is essentially to protect that, in a way, and make it safe for everybody to come as they are. And so I think that’s what it taught me, which is, like, basically holding space for that.

HUIZINGA: Yeah. How’s that working?

ARZANI: First of all, I stumbled into it, but there are people who are already “that” in this building …

HUIZINGA: Really?

ARZANI: … that have been for years. It’s just that now I can see them for what they bring, as opposed to before, I didn’t have the vocabulary for it.

HUIZINGA: Gotcha …

ARZANI: But people who don’t, it’s like what I’ve seen is almost like they initially look at you with skepticism, and then they think it’s a gimmick, and then they are like, what is that? And then they become curious, and then they, too, kind of join you, which is very, very interesting to see. But, like, again, it’s something that already existed. It’s just me not being privileged enough to know about it or, kind of, recognize it before.

HUIZINGA: Yeah. Can that become part of a culture, or do you feel like it is part of the culture here at Microsoft Research, or … ?

ARZANI: I think this depends on how people individually choose to show up. And I think we’re all, at the end of the day, individuals. And a lot of people are that way without knowing they are that way. So maybe it is already part of the culture. I haven’t necessarily sat down and thought about it deeply, so I can’t say.

HUIZINGA: Yeah, yeah. But it would be a dream to have the ability to be that vulnerable through storytelling as part of the research process?

ARZANI: I think so. We had a storytelling coach that would say, “Tell your story, change the world.” And as researchers, we are attempting to change the world, and part of that is our stories. And so maybe, yeah! And basically, what we’re doing here is, I’m telling my story. So …

HUIZINGA: Yeah.

ARZANI: … maybe you’re changing the world!

HUIZINGA: You know, I’m all in! I’m here for it, as they say. Behnaz Arzani. It is such a pleasure—always a pleasure—to talk to you. Thanks for sharing your story with us today on Ideas.

ARZANI: Thank you.

[MUSIC]

[1] For clarification, Arzani notes that she attended and received her PhD from the University of Pennsylvania. By “which is where I was from,” Arzani meant outside of those academic institutions well known for their technical programs.