Ideas: Steering AI toward the work future we want

Microsoft Research, Ideas podcast series. Published Thu, 09 Apr 2026.

Microsoft Chief Scientist Jaime Teevan and researchers Jenna Butler, Jake Hofman, and Rebecca Janssen unpack the New Future of Work Report 2025 and explore the ideal AI-driven working world. Plus, is AI a tool or a collaborator? And why the answer matters.

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets. 

Since 2020, researchers across Microsoft have conducted, surfaced, and analyzed key research into how people work as part of the New Future of Work research initiative. They’ve done this through a variety of lenses—from changes caused by the pandemic to the adoption of hybrid work practices to the arrival of increasingly capable AI models—with the goal of empowering people and organizations to redefine work in real time. 

In this episode, Microsoft Chief Scientist and Technical Fellow Jaime Teevan talks with researchers Jenna Butler, Jake Hofman, and Rebecca Janssen about the latest efforts: the Microsoft New Future of Work Report 2025. The group explores what the report says about AI’s adoption and impact, the intentionality needed to create a future in which people flourish, and current perceptions around AI use. Plus, is AI a tool or a collaborator? And why the answer matters.

Transcript

[MUSIC] 

JAIME TEEVAN: Really what we’ve been living through, it’s not that, like, every year work is changing in a generational manner. It’s much more that we are in the middle of a really big shift in sort of how digital technology can support people getting things done. 

JENNA BUTLER: It is not predetermined. The future of work is actively being built by us, by consumers. I love that. 

JAKE HOFMAN: It’s easy for us to say, let’s get everyone to adopt and let’s boost efficiency. Let’s make everything really quick, right. But I don’t think that that’s actually the future, like, we want to live in. 

REBECCA JANSSEN: We keep benchmarking against the past. So what can AI do, or can AI do what we already do? And I think this is, like, a mistake or maybe only the first step and the more important step comes next. 

STANDARD INTRODUCTION: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code.

[MUSIC FADES] 

JAIME TEEVAN: Hi, I’m Jaime Teevan, chief scientist and technical fellow at Microsoft, and today, we’re going to talk about the new future of work. 

So back in 2020, researchers from across Microsoft came together to try to make sense of this seismic shift in work practices that was happening as a result of the pandemic, and the next year, the group published the very first New Future of Work report. Microsoft has been publishing a new report every year since with no shortage of disruptions and major technological shifts in between. 

Joining me today to explore the latest report are my colleagues, Jenna Butler, Jake Hofman, and Rebecca Janssen, who are a few of the many authors on the report. 

Jenna, Jake, Rebecca, welcome to the podcast. 

REBECCA JANSSEN: Thanks, Jaime. 

JAKE HOFMAN: Thanks, Jaime. 

JENNA BUTLER: Thank you. 

TEEVAN: There are a lot of factors that shape the work people do and how they do it, from social factors to economic factors to technological factors. And, you know, as we’ve learned from the previous reports that we’ve written together, accounting for this complexity requires a lot of different backgrounds, knowledge bases, approaches, and research methodologies. 

So before we get into the specifics of the report, I’d love it if each of you could share a little bit about the experience and expertise that you bring to the contributions you made to the report and why the work you do matters. Jenna, why don’t you get us started? 

BUTLER: Sure, yeah, thank you, Jaime. 

So I’ve been on the report since it started in 2020, and I’m really proud of the work that we do. I think it matters for a number of reasons, but most importantly, I think, especially right now, people feel like technology is sort of happening to them and these changes are happening to them. And actually, with any technology we introduce to society, that’s a sociotechnical shift. And so how people perceive it, use it, what they want to do with it, what they’re willing to pay for—all these things matter. And so the report, I think, gives some agency to people to let them know, like, what’s happening right now, what’s the latest research, and also how are your own behaviors and views shaping the technology. 

And when it comes to expertise, I study software engineering productivity and right now very specifically how AI impacts or changes that. But my background is actually originally in bioinformatics studying cancer. And I’ve always loved multidisciplinary fields because I feel like with the type of problems we have in today’s world, the solutions often lie at the interface of multiple disciplines. And so this report with over, you know, 50 different authors from all over the world, I think, is a really fun example of just how much great stuff you can get when you bring different people like that together. 

TEEVAN: Thanks, Jenna. How about you, Jake? 

HOFMAN: Yeah, so I’ve been involved with the report since 2023, so less time than Jenna, but as an author originally on bits related to AI and cognition, which is a core research topic for our Microsoft Research New York City lab. And more recently, I’ve co-led a workstream across the whole company called Thinking and Learning with AI, or TALA for short, with Richard Banks, another researcher. 

And so Jenna and Rebecca and company, who really drive and lead the report, were kind enough to invite me to be a section editor this year. And I gladly accepted because I know how widely read and impactful the report is. And I think it’s just a wonderful opportunity to showcase research not only from Microsoft but from all around, from a coherent viewpoint and voice. 

TEEVAN: Thanks, Jake, and Rebecca? 

JANSSEN: Yeah, and we were really glad to have you join us as section editor, Jake, just to say that. 

Yes, so I joined Microsoft full time in October 2024, so, kind of, like the new joiner among the three of us. And already during my PhD, I was interested in, like, AI and its impacts on work and society, in particular from the economics perspective. So I was always really excited about that group’s work and was, yeah, just, like, really looking forward to leaning in not only on the economics perspective and those sections but also, like, more broadly with, like, editing the report overall. 

And to the point of, like, why it matters, I think what is so exciting about the report is the variety of, like, different people, different backgrounds, and different topics. And there’s, like, so much you can talk about, speak about, but also realize, oh, AI is impacting work but also, like, so many different other parts of life. 

TEEVAN: Rebecca, I love your story, too, about how you had been reading the report from outside of Microsoft and then got to come in to engage. I know there were a number of people involved this year who said that. It, kind of, was cool, like, to feel it become something of an institution. 

JANSSEN: Yeah, yeah, exactly. 

TEEVAN: Yeah, no, super cool. But for listeners who are new to the New Future of Work Report, can you share a little about what it is, who it’s for, what people can use it for? 

BUTLER: Yeah, I can take that one. So obviously I’m biased—I think it’s for everyone. But perhaps it’s not. But the idea is to, sort of, showcase the research that’s been happening over the last year. So we release it annually, usually in December, on these big shifts that have been happening, and so the last couple of years, AI has been a big part of it. And the idea is to take research not just from Microsoft but from external places, as well, all around the world, and try and, sort of, sum it up in small statements that we can back up with research. And we are very careful to make sure we’re only doing this in areas where we have a researcher and we can make a pretty bold claim and where we feel confident in the data and that it backs up what we’re saying. 

And so if you just want to read one, albeit somewhat long, report, you’ll get an idea of what’s happening in the world of AI and work fairly broadly. So from the economy to adoption, to thinking and learning, to specific industries and what leading experts outside the company are thinking and predicting, as well. So it should be broadly accessible to any sort of academic audience. You don’t need to be an AI expert to read it. And hopefully, it’ll help with all different areas. 

TEEVAN: You know, one of the things that jumped out to me, Jenna, sort of reflecting on the past five years—this is our fifth report—so on the past … over the ones we’ve done is every time when we go to release it, it’s like, “Oh my gosh, work has changed. It will never be the same again.” [LAUGHTER] I was actually, like, reading the past introductions to the report. 

In 2021, during, you know, thinking about the pandemic, I was like, “Work will never again be the same!” In 2022, as we were shifting to hybrid work, I said, “Work is changing faster than it has in a generation.” 2023—we’ve been living through not one but two generational shifts in how we work. And then, you know, more recently, obviously, we’ve been talking a lot about the transformative impact of AI on productivity.

And one thing that was fun about doing this report was sort of looking at these what felt like different shifts over time and, like, being able to see the through threads and the connections. Because really what we’ve been living through, it’s not that, like, every year work is changing in a generational manner. It’s much more that we are in the middle of a really big shift in sort of how digital technology can support people getting things done. 

And I’d be curious about what changes in attitudes and understanding of AI and work you all have witnessed in these past five years across industry and academia and even, like, on an individual level, like how it’s changed for you personally. 

HOFMAN: I can kick us off with that maybe. I think it’s pretty amazing, like, in the last three years, to think about just how much in the research world has changed on generative AI and work. 

You know, like, I remember, like, January 2023, you know, people were just off to the races. Everyone was doing everything they could to just evaluate a model in isolation because that’s what people had access to. But there was very little in terms of, like, humans in the loop and people evaluating what happens when it’s not just a model taking a standardized test or a benchmark. And so that was something that we immediately focused on because it really hit our expertise in the lab here. And, you know, there were others, but it was still, kind of, limited in terms of who had access to the models and who had the capability to, like, design and run experiments that involved, you know, real people, right. And even then, it was, kind of, limited to laboratory experiments, right. 

And now, you know, fast-forward three years, and pretty much everyone has access to any model they want to. They have amazing tools to build and design experiments, and they can run them in the field, right. And I think there’s also been a shift from, OK, how much does this tool speed us up to what are the bigger, broader effects—which is all the exciting stuff, I think, for thinking and learning in particular—that these tools have beyond just efficiency. 

So I think it’s just amazing. In no other time have you seen this leap from, you know, a three-year period from like a few people doing small lab studies to like lots of people doing field experiments with, you know, wide-reaching implications. 

TEEVAN: Yeah. Rebecca or Jenna, have you observed in your own work practices, sort of, Jake’s talking about how his research is changing. Have you been observing things like that, as well? 

JANSSEN: Yeah, definitely. I would say it’s just so interesting to see how these tools can help you. I mean, when I started or, like, I finished my PhD kind of like throughout this wave of, like, AI really picking up and just, like, even in this short time seeing, “Oh, where does it help me? Where does it not help me that much?” But also the stress of it: “Oh, where do I want to stay involved?” And I think that’s still, like, an ongoing progress or process, at least for me, to figure this out. And I think that’s also what I hear from other people, that they’re, like, experimenting a lot, playing around with this and figuring out, OK, where does it actually change things and change workflows on the broader level. 

BUTLER: Yeah, I think, Rebecca, to that point of, like, where does it help me or where does it not, something that has struck me over the last five years of the report is how nuanced it is and how we anticipated certain things and it wasn’t necessarily like that. 

Like when we all went remote, we thought, oh, people will be lonely. And there were studies looking at this, and it was like, wait, some people are really thriving. What’s that about? And then hybrid work, like, we don’t all need to go back or we need to go back sometimes.

And then with AI: “This incredible tool—everyone’s going to benefit.” And then we saw, oh, there’s so many factors as to who benefits and how they benefit, and whether they believe it’s going to be useful even impacts it and what kind of tasks they’re doing and what their problem-solving style is. So I think the uniqueness of all of this and how each worker is different and there was no single answer has been really fun to see and watch, as well—and tricky but keeps us employed. 

TEEVAN: Yeah, yeah, yeah. No, so I like this thinking about the different ways that people … like, even just listening to the three of you and seeing the variation in the ways that you’re thinking about your work practices changing, adoption clearly matters a lot, and I know that’s something that we center in [on in] the report. 

Jake, you talked about how everybody has access to models. But not everybody is actually using the models and we’re certainly not using them in the same way. 

I was wondering if you could tell us a little bit about what the report says about today’s level of adoption and like who’s using it and how. 

JANSSEN: So what we see in the research—and this is mainly based on, like, surveys being conducted in different countries and then, of course, also some more, like, field experiment studies—what we see is that AI adoption is definitely increasing overall, but it’s really heterogeneous and more nuanced in depth, like who is using it and also, like, for which purposes. 

So a German survey found that about, like, 38% of the respondents were using AI for work. But this is just, like, the average. And we do see, like, lots of differences across, like, industries. 

So there were other surveys where the results showed that IT and procurement were example industries or, like, sectors which were more open to using AI than maybe marketing or operations.

There also has been some evidence on men being more open to using it than women. I don’t know how the gap looks, like, right now. I hope this is, like, converging even more. But this is maybe, like, on the high level, like, about AI adoption levels. 

And for the question of, like, what people use this for, there are now more studies also, like, using chat conversations to see, “Oh, what are actually, like, the user intents and goals.” And we have a group also within Microsoft who has done something similar, and they found that information retrieving but also communicating has been or have been among the top user intents. There’s definitely a lot of, like, writing related or there are a lot of writing-related tasks that are conducted with chat tools, and I think that’s, like, the big picture we see. 

But maybe even there, I think, it also depends a lot on which AI tool people are using. So maybe Anthropic’s work sometimes shows more, a heavier weight on, like, coding and developer use cases. So there’s definitely, like, some variety. 

TEEVAN: And, Jake, I know you’ve done a lot of studying in the education context, as well. Can you share a little about that? 

HOFMAN: Yeah, I mean, the report, I think, gives really definitive numbers in this regard in that recent surveys show that, like, 80% of students, sorry, 80% of [K-12] teachers and 90% of [K-12] students report having used, you know, generative AI for schoolwork, you know, with use growing year over year, right. 

What’s interesting is that, you know, there are, like, myriad educational, like, tools and specific versions of generative AI products and all these startups, and yet almost all of the reporting shows that people are using the generic off-the-shelf Copilot, ChatGPT, Claude, Gemini, and so on, not necessarily even in like a learn mode, right, and so I think this speaks to, like, the bigger sort of policy and training gap that’s out there in terms of the fact that everyone is using these tools, but there’s not amazing guidance for how to use them constructively. 

The good news there, I think, is that we’ve seen, like, big efforts this year. So with the American Federation of Teachers in partnership with Microsoft and OpenAI and Anthropic, there’s actually a big program to try to re-skill teachers and give them the training to use this technology appropriately. So I think there’s a lot of hope there, but I think it’s also really something we should keep our eye on in terms of making sure that we’re using these tools in the right way. 

TEEVAN: Yeah, and one of the challenges is that the tools are changing so fast. Like, it’s very hard to provide any guidance … 

HOFMAN: For sure, yeah. 

TEEVAN: … when it’s going to be different tomorrow. Yeah, I find that, too. 

Like, people are always asking me, they’re like, “Ooh, what surprises you most about how people are using AI?” And it’s funny because almost as soon as something surprises me, like a week later, everybody’s like, “That’s obvious” because things are changing so fast. 

But I’m going to turn that question on to all three of you, and I would like you each to answer this. I’m curious what you have found particularly surprising about how people and organizations are leveraging AI right now. Maybe, Jenna, you want to kick us off? 

BUTLER: Sure, yeah. I do a lot of studies looking at how organizational behavior is changing with AI, and something that is somewhat surprising but I think might really surprise others is just how much influence individual people have on the adoption of these technologies. 

So lots of studies have shown that how individuals talk about it with their colleagues will change whether they’re willing to use it or what tasks they use it for, and how leadership demonstrates and discusses these tools will impact whether their people feel like they can use them.

And so while we did just give everyone like, “Hey, here’s access to these absolutely incredible tools,” as you said, Jaime, we didn’t exactly have a guidebook for these people because they’re changing all the time. And so a lot of the best use cases have just been figured out by people using them and sharing that sort of from a ground-up point of view. And so I feel like it’s been a technology where individuals have had a lot of opportunity to help shape how it’s used and how it’s spread through an organization. 

HOFMAN: Yeah. You know, I think it’s not … like, the bottom up is super cool, as you mentioned, Jenna, but also the fact that, like, how much experimentation people are doing and how creative people are getting with these tools, I think, has itself been really surprising to me. 

I think, you know, it’s sort of this thing that builds on itself because, you know, there used to be kind of a high barrier from translating an idea … like, if you had some boring, repetitive thing that you did at work and you wanted to automate it, right, you probably needed to know how to code and needed to know how to do a bunch of obscure things to, like, make that real and then share it with other people, right? And now, that barrier is much lower, and so you see all the creative ideas and the democratization of that happening and then people sharing it really quickly and easily with their colleagues and then all of a sudden, everyone is like, “Did you hear what so-and-so did? I’m going to start doing that,” right? 

On the other hand, I think it is a little bit terrifying just how fast the experimentation is going and sometimes how reckless people are, right, especially with some of the agentic stuff where people give, like, all permissions to their agents and they let them go do all kinds of crazy things. And sometimes, that leads to interesting outcomes and sometimes undesirable outcomes. So I think it’s been exciting to see things change so fast, but I hope we can find, like, a good balance of move fast and hopefully not break things. [LAUGHS] 

JANSSEN: Yeah, definitely agree on, like, the experimentation part there, Jake. 

I think for me, what is especially surprising but also fascinating was the learning about the new ways of interacting with these tools. So we talked about, a lot about, like, multimodal models. So, like, OK, you can generate text, you can generate videos, but also like the way of interacting with AI. 

So throughout the report, I learned also about some user research which is looking at like, we are so used to using text-based artifacts, but maybe I want to emphasize something or, like, something speaks to me in particular and I find it important, so I double-click on this, and this way the tool then knows, oh, this is something I need to dive deeper into. So just, like, these new ways of interacting with them, with the tools, I think, is something really, really encouraging because it also speaks to the fact that individuals are just really different and everyone has their own needs or preferences and some of the tools can help just meeting the different preferences there. 

TEEVAN: So we’ve been talking a lot about adoption, and I want to switch now a little bit to talk about the impact that AI is actually having on how people get things done. And obviously impact is heavily mediated by adoption. 

Is there anything that we can say based on the adoption findings or anything else about what we actually know about the changes that AI is bringing about? 

BUTLER: Yeah, I think we’re seeing a lot of things. So while on the one hand, there’s still so much we don’t know, we are able to observe a lot as we go. 

We do see that a lot of tasks are able to be impacted by AI, and so when we think about it, we don’t necessarily think about whole jobs, like how the jobs are shifting as a single whole, but more like the tasks different people do are shifting over time. 

So specifically in the software engineering field, we’re already seeing that software engineers are spending a lot more time interacting with code in ways that feel fun for them, like the harder problems. They’re getting to think more; they’re getting to solve more problems and do less boilerplate or boring work to them. But then we also see that that’s driving some burnout or some cognitive overload where they feel like I only ever am doing the exciting hard problems, and my brain never gets a break from that. 

So this shift in how each job is doing tasks differently is something we’re really observing, and we see it a lot with white-collar workers and jobs that involve information and being on a computer. They have a lot of tasks that are amenable to this technology. 

TEEVAN: I love the concern about only ever doing the hard, interesting, exciting problems, because I totally feel it. Like, it’s real. It’s just funny, you know. [LAUGHS] 

BUTLER: Yeah. 

JANSSEN: Yeah, I can maybe add to some of the adoption, like, impact side, also, like, on the labor market or, like, what we see in those areas in the sections of the report. 

I think for first … for the first part, it’s we do have more insights now into, like, individual productivity effects. There have been, like, multiple studies, field experiments, lab experiments, or different, like, occupations where some groups are using AI, others do not, and how this impacts then their work. And what we usually see there is that people tend to be faster at completing tasks and also oftentimes lead to better outcomes or, like, complete … are able to provide better outcomes. 

That being said, there are also studies where this is not the case or which raise these issues or issues about overreliance and that people also need to make sure, like, to still be engaged and making sure, oh, is this actually a good task that AI can really help me with or am I just relying on the AI tool too much there? So there is some, like, jagged frontier of what AI can do and cannot do and, like, how people, yeah, how they interact with that. 

On the broader level, on the labor market side—that’s also something that we have emphasized in the report—we do not see large impacts or effects overall based on some labor market studies that are looking at both employment rates but also job postings and these kinds of things. Maybe if you’re looking at specific, like, online labor platforms or just, like, the system or, like, the ecosystem is a little bit different, it might be different. But overall, I would say that the effects are still, like, modest. 

One subgroup where we have early insights now that they might be especially, like, impacted is the group of, like, early-career workers where maybe AI can do some of their tasks more easily than for later stages in their careers. But even there, I think we still need more time and evidence to say explicitly, “Oh, this is because of AI,” and not just, like, macroeconomic trends. 

TEEVAN: And when do you think we’re going to be able to, you know, start seeing that impact? Do you think it’s because the impact isn’t happening at that macro level, or do you think it’s just a kind of temporal thing? 

JANSSEN: I think it’s probably both. And I would also say that AI is a technology, but we are living in systems and we are living or working in organizations, and organizations will adopt in one way or the other. And I think we do need some more time but also, I think, time for people and organizations to really think about, “Oh, how do we want this to change our work settings?” 

TEEVAN: That’s great. Actually, I love … I think it’s fun for us to dive into, what do we want a little bit? You know, I think often we talk about things as sort of cut and dry or black and white. And, you know, where is the nuance in what’s happening and how can we start, you know, how can we lean into that to shape a future that we’re excited about? 

JANSSEN: So oftentimes, people say, “Oh, AI is having this impact or this effect.” And I think there was something that all the authors and also editors of the report were always like, “Well, it’s not that black and white.” 

So individual productivity effects might not equal group productivity effects because it’s just, like, really different to work on your own than working in the group. It’s also not “the more AI you use, the better,” or, like, more… using AI more doesn’t necessarily lead to productivity effects. 

But as Jake already said and is probably able to speak even more about, it’s a lot about, like, how are people using AI and in which ways? When do they use them? Do they use them before they’re thinking about doing tasks themselves or only after? So I think these would be two things that come to mind to me. 

TEEVAN: And we’ve certainly seen historically that technology, like, to your point, Rebecca, the way that it gets adopted isn’t necessarily the obvious ways, you know, as you sort of bring it into systems. Jake, I know you’ve done a lot of thinking in that space, as well, with things like social media. 

HOFMAN: Yeah, and I think, you know, in some way, you could think of this moment as AI’s like social media moment, right? 

Social media sort of was developed super rapidly. It was adopted super rapidly. It was, you know, optimized for what seemed like the obvious thing of like adoption and engagement at the time. But I think there are these, you know, side effects of sort of myopically optimizing for one thing, and, you know, we’re now decades later and we, you know, it’s hard to disentangle what happened and why, right. 

And so I think when we think about AI and we think about the risks and think about things being, you know, is this a cut-and-dry case? Is it good? Is it bad? So on and so forth, right, I think it’s important to step back and say, actually, it’s up to us in terms of what future we design with it. And the key to doing that is to not myopically focus on just the easy things, right. It’s easy for us to say, let’s get everyone to adopt and let’s boost efficiency. Let’s make everything really quick. Right? But I don’t think that that’s actually the future, like, we want to live in, where everything is just fast, fast, fast. And so it’s really important for us to realize we’re in control of this and to put in place the ability to measure and monitor the broader effects that these tools are having so that we can steer things to the right course, right. So I think it’s, like, a real opportunity to learn from the past and to try to do something different, to steer our future in a good direction. 

TEEVAN: Yeah, and are there specific things you’re doing in your research right now to try and get ahead of that or look to that? 

HOFMAN: Yeah, I mean, I think the biggest challenge is to say, you know, in a, look, in a lab experiment or in some very targeted field experiment, actually measuring effects on people is something you can do somewhat well. It’s a hard social science problem all the time. But now if you step back and you think about, how do we do that in, like, the products that we create as, you know, a big company at scale? I think that’s a really interesting, really hard research challenge. 

And, you know, it’s, like, it’s … the answer is going to be a combination of technical things and social things and automated telemetry and surveys and tying all these things together, and figuring out how to do this in a way that actually works for an organization making and shipping products, I think, is really, you know, really important and really challenging. 

TEEVAN: Yeah, I wonder if there’s things organizational leaders or even individuals should be doing in this space, as well. 

HOFMAN: Yeah, maybe I’ll just say one more thing on this. I think the more that leaders can emphasize that this is an important aspect of product design, the better off we will all be. Because I think short of hearing that from leaders, it’s hard for that to happen bottom up because people have so much pressure to just build things and get them out there. And so that’s one thing that I think could make a real difference. 

TEEVAN: Yeah, and some of this in some ways is, like, really building, like, complex AI literacy that isn’t just short-term focused or myopic. And, you know, in some ways, AI literacy shows up as a theme throughout the report. 

Jenna, I know that’s something that you’ve done a lot of thinking about, as well. I was wondering if you could talk to how AI literacy relates to some of the themes we’ve been talking about and, like, has impact at the individual and organizational level, particularly as things are changing so fast. 

BUTLER: Yeah. I love what Jake was saying about how, like, we need to be asking the right questions and not just looking at how fast things work, and understanding how people actually use it, because people’s own views of these tools impact how they use them. And so we really want people to understand, like, all people, at a basic level what these tools are, what they’re good for, what they might not be as good for, what the pros and cons are, what the risks are. And we all are seeing this play out in various ways. 

So we saw in a study of software engineers this concept called the productivity pressure paradox. And basically, they said to us, “Hey, we were given these tools; we were told we’re going to be so productive, but we don’t know how they work and we don’t know how to be more productive with them, but our bosses are awaiting more things. So I’m just going to double down on what I already know and work even harder.” And so there was this lift where when the tool was introduced, they looked more productive, but it wasn’t because they’d actually changed how they work to take advantage of it, because they didn’t know how to do that. 

And we also know how people feel about these tools, like what they think they’ll be good at … I think everyone enjoyed the meme of asking ChatGPT how many r’s were in strawberry. And those of us who know how they work, it’s, like, it’s not really funny. Of course, it’s terrible at that, right? But if you don’t know that, then you’re not going to ask the right questions. 

And so we really want people to have sort of a basic understanding of, hey, what are the inherent biases here that I need to be aware of if I use the model? Is it going to point me down a certain path because it wants to make me feel great about myself, or should I probe it a little bit more and be like, really, is this a good idea? Like, how do I use it to make me most effective? 

And I think we need to give people a bit of time to learn that. And I think we definitely see this in organizations where the rollout has been quick and the excitement has been high, but not everyone has had the time to really learn to understand how, within their own workflow and what they do every day and the way they work, how these things can affect them and be productive for them. 

JANSSEN: Maybe actually picking up one thing that Jenna just said on this fact of how do people feel about using AI or when they’re just, like, asked to use it: I think this is also, like, a growing area of, like, research also within Microsoft but also beyond. And really important is, like, what are the psychological influences of using AI on people, on users, also, like, across different maybe age groups? What are the risks? What do we need to care about? And kind of, like, where do we need to set guardrails or similar? Because I think there are these effects, as well, and we need to be researching those similarly as we are, oh, what are the productivity effects of these things. 

There’s also one interesting finding, I think, from the report about the social perceptions when people are using AI: users that use AI are sometimes seen, I don’t know, [as] lazy, less valuable when they’re using AI. At the same time, everyone’s like, oh yeah, but I’m also asked to use it. Or there are also maybe some trust issues around, oh, should I make it transparent that I use AI or not? So I think these areas of research are growing in importance but also in how common they are. 

TEEVAN: Yeah, I mean, we’ve been really focused up until now … a lot of the research has been like how individuals use the tool, but what you’re sort of hinting at there, Rebecca, is, like, what it means in social contexts and in the larger system to use a tool. What’s some of the early research that has been showing up around sort of AI’s use in collaborative contexts? 

BUTLER: I mean, this is a really exciting space, right? Like, we kind of, the report, the first AI report was a lot more on individuals, and then we started looking at in the real world, and in the real world, we work with other people. And so how these tools interact and collaborate and mediate collaboration is definitely interesting. 

I think one thing we’ve seen that Rebecca alluded to is that there are a lot of issues with perception. So one study found that if the same, like, writing material was given and you said a woman used AI and wrote it or a man used AI and wrote it, the woman was judged as being less competent, even though the text was the same. So some of these things that have always been around in our world, some of the biases people hold, are, like, translating into this new world of AI, and how then … how I receive work that someone else did is being impacted by that. 

And one positive we see there is it seems as AI becomes more ubiquitous and people are like, yeah, it’s a tool and it’s great, they have less judgment against others using it. But right now, some people are still nervous about what do I use, what do I signal when I’m using it, and how am I going to be perceived? So even just within how humans relate to each other, we’re seeing it starting to have an impact on how they want to use it. 

TEEVAN: Yeah, it’s interesting. You know, I think the metaphor we use for AI is super interesting, and I sort of hear us playing around with different metaphors. And in some ways, you know, it’s really important that we think about AI somewhat differently in that previously, all of our interactions with a computer were deterministic, and we would, like, tell the computer exactly what we wanted it to do. And it, like, was screwing up if it couldn’t count the right number of r’s in strawberry. And that’s very different now. We have these stochastic models that we can communicate with in natural language. 

In many ways, they’re much more powerful, but they’re also not deterministic. So I think sometimes we think of human metaphors. Sometimes we call AI a collaborator. Sometimes, Jenna, as I saw you were just doing, we’re, like, thinking of AI as a tool and something we get things done [with]. 

I’d be kind of interested in, like, what the different metaphors you play around with in your research and how you think that shapes the way … either the way that your research evolves and the questions you ask or the way that people think about that. 

HOFMAN: Yeah, Jaime, I think it’s a great point. 

I mean, I think personally, and this is more just individual experience, but it leaks over into some of the research designs and things we investigate. We do have tremendous experience in dealing with, like, stochastic and not fully perfect systems in people, right? [LAUGHS] 

And so one thing that I think has been interesting to reflect on being in a research org is that we’re very used to having, you know, interns or students who have a lot of expertise but don’t always get everything right. And a lot of the time, thinking about how to interact with and investigate what that student has done is, to me, very similar to thinking about how to interact with and investigate what an AI tool has done. And I think it’s made for a really comfortable transition to using AI tools in a research org, compared with what I’ve seen in other contexts, like artistic or creative settings, where, you know, these tools are totally, you know, sort of off limits or, you know, seen as bad or undesirable. 

And I think developing this skill of interacting with a system, like, this is going to be increasingly important. And I think it is a useful metaphor. How would you describe this to a very skilled but imperfect collaborator? 

JANSSEN: Yeah, we are actually currently writing up a paper from a study that we did last year where we gave two different trainings to two different groups, framing the AI either as a tool to collaborate with or more like a training which focused on the technical capabilities of the tool. And we actually did see that then the group who was interacting with the tool in a more collaborative way or thought of this, of the tool more collaboratively, did have a better experience but also led to different outcomes there. 

So I do think there’s a difference in how we experience and also in which mindset we approach these tools. And, yeah, I individually usually try to see it as a tool but want to, like, interact with the tool and, like, go back and forth and not maybe just, like, accept the first output, but really iterate. And I think this is also something that studies and research have shown might be helpful for users.

TEEVAN: Yeah. 

JANSSEN: And maybe also adding to your question about, like, individual and collaboration, I think one aspect that we also saw that I really find interesting is, like, how much more difficult it is to build tools for collaboration or, like, group settings than for individuals, because it brings, like, so many new layers to it. It’s like, oh, we need to think about social intelligence. What is the group environment, which is, like, not there for, like, an individual use case? When do we want to use … when do we want AI maybe to step in in a group setting? How do we think about memory of the group? What are, like, some underlying, maybe emotional settings or, like, emotional context that the AI needs to be aware of? 

And it’s just, like, so much more difficult. And I think we also learn a lot about collaboration itself through this process because recently I was like, what does collaboration actually mean? Does it mean I work with someone, or does it mean I work for someone? So even finding out these nuances, I think, is really, really interesting. 

TEEVAN: Yeah, I think that’s a really good point, Rebecca, is, like, in some ways the collaborative search space is so much larger than the individual productivity search space, and we already have seen how much scale was necessary for a model just to start to learn some of the emergent underlying pieces of individual interactions with a model, that that’s a real challenge and opportunity as we start thinking larger. 

You know, Jenna, I was wondering in the software development space whether you’re seeing, especially in collaborative contexts, sort of interesting metaphors or ways that people are using AI, because that’s a place where we see super early adoption and can get good insight for future productivity tasks, as well. 

BUTLER: Yeah, we did a fun study this past summer where we looked at people who had the same context—they’re in the same team; they work in the same code; they have the same manager—but where one used it a lot and one didn’t. And we interviewed them to understand their kind of perceptions and how they viewed this. And what we found is that the people who use it more do view it more as a collaborator and less as a tool. The folks who saw it as a tool then assumed it had a purpose. So, like, you know the expression “when all you have is a hammer, everything’s a nail”? So if this is just a tool, then I got to find the nails and that’s the only place I can use it. 

But if it’s a collaborator, then if it’s not working, they would take on a position of, maybe it’s me, like I should try prompting it differently. I should give it new context. Like there’s got to be some way to get this thing to work in this context, and so I’m not going to give up. 

So we found that the people who viewed it in that way, as a collaborator, believed it could get to the right answer. And we even see with the model that sometimes you just have to encourage it and tell it, like, “No, you can do this,” and then it’ll give you the answer. It’s really funny. [LAUGHTER] 

TEEVAN: The little model that could. 

BUTLER: And so we’ve seen––yes!––with the developers, the ones that just kind of stick with it and as Jake was saying, see it as a collaborator that can do different things, they tend to benefit from the tool a lot more and they have a broader idea of what it could potentially do and they use it in a lot more context, and so then they enjoy using it more. 

TEEVAN: So I like the, you know, I think it’s useful to think about we want to break out of the deterministic context. And so it’s useful to think of AI as a collaborator. It’s certainly aligned with our notion of, like, AI helps bring out the best in people. I wonder if this sort of slightly anthropomorphic metaphor limits our imagination in some ways, as well. AI certainly can do things that humans can’t. 

There’s, you know, it can operate at scale. All of a sudden, you can have natural language across hundreds or thousands of people easily synthesized. It operates super fast. You can generate new ideas and different perspectives very quickly. I’ve been trying to think of, like, what are the next metaphors that will help us break out of our sort of limitations of thinking about working with people? I don’t know if you all have any thoughts on that space. 

JANSSEN: Not yet! I would be interested if you have already, Jaime. [LAUGHTER] 

TEEVAN: I don’t have an answer yet. 

BUTLER: Well, Jaime, I saw your post on, like, how AI is not like a human and how considering those differences is more, can be effective or can help us break out of it. And I found that really exciting because something we’re seeing, I think, is a lot of companies and people are looking to automate something a human already does and do it faster. Like what Jake was saying, do we just want to be faster at everything? 

And that’s easy because we can observe what a human does. We’ve probably already been measuring what a human … 

TEEVAN: We can just hire more people, too. 

BUTLER: Yeah, so we can do that. But when we start to think about what can it do that humans can’t do, that’s sort of where I think we need that imagination, where we start to think, OK, this is totally different than anything I’ve done before. 

And I love space, and it makes me think a lot about space exploration. Like, it’s not like we used to go to space slowly when we didn’t have electricity and computers, right? We just didn’t go to space. [LAUGHTER] Like, you looked up there and you thought, “That would be cool someday.” And then this whole field opened when we got this new technology. 

So I do think a lot about what are not just things that I can do better, faster, or in parallel, but what could I have never done before that I can now? And I think that’s where all of the open and exciting parts come to be. I just don’t know the answer. 

TEEVAN: Oh, and I love your metaphor, Jenna, because I actually keep watching Star Trek: The Next Generation, and, like, actually talking about these different chapters that the New Future of Work Report has, it’s been amazing because, like, when I watched it during the pandemic, it was perfect because in some ways, it’s just like this really small, closed community that travels the world, you know, so it’s sort of like exploring but like being a small community and then now obviously with AI, the computer and data and all the ways, and I do think that they offer, that offers a really positive sort of view of the future. 

And, you know, as we begin to close here, I thought it might be fun for us to take a moment to really think about this moment that we’re in—how we work, how we see other people working, the research that we’re reading and doing—and think about what the ideal new future of work looks like. What are we creating, and how do you want to contribute to it? Jenna, maybe you want to kick us off? 

BUTLER: Yes, with this easy question. [LAUGHTER] 

TEEVAN: So, yeah, just solve the future of work. 

BUTLER: If we could just do that. 

HOFMAN: Softball. 

BUTLER: Yeah. Well, what’s great about it is that we can ask the question, right? Like, it’s not predetermined. The future of work is actively being built by us, by consumers. I love that. And so I do like to picture a future of work where humans are flourishing with AI and where humans still get to do meaningful work. 

So one of the workstreams we have in the [New] Future of Work is on meaningful work, and we know that when people do work that they feel connected to, societies function better and people are happier. And so I don’t want a future where we replace work with agents. I really want a future where AI allows humans to thrive more, to still be front and center, and to be doing things that change the world. So I’d be very excited for AI doctors working alongside humans to maybe cure cancer. You know, that’d be excellent. That was my first crack. I didn’t succeed when I tried, so maybe now we can. 

But that’s kind of the future where it’s both economically valuable, but it’s also meaningful for humans in the world. And that’s the future that I’m hoping that we’re painting with our reports and with our research. 

TEEVAN: Thanks. Jake? 

HOFMAN: Yeah, yeah, Jenna, I think, like, a huge plus one to the human flourishing aspect. And I think sort of in a way that this is, like, the broadest and best interpretation of Microsoft’s, like, mission statement, to empower everyone to achieve more, right. I don’t think it means, like, write more documents and check off more tasks. I don’t think that’s the version we should be going for. I think it means, do more of the stuff you’re passionate about and less of the stuff that you’re not, so that, like, the future of work is that it doesn’t feel like “real work.” It doesn’t feel like the slog, and you get to do the stuff that you’re, like, flowing and enjoying, and time flies by because you’re just loving what you’re doing. 

And I think that’s the future we want. I don’t think it’s going to happen by accident if we just work on the more faster sort of thing, and so I really hope that the work and research that we all do can contribute to that version of the future because I think we’d all be much happier in it. 

JANSSEN: Yeah, I think the two of you have already said this really beautifully, and I say just, like, plus one to that. 

I also see, like, the … I would love the new future of work to be a future where AI makes the human parts of work more visible but also more valued, and a future where humans are able to bring in their creativity or explore new ways of creativity, bring in their human judgment, guide directions, setting like intentions. I think this would be really great. And yes, the two of you have already said like humans or seeing humans flourishing and feeling that their work is meaningful. I think it’s just, like, great. 

TEEVAN: Great, good. And then finally, to wrap things up, I’ve got a couple of lightning questions. They’re quick questions, quick answers, but they’re actually quite hard questions. So just share what’s top of mind for you. Don’t worry about it. I’ll ask them and then, like, Rebecca, we’ll start with you, then Jenna, then Jake, just so, Jake, you’ve got it easiest. We’re giving you a few seconds to think about things. [LAUGHS] 

HOFMAN: What they said. [LAUGHS] 

TEEVAN: But, yes, just what’s top of mind for you. 

What’s one misconception about AI at work that you wish you could retire today? 

JANSSEN: The more you use AI, the more productive you are. 

BUTLER: I think that’s similar to mine, which is that if you give someone these tools, they’ll all be 10x more productive because the tool itself is good. There are so many other factors: how they perceive it, how others perceive it, how it fits into their workflow. It’s not just giving people an amazing tool that’s going to change productivity. 

HOFMAN: And mine is just to pull up what I think both Rebecca and Jenna have already said earlier, which is, like, it’s not all good and it’s not all bad. And how we design and use it really matters. That’s up to us, and we can steer it to be better or worse. 

TEEVAN: Great. Question No. 1. Now we’re on Question No. 2. What’s one finding from the report that you hope becomes widely understood? 

JANSSEN: I think we keep benchmarking against the past. So what can AI do, or can AI do what we already do? And I think this is, like, a mistake or maybe only the first step and the more important step comes next. Like, what can AI do or help us with that we can’t do yet? 

BUTLER: For me, as the editor, I have snuck the same slide into the report for the last three years, and that is Erik Brynjolfsson’s diagram of the space of innovation. And the idea there is just that the opportunities for augmenting humans are far greater than for replacing or automating them and that there’s more opportunity, more tasks, more economic opportunity in that bigger space. 

HOFMAN: I love that and totally agree. And I’ll just point to one of my favorite slides in the deck, which is on, like, the future of computer science education. And I think, you know, there’s this thought of, like, you know, the dawn of AI is the end of computer science education, or people needing to know computer science. This, I think this slide that we have in there does a great job of talking about how it’s actually just a redefinition of what we mean by computer science and pulling things to a higher level of abstraction, thinking about computational thinking, problem solving, thinking clearly and breaking things down, you know, algorithmically. And I think that’s a great shift and I’m excited to embrace it. 

TEEVAN: Awesome. Third and final question, and, Jake, you’re already half of … part of the way there. What is one thing you are genuinely excited to research next? 

HOFMAN: Yeah, so I can tie it into something that I’ve personally been working on, that computer science angle, and I think giving teachers the ability to control and have visibility into what their students are doing is something we have not broadly done and made accessible to people. It’s something I developed and tested for my own teaching this year and have also worked with a bunch of academic collaborators on randomized controlled trials with. And I think just the sooner we can get that into every teacher’s hands so that they are not just subject to whatever their students are doing with whatever tools, the better we can correct what’s going on. So I am very excited to work on that going forward. 

JANSSEN: Yeah, I would say we have spent, or we as a community, both like in companies, but also academia, have spent a lot of time now on what AI can automate. But I would be excited and love to learn more about what people want AI to maybe help them with and kind of like leading to, going back to the question of like, what does the new future of work, the ideal new future of work, look like for like the human workers and the individuals? And learning more about these impacts and guiding in these directions. 

BUTLER: Oh, for me, I think in the software world, we are seeing that since people can do so much more and they don’t have to do the boring tasks, their brains are just never getting a break and people are feeling sort of burnt out. And I’m very curious about how we can take advantage of AI and do more without running ourselves into the ground because we’re not AI, right? We’re people and we have requirements and needs. So I’m really excited to see how we can take advantage of what is uniquely AI and then what is uniquely human and help people to flourish like we talked about. 

TEEVAN: Thanks, Jenna, Jake, Rebecca. I appreciate all your time today. 

[MUSIC]

And to our audience, thank you as well. If you want to learn more about the report and how AI is changing how people work, visit aka.ms/nfw.

And that’s it for now. Until next time. 

[MUSIC FADES]


]]>
Ideas: Community building, machine learning, and the future of AI http://approjects.co.za/?big=en-us/research/podcast/ideas-community-building-machine-learning-and-the-future-of-ai/ Mon, 01 Dec 2025 19:18:20 +0000 As the Women in Machine Learning Workshop (WiML) marks its 20th annual gathering, cofounders, friends, and collaborators Jenn Wortman Vaughan and Hanna Wallach reflect on WiML’s evolution, navigating the field of ML, and their work in responsible AI.

The post Ideas: Community building, machine learning, and the future of AI appeared first on Microsoft Research.

]]>


In 2006, three PhD students organized the Women in Machine Learning Workshop, or WiML, to provide a space for women in ML to connect and share their research. The event has been held every year since, growing in size and mission.

In this episode, two of the WiML cofounders, Jenn Wortman Vaughan, a Microsoft senior principal research manager, and Hanna Wallach, a Microsoft vice president and distinguished scientist, reflect on the 20th workshop. They discuss WiML’s journey from a potential one-off event to a nonprofit supporting women and nonbinary individuals worldwide; their friendship and collaborations, including their contributions to defining responsible AI at Microsoft; and the advice they’d give their younger selves.

Transcript

[MUSIC]

SERIES INTRODUCTION: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

JENN WORTMAN VAUGHAN: Hello, and welcome. I’m Jenn Wortman Vaughan. This week, machine learning researchers around the world will be attending the annual Conference on Neural Information Processing Systems, or NeurIPS. I am especially excited about NeurIPS this year because of a co-located event, the 20th annual workshop for Women in Machine Learning, or WiML, which I am going to be attending both as a mentor and as a keynote speaker.

So to celebrate 20 years of WiML, I’m here today with my long-term collaborator, colleague, close friend, and my cofounder of the workshop for Women in Machine Learning, Hanna Wallach.

You know, you and I have known each other for a very long time at this point. And in many ways, we followed very parallel and often intersecting paths before we both ended up here working in responsible AI at Microsoft. So I thought it might be fun to kick off this podcast with a bit of the story of our interleaving trajectories.

So let’s start way back 20 years ago, around the time we first had the idea for WiML. Where were you, and what were you up to?

HANNA WALLACH: Yeah, so I was a PhD student at the University of Cambridge, and I was working with the late David MacKay. I was focusing on machine learning for analyzing text, and at that point in time, I’d actually just begun working on Bayesian latent variable models for text analysis, and my research was really focusing on trying to combine ideas from n-gram language modeling with statistical topic modeling in order to come up with models that just did a better job at modeling text.

I was also doing this super-weird two-country thing. So I was doing my PhD at Cambridge, but at the end of the first year of my PhD, I spent three months as a visiting graduate student at the University of Pennsylvania, and I loved it, so much so that at the end of the three months I said, can I extend for a full year? Cambridge said yes; Penn said yes. So I did that and actually ended up then extending another year and then another year and another year and so on and so forth.

But during my first full year at Penn, that was when I met you, and it was at the visiting students weekend, and I had been told by the faculty in the department that I had to work really hard on recruiting you. I had no idea that that was actually going to be the start of a 20-plus-year friendship.

WORTMAN VAUGHAN: Yeah, I still remember that visiting weekend very well. I actually met you; I met my husband, Jeff; and I met my PhD advisor, Michael Kearns, all on the same day at that visiting student weekend. So I didn’t know it at the time, but it was a very big day for me.

So around that time when I started my PhD at Penn, I was working in machine learning theory and algorithmic economics. So even then, you know, just like I am now, I was interested in the intersection of people and AI systems. But since my training was in theory, my “people” tended to be these mathematically ideal people with these well-defined preferences and beliefs who behaved in very well-defined ways.

Working in learning theory like this was appealing to me because it was very neat and precise. There was just none of the mess of the real world. You could just write down your model, which contained all of your assumptions, and everything else that followed from there was in some sense objective.

So I was really enjoying this work, and I was also so excited to have you around the department at the time. You know, honestly, I also loved Penn. It was just such a great environment. I was just actually back there a few weeks ago, visiting to give a talk. I had an amazing time. But it was, I will say, very male dominated in the computer science department at the time. In my incoming class of PhD students, we had 20 incoming PhDs, and I was the only woman there. But we managed to build a community. We had our weekly ladies brunch, which I loved, and things like that really kept me going during my PhD.

WALLACH: Yeah, I loved that ladies brunch. That made a huge difference to me and, kind of, kept me going through the PhD, as well.

And, like you, I’d always been interested in people. And during the course of my PhD, I realized that I wasn’t interested in analyzing text for the sake of text, right. I was interested because text is one of these ways that people communicate with each other. You know, people don’t write text for the sake of writing text. They write it because they’re trying to convey something. And it was really that that I was interested in. It was these, kind of, social aspects of text that I found super interesting.

So coming out of the PhD, I then got a postdoc job at UMass focused on analyzing text as part of these, sort of, broader social processes. From there, I ended up getting a faculty job, also at UMass, as one of four founding members of UMass’s Computational Social Science Institute. So there was me in computer science, then there was another assistant professor in statistics, another in political science, and another in sociology. And in many ways, this was my dream job. I was being paid to develop and use machine learning methods to study social processes and answer questions that social scientists wanted to study. It was pretty awesome. You, I think, started a faculty position at the same time, right?

WORTMAN VAUGHAN: Yeah. So I also did a postdoc. First, I spent a year as a postdoc at Harvard, which was super fun. And then I started a tenure track position in computer science at UCLA in 2010.

Again, you know, it was a very male-dominated environment. My department was mostly men. But maybe even more importantly than this, I just didn’t really have a network there. You know, it was lonely. One exception to this was Mihaela van der Schaar. She was at UCLA at the time, though not in my department, and she, kind of, took me under her wing. So I’m very grateful that I had that support. But overall, this position just wasn’t a great fit for me, and I was under more stress then than I think I have been at any other point in my life that I could really remember.

WALLACH: Yeah. So at that point, then, you ended up transitioning to Microsoft Research, right?

WORTMAN VAUGHAN: Yep.

WALLACH: Why did you end up choosing MSR [Microsoft Research]?

WORTMAN VAUGHAN: Yeah, so this was back in 2012. MSR had just opened up this new New York City lab at the time, and working in this lab was basically my dream job. I think I actually tried to apply before they had even officially opened the lab, like when I just heard it was happening.

So this lab focused on three areas at the time: machine learning, algorithmic economics, and computational social science. And my research at the time cut across all three of these areas. So it felt like this perfect opportunity to work in a space where my work would fit in so well and be really appreciated.

The algorithmic economics group at the time actually was working on building prediction markets to aggregate information about future events, and they were already, in doing this, building on top of some of my theoretical research, which was just super cool to see. So that was exciting. And I already knew a couple of people here. I knew John Langford and Dave Pennock, who was in the economics group at the time, because I’d done an internship actually with the two of them at Yahoo Research before they came to Microsoft. And I was really excited to come back and work with them again, as well.

You know, even here at the time that I joined the lab, it was 13 men and me. So once again, not great numbers. And I think that in some ways this was especially hard on me because I was just naturally, like, a very shy person and I hadn’t really built up the confidence that I should have at that point in my career. But on the other hand, I found the research fit just so spot-on that I couldn’t say no. And I suspect that this is something that you understand yourself because you actually came and joined me here in the New York lab a year or two later. So why did you make this switch?

WALLACH: Yeah, so I anticipated that I was going to love my faculty job. It was focusing on all this stuff that I was so excited about. And much to my surprise, though, I kind of didn’t. And it wasn’t like there was any one particular thing that I didn’t like. It was more of a mixture of things. I did love my research, though. That was pretty clear to me. But I wasn’t happy. So I spent a summer talking to as many people as possible in all different kinds of jobs, really just with the goal of figuring out what their day-to-day lives looked like. You were one of the people I spoke to, but I spoke to a ton of other people, as well.

And from doing that, at the end of that summer, I ended up deciding to apply to industry jobs, and I applied to a bunch of places and got a bunch of offers. But I ended up deciding to join Microsoft Research New York City because of all the places I was considering going, they were the only place that said, “We love your research. We love what you do. Do you want to come here and do that same research?”

And that was really appealing to me because I loved my research. Of course, I wanted to come there and do my same research, especially with all of these amazing people like you and Duncan Watts, who’d for many years been somebody I’d really looked up to. He was there, as well, at that point in time. There was this real focus on computational social science but with a little bit more of an industry perspective. There were also these amazing machine learning researchers. For many of the same reasons as you, I was just really excited to join that lab and particularly excited to be working in the same organization as you again.

WORTMAN VAUGHAN: Yeah, I’m happy to take at least a little bit of the credit for …

WALLACH: Oh yeah.

WORTMAN VAUGHAN: … recruiting you to Microsoft here many years ago.

WALLACH: Oh yeah.

WORTMAN VAUGHAN: Yeah. I was really excited to have you join, too, though I think the timing actually worked out so that I missed your first couple of months because I was on maternity leave with my first daughter at the time. I should say I’ve got two daughters, and I’m very proud to share in the context of this podcast that they’re both very interested in math and reading, as well.

WALLACH: Yeah, they’re both great.

Um, so then we ended up working in the same place. But despite that, it still took us several years to end up actually collaborating on research. Do you remember how we ended up working together?

WORTMAN VAUGHAN: Yeah. So I used to tell this story a lot. Actually, I was at this panel on AI in society back in, I think, it was probably 2016. It was taking place in DC. And someone on this panel made this statement that soon our AI systems are just going to be so good that all of the uncertainty is going to be taken out of our decision-making, and something about this statement just, like, really set me off. I got so mad about it because I thought it was just …

WALLACH: I remember.

WORTMAN VAUGHAN: … such an irresponsible thing to be saying. So I came back to New York, and I think I was ranting to you about this in the lab, and this conversation ended up getting us started on this whole longer discussion about the importance of communicating uncertainty and about explaining the assumptions that are behind the predictions that you’re making and all of this.

WALLACH: So this was something … I was really excited about this because this was something that had really been drummed into me for years as a Bayesian. So Bayesian statistics, which forms a lot of the foundation of the type of machine learning that I was doing, is all about explicitly stating assumptions and quantifying uncertainty. So I just felt super strongly about this stuff.

WORTMAN VAUGHAN: Yeah. So somehow all of these discussions we were having led us to read up on this literature that was coming out of the machine learning community on interpretability at the time. There were a bunch of these papers coming out that were making claims about models being interpretable without stopping to define who they were interpretable to or for what purpose, and without ever actually putting these models in front of real people. And we wanted to do something about this. So we started running controlled experiments with real people and found that we often can’t trust our intuition about what makes a model interpretable.

WALLACH: Yeah, one of the things that came up a lot in that work was, sort of, how to measure these squishy abstract human concepts, like interpretability, that are really hard to define, let alone quantify and measure and stuff like that.

WORTMAN VAUGHAN: Absolutely. So I think one of the first things that we really struggled with in this line of work was what it even means to be interpretable or intelligible or any of these terms that were getting thrown around at the time.

Um, we ended up doing some research, which is still one of my favorite papers, …

WALLACH: Me, too.

WORTMAN VAUGHAN: … with our colleagues Forough Poursabzi, Jake Hofman, and Dan Goldstein. And in this work, we found it really useful to think about interpretability as a latent property that can be, kind of, influenced by different properties of a model or system’s design. So things like the number of features the model has or whether the model’s linear or even things like the user interface of the model.

This was kind of a gateway project for me in the sense that it’s one of the first projects that I got really excited about that was more of a human-computer interaction, or HCI, project rather than a theory project like I’d been working on in the past. And it just set off this huge spark of excitement in me. It felt to me at the time more important than other things that I was doing, and I just wanted to do more and more of this work.

I would say the other project that had a really similar effect on me, which we also worked on together right around the same time, was our work with Ken Holstein mapping out challenges that industry practitioners were facing in the space of AI fairness.

WALLACH: Oh yeah. OK, yep. That project, that was so fun, and I learned so much from it. If I recall correctly, we originally hired Ken, who I think was an HCI PhD student at CMU at the time, as an intern …

WORTMAN VAUGHAN: Yep.

WALLACH: … to work with us on creating, sort of, user experiences for fairness tools like the Fairlearn toolkit. And we started that project, which was in collaboration with Miro Dudík and Hal Daumé, by having Ken talk to a whole bunch of practitioners at Microsoft and at other organizations, as well, to get a sense for how they were and weren’t using fairness toolkits like Fairlearn.

And I want to point out that at that point in time, the academic research community was super focused on all of these, like, simple quantitative metrics for assessing the fairness in the context of predictions and predictive machine learning models with this, kind of, understanding that these tools could then be built to help practitioners assess the fairness of their predictive models and maybe even make fairer predictions. And so that’s the kind of stuff that this Fairlearn toolkit was originally developed to do. So we ended up asking all of these practitioners originally just as, sort of, the precursor to what we thought we were going to end up doing with this project.

We also asked these practitioners about their current practices and challenges around fairness in their work and about their additional needs for support. So where did they feel like they had the right tools and processes and practices and where did they feel like they were missing stuff. And this was really eye-opening because what we found was so different from what we were expecting. And there were two things that really stood out to us.

So the first thing was that we found a much, much wider range of applications beyond prediction. So we’d come into this assuming that all these practitioners were doing stuff with predictive machine learning models, but in fact, we were finding they were doing all kinds of stuff. There was a bunch of unsupervised stuff; there was a bunch of, you know, language-based stuff—all of this kind of thing. And in hindsight, that probably doesn’t sound very surprising nowadays because of the rise of generative AI, and really the entire machine learning and AI field is much less focused on prediction in that, kind of, narrow, kind of, classification-regression kind of way. But at the time, this was really surprising, especially in light of the academic literature’s focus on predictions when thinking about fairness.

The second thing that we found was that practitioners often struggled to use existing fairness research, in part because these quantitative metrics that were all the rage at that point in time just weren’t really amenable to the types of complex real-world scenarios that these practitioners were facing. And there were a bunch of different reasons for this, but one of the things that really stood out to us was that this wasn’t so much about the underlying models and stuff like that. It was actually that there were a variety of data challenges involved here, around things like data collection and, in particular, the collection of sensitive attributes, which you need in order to actually use these fairness metrics.

So putting all this together, the upshot of all this was that we never did what we originally set out to do with that [LAUGHS], that internship project. We … because we uncovered this really large gap between research and practice, we ended up publishing this paper that characterized this gap and then surfaced important directions for future research. The other thing that the paper did was emphasize the importance of doing this kind of qualitative work to actually understand what’s happening in practice rather than just making assumptions about what practitioners are and aren’t doing.

The other thing that came out of it, of course, was that the four of us—so you, me, Miro and Hal—learned a ton about HCI and about qualitative research from Ken, which was just, uh, so fun.

WORTMAN VAUGHAN: Yeah, and I started to be confronted with the fact that I could no longer reasonably ignore all of these messes of the real world because, you know, in some ways, responsible AI is really all about the messes.

So I think this project was really a big shift for both of us. And in some ways, working on this and the interpretability work really led us to be active in these early efforts that were happening within Microsoft in the responsible AI space. Um, the research that we were doing was feeding directly into company policy, and it felt like it was just, like, a huge place where we could have some impact. So it was very exciting.

So switching gears a bit. Hanna, do you remember how we first got the idea for WiML?

WALLACH: Yes, I do. So we were at NeurIPS. This was back in 2005. So NeurIPS was a very different conference back then. Now it’s like tens of thousands of people. It’s held in a massive convention center. Yes, there are researchers there, but there’s also a variety of people from across the tech industry who attend. That is not what it was like back then.

So in around … in 2005, it was more like 600 people thereabouts in total[1], and the main conference would be held every year in Vancouver, and then everybody at the conference would pile onto these buses, and we would all head up to Whistler for the workshops.

WORTMAN VAUGHAN: Yep.

WALLACH: So super different to what’s happening nowadays. I think it was my third time attending the conference, but it was my first time sharing a hotel room with other women. And I remember up at the workshops, up in Whistler, there were five of us sitting around in a hotel room, and we were talking about how amazing it was that there were five women sitting around talking. We, kind of, couldn’t believe there were five of us. We were all PhD students at the time. And so we decided to make this list, and we started trying to figure out who the other women in machine learning were. And we came up with about 10 names, and we were kind of amazed that there were even 10 women in machine learning. We thought this was a huge number. We were very excited. And we started talking about how it might be really fun to just bring them all together sometime.

So we returned from NeurIPS, and you and I ended up getting lunch to strategize. I still remember walking out of the department together to go get lunch and you were walking ahead of me. I can visualize the coat you were wearing as you were walking in front of me. And so we strategized a bit and ended up deciding, along with one of the other women, Lisa Wainer, to submit a proposal to the Grace Hopper conference for a session in which women in machine learning would give short talks about their research.

We reached out to the 10 names that we had written down in the hotel room and through that process actually ended up finding out about more women in machine learning and eventually had something like 25 women listed on the final proposal. I think there’s an email somewhere where one or another of us is saying to the other one, “Oh my gosh! I can’t believe there are so many women in machine learning.”

So we submitted this proposal, and ultimately, the proposal was rejected by the Grace Hopper conference. But we were so excited about the idea and just really invested in it by that point that we decided to hold our own co-located event the day before the Grace Hopper conference. And I’ve got to say, you know, 20 years later, I don’t know what we were thinking. Like, that was a bold move on the part of three PhD students. And it turned out to be a huge amount of work that we had to do entirely ourselves, as well.

WORTMAN VAUGHAN: Yeah.

WALLACH: We had no idea what we were doing. But the Grace Hopper folks very nicely connected us with the venue that the conference was going to be held at, and somehow, we managed to pull it off. Ultimately, that first workshop had around 100 women, and there was this … rather than just, like, a single short session, which is what we’d originally had in mind, we had this full day’s worth of talks. I actually have the booklet of abstracts from all of those talks at my desk in the office. I still have that today. And it was just an amazing experience.

WORTMAN VAUGHAN: Yeah, it was. And, you know, you mentioned how bold we were. I just, I really don’t think that any of us at the time realized how bold we were being here, getting this workshop rejected and then saying, you know, no, we think this is important. We’re going to do it anyway. On our own. As grad students.

So I’ve already talked a little bit about some of the spaces that I was in throughout my career where there just weren’t a lot of women around in the room with me. How had you experienced a lack of community or network of women in machine learning before the founding of WiML? And, you know, why do you think it’s important to have that kind of community?

WALLACH: So I felt it in a number of different ways. I think I mentioned a few minutes ago that, like, it was my third time at NeurIPS but my first time sharing a hotel room with another woman. But there were many places over the years where I’d felt this.

So first, as an undergraduate. Then, I did a lot of free and open-source software development, and I was pretty involved in stuff to do with the Debian Linux distribution. And back then, the percentage of women involved in free and open-source software development was about 1 to 1.5%, and the percentage actually involved in Debian was even less than that. So that had led me and some others to start the Debian Women Project. And then, again, of course, I faced this in machine learning.

I just didn’t know that many other women in machine learning. I didn’t … there weren’t a large number of senior women, for example, to look up to as role models. There weren’t a large number of female PhD students. And this, kind of, made me sad because I was really excited about machine learning, and I hoped to spend my entire career in it. But because I didn’t see so many other women around, particularly more senior women, that really made me question whether that would even be possible, and I just didn’t know.

Um, I think, you know, thinking about this, and I’ve obviously reflected on this a lot over the years, but I think having a diverse community in any area, be it free and open-source software development, be it machine learning, any of these kinds of things, is just so important for so many reasons. And some of those reasons are little things like finding people that you would feel comfortable sharing a hotel room with.

But many of these things are bigger things that can then have, like, even, kind of, knock-on cumulative effects. Like feeling valued in the community, feeling welcome in the community, having role models, being able to, sort of, see people and say, “Oh, I want to be kind of like that person when I grow up; I could do this.” And then even just representation of different perspectives in the work itself is so important.

The flip side of that is that there are a whole bunch of things that can go wrong if you don’t have a diverse community. You can end up with gatekeeping, with toxic or unsafe cultures, obviously attrition as people just leave these kinds of spaces because they feel that they’re not welcome there and won’t be valued there. And then to that point of having representation of different perspectives, with a really homogeneous community, you can end up with, kind of, blind spots around the technology itself, which can then lead to harms.

WORTMAN VAUGHAN: 100%. So did you ever imagine during all of this that WiML would still be around 20 years later and we would be sitting here on a podcast talking about this?

WALLACH: [LAUGHS] No, absolutely not. I didn’t even think that WiML would necessarily be around for a second year. I thought it was probably going to be, like, a one-off event. And I certainly don’t think that I thought that I would still be involved in the machine learning community 20 years later, as well. So very unexpected.

I’ve got a question for you, though. What do you remember most about that first workshop?

WORTMAN VAUGHAN: I remember a lot of things. I remember that, you know, when we were planning this, we always really wanted the focus to be the research. And, you know, if you think back to what this first workshop looked like, it was a lot of us just giving talks or presenting posters about our own research to other people.

And, you know, I remember thinking at the poster session, like, the vibe was just so much different and better, healthier really than other poster sessions I had been to. Everyone was so supportive and encouraging, but it really was all about the research. I also remember being blown away just walking into that conference room in the morning and seeing all of these women gathered in one place and knowing that somehow, we had actually made this happen.

Um, I remember we also faced some challenges with the workshop early on. What are the challenges that stand out to you most?

WALLACH: Yeah, so a lot of people really got it, right. And they were super supportive. So, for example, folks at Penn totally got it, and they actually funded a bunch of that first workshop. But others in the community didn’t get it and didn’t see the point, didn’t see why it was necessary.

I remember having dinner with one machine learning researcher and him telling me that he didn’t think this kind of workshop was necessary because women’s experiences were no different to men’s experiences. And then later on in the conversation, he talked about—like, you know, this is, like, an hour and a half later or something—he talked about how he and a friend of his had gone to the bar at an all-women’s college and he’d felt so awkward and out of place. And I ended up pointing out to him [LAUGHS] that he just, kind of, explained to himself why we needed WiML. So, yeah, there were some people who didn’t get it, and it took a lot of, sort of, talking to people and, kind of, explaining.

WORTMAN VAUGHAN: Yep.

WALLACH: Another challenge was figuring out how to fund it in an ongoing manner once we decided that we wanted to do this more than once.

So as I said, Penn funded a lot of that first workshop, but that wasn’t a sustainable model, and it wasn’t going to be realistic for Penn to keep funding it. So in the end, we worked with Amy Greenwald to obtain a National Science Foundation grant that would cover a lot of costs, and we also received donations from other organizations.

Um, a third challenge was figuring out where to hold the workshop given that we did want that focus to be on research. So the first two times, we held the workshop at the Grace Hopper conference, but we started to feel that that wasn’t really the right venue given that we wanted that focus to be on research. So we ended up moving it to NeurIPS, and this had a bunch of benefits, some of which I don’t think we’d even fully thought through when we made that decision.

So one of the benefits was that attendees’ WiML travel funding—so we would give them this travel funding to enable them to pay the cost of attending WiML, stay in hotel rooms, all this kind of stuff—this would actually enable them to attend NeurIPS, as well, if we co-located with NeurIPS.

WORTMAN VAUGHAN: Yep.

WALLACH: Another main benefit was that we held WiML on the day before NeurIPS. So then throughout the rest of the conference, WiML attendees would see familiar faces throughout the crowd and wouldn’t necessarily feel so alone.

WORTMAN VAUGHAN: So you’re talking about these challenges. How have these challenges changed over time? Or, you know, more broadly, can you talk about how the workshop and Women in Machine Learning as an organization as a whole, kind of, evolved over the years? I know that you served a term as the WiML president.

WALLACH: Yeah. So it’s changed a lot. So first, obviously, most importantly, it evolved from being, kind of, this one-off event where we were just seeing what would happen to being really a robust organization. And the first step in that was creating the WiML board. And, as you just said, I served as the first president of that.

But there have been a bunch of other steps since then. And one of the things I want to flag about the WiML board is that it was really important because the board members could focus on the long-term health of the organization and these, sort of, things that spanned multiple years, like how to get sustainable funding sources, versus the actual workshop organizers, who would focus on things like running the call for submissions. Being able to separate those roles really reduced the burden on the workshop organizers and meant that we could take this, kind of, longer-term perspective.

Another really important step was becoming, officially becoming a non-profit. So that happened a few years ago. And again, it was the natural thing to do at that point in time and just another step towards creating this, sort of, durable, robust organization.

But it’s really taken on a life of its own. I’m honestly not super actively involved nowadays, which I think is fantastic. The organization doesn’t need me. That’s great. It’s also wild to me that because it’s been around for 20 years at this point that there are women in the field who don’t know what it’s like to not have WiML.

So a bunch of other affinity groups got created. So Timnit Gebru cofounded Black in AI when she was actually a postdoc at Microsoft Research New York City. So you and I got to actually see the founding of that affinity group up close. And then now there are a ton of other affinity groups. So there’s LatinX in AI; there’s Queer in AI, Muslims in ML, Indigenous in AI and ML, New In ML, just to name a few.

WORTMAN VAUGHAN: Yeah, and all of these are growing, too, every year.

You know, this year, WiML had over 400 submissions. They accepted 250 to be presented. It’s amazing.

WALLACH: That’s wild.

WORTMAN VAUGHAN: Yeah, yep. And there’s going to be a WiML presence this year actually at all three of the NeurIPS venues. So there’s going to be a presence in Mexico City, in Copenhagen, and, of course, in San Diego for the main workshop. So it’s pretty great.

And, you know, on top of that, I think the organization now, as you were saying, is able to do so much more than just the workshop alone. So for instance, WiML now runs this worldwide mentorship program for women and nonbinary individuals in machine learning, where they’re matched with a mentor and can participate in one-to-one mentoring meetings, seminars, and panel discussions, which happen all throughout the year. I think they have about 50 mentors signing up each year, but I’m sure they could always use more. Um, so it’s just really amazing to look back and see how much the WiML community has done and how much it’s grown.

And, you know, on the one hand, I think that honestly, like, founding WiML was one of the things that I’ve done over the course of my career, if not the thing, that I am most proud of …

WALLACH: Oh yeah, me, too.

WORTMAN VAUGHAN: … to this day, but at the same time, like, we can’t take credit for any of this. It’s, like, a community effort.

WALLACH: No.

WORTMAN VAUGHAN: It’s the community that has really kept us going …

WALLACH: Yes.

WORTMAN VAUGHAN: … for the last 20 years,

WALLACH: Yes.

WORTMAN VAUGHAN: … so it’s great. Going to stop gushing now, but it’s amazing.

WALLACH: And it’s not just WiML that’s changed over the years. The entire industry has changed a ton, as well.

How has your research evolved as a result of these changes to the entire field of AI and machine learning and also from your own change from academia to industry?

WORTMAN VAUGHAN: It’s a great question. You know, we’ve touched on this a little bit, but our research paths really evolved differently but ended up in these very similar places where we’re working on responsible AI, we’re advocating for interdisciplinary approaches, incorporating techniques from HCI, and so on. And I think that part of this was because of shifts in the community and also what’s happening in industry. Working in responsible AI in industry, there’s definitely never a shortage of interesting problems to solve, right.

And I think that for both of us, our research interests in recent years really have been driven by these really practical challenges that we’re seeing. We were both involved early on in defining what responsible AI means within Microsoft, shaping our internal Responsible AI Standard. I led this internal companywide working group on AI transparency, which was focused both on model interpretability like we were talking about earlier and on other forms of transparency, like datasheets for datasets and the transparency notes that Microsoft now releases with all of our products. And at the same time, you were leading this internal working group on fairness.

WALLACH: Yeah, taking on that internal working group was, kind of, a big transition point in my career. You know, when I joined Microsoft, I was focusing on computational social science and I was also entirely doing research and wasn’t really that involved in stuff in the rest of the company.

Then at the end of my first year at Microsoft, I attended the first Fairness, Accountability, and Transparency in Machine Learning workshop, which was co-located with NeurIPS. It was one of the NeurIPS workshops. And I got really excited about that and thought, great, I’m going to spend like 20% of my time, maybe one day a week, doing research on topics in the space of fairness and accountability and transparency. Um, that is not what ended up happening.

Over the next couple of years, I ended up doing more and more research on responsible AI, you know, as you said, on topics to do with fairness, to do with interpretability. And then in early 2018, I was asked to co-chair this internal working group on fairness, and that was the point where I started getting much more involved in responsible AI stuff across Microsoft, so outside of just Microsoft Research.

And this was really exciting to me because responsible AI was so new, which meant that research had a really big role to play. It wasn’t like this was kind of an established area where folks in engineering and policy knew exactly what they were doing. And so that meant that I got to branch out from this very, sort of, research-focused work into much more applied work in collaboration with folks from policy, from engineering, and so on.

Now, in fact, as well as being a researcher, I actually run a small applied science team, the Sociotechnical Alignment Center, or STAC for short, within Microsoft Research that focuses specifically on bridging research and practice in responsible AI.

WORTMAN VAUGHAN: Yeah. Do you think that your involvement in WiML has played a role in this work?

WALLACH: Yes, definitely. [LAUGHS] Yeah, without a doubt. So particularly when working on topics related to fairness, I’ve ended up focusing a bunch on stuff to do with marginalized groups as part of my responsible AI work.

So there’s been this, sort of, you know, focus on marginalized groups, particularly women, in the context of machine learning and with my WiML, kind of, work and then in my research work thinking about fairness, as well.

The other way that WiML has really, sort of, affected what I do is that I work with a much more varied group of people nowadays than I did back when I was just focusing on, kind of, machine learning, computational social science, and stuff like that. And many of my collaborators are people that I’ve met through WiML over the years.

WORTMAN VAUGHAN: And, of course, there has been another big shift within industry recently, which is just all the excitement around generative AI. Can you say a bit about how that has changed your research?

WALLACH: OK, yeah. So this is another big one. There are so many ways that this has changed my work. One of the biggest ways, though, is that generative AI systems are now everywhere. They’re being used all over the place for all kinds of things. And, you know, you see all these news headlines about GenAI systems, you know, diagnosing illnesses, solving math problems, and writing code, stuff like that. And also headlines about various different risks that can occur when you’re using generative AI. So fabricating facts, memorizing copyrighted data, generating harmful content, you know, these kinds of things. And with all this attention, it’s really natural to ask, what is the evidence behind these claims? So where is this evidence coming from, and should we trust it?

It turns out that much of the evidence comes from GenAI evaluations that involve measuring the capabilities, the behaviors, and the impacts of GenAI systems, but the current evaluation practices that are often used in the space don’t really have as much scientific rigor as we would like, and that’s, kind of, a problem.

So one of the biggest challenges is that the concepts of interest when people are, sort of, doing these GenAI evaluations—so things like diagnostic ability, memorization, harmful content, concepts like that—are much more abstract than the concepts like prediction accuracy that underpinned machine learning evaluations before the generative AI era.

And when we look at these new concepts that we need to be able to focus on in order to evaluate GenAI systems, we see that they’re actually much more reminiscent of these abstract contested concepts—these, kind of, fuzzy, squishy concepts—that are studied in the social sciences. So things like democracy in political science or personality traits in psychometrics. So there’s really that, sort of, connection there to these, kind of, squishier things.

So when I was focusing primarily on computational social science, most of my work was focused on developing machine learning methods to help social scientists measure abstract contested concepts. So then when GenAI started to be a big thing and I saw all of these evaluative claims involving measurements of abstract concepts, it seemed super clear to me that if we were going to actually be able to make meaningful claims about what AI can and can’t do, we were going to need to take a different approach to GenAI evaluation.

And so I ended up, sort of, drawing on my computational social science work around measurement and I started advocating for adopting a variant of the framework that social scientists use for measuring abstract contested concepts. And my reason for doing this was that I believed—I still believe—that this is an important way to improve the scientific rigor of GenAI evaluations.

You know all of this, of course, because you and I, along with a bunch of other collaborators at Microsoft Research and Stanford and the University of Michigan published a position paper on this framework entitled “Evaluating GenAI Systems is a Social Science Measurement Challenge” at ICML [International Conference on Machine Learning] this past summer.

What are you excited about at the moment?

WORTMAN VAUGHAN: Yeah, so lately, I have been spending a lot of time thinking about AI and critical thought: how can we design AI systems to support appropriate reliance, preserve human agency, and really encourage critical engagement on the part of the human, right?

So this is an area where I think we actually have a huge opportunity, but there are also huge risks. If I think about my most optimistic possible vision of the future of AI—which is not something that is easy for me to do, as I’m not a natural optimist, as you know—it would be a future in which AI helps people grow and flourish, in which it, kind of, enriches our own human capabilities. It deepens our own human thinking and safeguards our own agency.

So in this future, you know, we could build AI systems that actually help us brainstorm and learn new knowledge and skills, both in formal educational settings and in our day-to-day work, as well. But I think we’re not going to achieve this future by default. It’s something that we really need to design for if we want to get there.

WALLACH: You mentioned that there are risks. What are the risks that you can see here?

WORTMAN VAUGHAN: Yeah, there’s so much at stake here. You know, in the short term, there are things like overreliance—depending on the output of an AI system even when the system’s wrong. This is something that I’ve worked on a bunch myself. There’s a risk of loss of agency, or the ability to make and execute independent decisions and to ensure that the outcomes of AI systems are aligned with the personal or professional values of the humans who are using those systems. This is something that I’ve been looking at recently in the context of AI tools for journalism. There’s diminished innovation, by which I mean a loss of creativity or diversity of ideas.

You know, longer term, we risk atrophied skills—people just losing or simply never developing helpful skills for their career or their life because of prolonged use of AI systems. The famous example that people often bring up here is pilots losing the ability to perform certain actions in flight because of dependence on autopilot systems. And I think we’re already starting to see the same sort of thing happen across all sorts of fields because of AI.

And, you know, finally, another risk that I’ll mention that seems to resonate with a lot of folks I talk to is what I would just call loss of joy, right. What happens when we delegate to AI systems the parts of our activities that we really take pleasure in and find satisfaction in doing ourselves?

WALLACH: So then as a community, what should we be doing if we’re worried about these risks?

WORTMAN VAUGHAN: Yeah, I mean, I think this is going to have to be a big community effort if we want to achieve this. This is a big goal. But there are a few places I think we especially need work.

So I think we need generalized principles and practices for AI system builders for how they can build AI systems in ways that promote human agency and encourage critical thought. We also need principles and practices for system users. So how do we teach the general population to use AI in ways that amplify their skills and capabilities and help them learn new things?

And then, you know, close to your heart, I’m sure, I think that we need more work on measurement and evaluation, right. We are once again back to these squishy human properties.

You know, I mentioned I’ve done some work on overreliance in generative AI systems, and I started there because on the grand scale of risks here, overreliance is something that is relatively easy to measure, at least in the short term. But how do we start thinking about measuring people’s critical thinking when using AI across all sorts of contexts and at scale and over long time horizons, right? How do we measure the, sort of, longitudinal effect of AI systems just on our critical thought as a population?
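Why overreliance counts as "relatively easy to measure" becomes clearer with one common lab-study operationalization, shown here as an illustrative sketch rather than the speakers' own method: among trials where the AI's suggestion was wrong, the share in which the participant accepted it anyway. The `Trial` record, its field names, and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one decision in a user study."""
    ai_correct: bool     # was the AI's suggestion correct?
    user_followed: bool  # did the participant accept the AI's suggestion?


def overreliance_rate(trials: list[Trial]) -> float:
    """Fraction of wrong-AI trials in which the participant still followed the AI."""
    ai_wrong = [t for t in trials if not t.ai_correct]
    if not ai_wrong:
        # The rate is undefined when the AI is never wrong; report 0.0 by convention here.
        return 0.0
    return sum(t.user_followed for t in ai_wrong) / len(ai_wrong)


# Made-up example: the AI is wrong on three trials and the user follows it on two of them.
trials = [
    Trial(ai_correct=True, user_followed=True),
    Trial(ai_correct=False, user_followed=True),
    Trial(ai_correct=False, user_followed=False),
    Trial(ai_correct=False, user_followed=True),
]
rate = overreliance_rate(trials)
print(f"overreliance rate: {rate:.2f}")
```

A single per-trial ratio like this is easy to compute in one study; the harder measurement questions raised above (critical thinking across contexts, at scale, over years) have no comparably simple formula.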

And by the way, if anyone listening is going to be at the WiML workshop, I’ll actually be giving a keynote on this topic. And this is something I’m just incredibly excited about because first, I’m incredibly excited about this topic, but also, in the whole 20 years of WiML, I’ve given opening remarks and similar several times, but this is actually the very first time that I will be talking about my own research there. So this is like my dream. I’m thrilled that this is happening.

WALLACH: That’s awesome. Oh, that’s so exciting. Excellent.

So one last question for you. If you could go back and talk to yourself 20 years ago and give yourself some advice, what would you say?

WORTMAN VAUGHAN: Yeah, OK, I’ve thought about this one a bit over the past week, and there are three things here I want to mention.

So first, I would tell myself to be brave about speaking up. You know I’m about as introverted as it gets and I’m naturally very shy, and this has always held me back. It still holds me back now. It was really embarrassingly late in my career that I decided to do something about this and start to develop strategies to help myself speak up more. And eventually, it started to grow into something that’s a little bit more natural.

WALLACH: What kind of, um, what kind of strategies?

WORTMAN VAUGHAN: Yeah, so you know, one example is I use a lot of notes. For this podcast, I have a lot of notes here. I’m a big notes person, and things like that really help me.

The second thing that I would tell myself is to, you know, work on the problems that you really want to see solved. As researchers, we have this amazing freedom to choose our own direction. And early on, you know, a lot of the problems that I worked on were problems that I really enjoyed thinking about on a day-to-day basis. It was a lot of fun. They were like little math puzzles to me. But I often found that, you know, when I would be at conferences and people would ask me about my work, I didn’t really want to talk about these problems. I just in some sense, you know, I had fun doing it, but I didn’t really care. I wasn’t passionate about it. I didn’t care that I had solved the problem.

And so once, many years ago now, when I was thinking about my research agenda, I got some good advice from our former lab director, Jennifer Chayes, who suggested that I go through my recent projects and sort them into projects where I really liked working on them—it was a fun experience day-to-day—and projects that I liked talking about after the fact and, kind of, felt good about the results and then see where the overlap is. And this is something that, like, it kind of sounds, kind of, obvious when I say it now, but at the time, it was really eye-opening for me.

WALLACH: That’s so cool. And now I, kind of, want to do that with all of my projects, particularly at the moment. I actually just took five months, as you know, five months off of work for parental leave because I just had a baby. And so I’m, sort of, taking a big, kind of, inventory of everything as I get back into all of this now, and I love this idea. I think this is really cool.

WORTMAN VAUGHAN: It’s changed really my whole approach to research. Like, you know, we were talking about this, but most of the work I do now is more HCI than machine learning because I found that the problems that really motivate me, that I want to be talking to people about at conferences, are the people problems.

The third piece of advice I would give myself is that you should bring more people into your work, right.

So there’s this kind of vision on the outside of research being this solo endeavor, and it can feel so competitive at times, right. We all feel this. But time and time again, I’ve seen that the best research comes from collaborations and from bringing people together with diverse perspectives who can challenge each other in a way that is respectful but makes the work better.

Is there advice that you would give to your former self of 20 years ago?

WALLACH: Yeah. OK. So I’ve also been thinking about this a bunch over the past week. There’s actually a lot of advice I think I would give my former self, [LAUGHS] but there are three things that I keep coming back to.

OK, so first—and this is similar to your second point—push for doing the work that you find to be most fulfilling even if that means taking a nontraditional path. So in my case, I’ve always been interested in the social sciences. Back when I was a student, you know, even when I was a PhD student, doing research that combined computer science and the social sciences just wasn’t really a thing. And so as a result, it would have been really easy for me to just be like, “Oh well, I guess that isn’t possible. I’ll just focus on traditional computer science problems.”

But that’s not what I ended up doing. Instead, and often in ways that made my career, kind of, harder than it probably would have been otherwise, I ended up pushing. I kept pushing, and in fact, I keep pushing, even nowadays, to bring these things together—computer science and the social sciences—in an interdisciplinary fashion. And this hasn’t been easy. But cumulatively, the effect has been that I’ve been able to do much more impactful work than I think I would have been able to do otherwise, and the work I’ve done, I’ve just enjoyed so much more than would otherwise have been the case.

OK, so second, be brave and share your work. So this is actually advice for my current self and my former self, as this is something that I definitely still struggle with.

WORTMAN VAUGHAN: As do I, you know, and actually, I think it’s funny to hear you say this because I would say that you are much better at this than I am.

WALLACH: I still, I think I have a lot of work to do on this one. Yeah, it’s hard. It’s really hard.

As you know, I am a perfectionist, and this is good in some ways, but this is also bad in other ways. And one way in which this is bad is that I tend to be really anxious about sharing and publicizing my work, especially when I feel it’s not perfect.

So as an example, I wrote this massive tutorial on computational social science for ICML in 2015, but I never put the slides … and I wrote a whole script for it … I never put the slides or the script online as a resource for others because I felt it needed more work. And I actually went back and looked at it earlier this year, when we were working on the ICML paper, and I was stunned because it’s great. Why didn’t I put this online? All these things that I thought were problems 10 years ago, no, they’re not a big deal. I should have just shared it.

As another example, STAC, my applied science team, was using LLMs as part of our approach to GenAI evaluation back in 2022, way before the sort of “LLM-as-a-judge” paradigm was widespread. But I was really worried that others would think negatively of us for doing this, so we didn’t share that much about what we were doing, and I regret that because we missed out on an opportunity to kick off an industrywide discussion about this, sort of, LLM-as-a-judge paradigm.

OK, so then my third point is that the social side of research is just as valuable as the technical side. And by this, I’m actually not talking about social science and computer science. I actually mean that the how of doing research, including who you talk to, who you collaborate with, and how you approach those interactions, is just as important as the research itself.

As a PhD student, I felt really bad about spending time socializing with other researchers, especially at conferences, because I thought that I was supposed to be listening to talks, reading papers, and discussing technical topics with researchers and not socializing. But in hindsight, I think that was wrong. Many of those social connections have ended up being incredibly valuable to my research, both because I’ve ended up collaborating with and in some cases even hiring the people who I first got to know socially …

WORTMAN VAUGHAN: Yeah.

WALLACH: … but also because the friendships that I’ve built, like our friendship, for example, have served as a crucial support network over the years, especially when things have felt particularly challenging.

WORTMAN VAUGHAN: Yeah, absolutely. I agree with all of that so much.

And with that, I will say thank you so much for doing this podcast with me today.

WALLACH: Thank you.

WORTMAN VAUGHAN: It was a lot of fun to reflect on the last 20 years of WiML, but also the last 20 years of our careers and friendship and all of this, so it’s great, and I never would have agreed to do this if it had been with anyone but you.

WALLACH: Likewise. [LAUGHS]

So thank you, everybody, for listening to us, and hopefully some of you will join us for the 20th annual workshop for Women in Machine Learning, which is taking place on Dec. 2. And of course, Jenn and I will both be there in person. We’ll also be at NeurIPS afterwards. So feel free to reach out to us if you want to chat with us or to learn more about anything that we covered here today.

[MUSIC]

OUTRO: You’ve been listening to Ideas, a Microsoft Research Podcast. Find more episodes of the podcast at aka.ms/researchpodcast.

[MUSIC FADES]


[1] Wallach later clarified that the number of registrants for the 2005 Conference on Neural Information Processing Systems was around 900.

The post Ideas: Community building, machine learning, and the future of AI appeared first on Microsoft Research.

]]>
Ideas: More AI-resilient biosecurity with the Paraphrase Project http://approjects.co.za/?big=en-us/research/podcast/ideas-more-ai-resilient-biosecurity-with-the-paraphrase-project/ Mon, 06 Oct 2025 14:04:34 +0000 http://approjects.co.za/?big=en-us/research/?p=1151021 Microsoft’s Eric Horvitz and guests Bruce Wittmann, Tessa Alexanian, and James Diggans discuss the Paraphrase Project—a red-teaming effort that exposed and secured a biosecurity vulnerability in AI-driven protein design. The work offers a model for addressing AI’s dual-use risks.

The post Ideas: More AI-resilient biosecurity with the Paraphrase Project appeared first on Microsoft Research.

]]>

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

AI has been described as a “dual use” technology: the capabilities that can be leveraged for good can also potentially be used to cause harm. In this episode, Microsoft Chief Scientific Officer Eric Horvitz and his guests—Bruce Wittmann, a senior applied scientist at Microsoft; Tessa Alexanian, a technical lead at the International Biosecurity and Biosafety Initiative for Science (IBBIS); and James Diggans, a vice president at Twist Bioscience—explore this idea in the context of AI-powered protein design.

With Horvitz at the lead, Alexanian, Diggans, and Wittmann were part of a cross-sector team that demonstrated that toxic protein candidates could be designed with help from AI, and that these candidates could bypass the systems in place to defend against their creation. The project, known as the Paraphrase Project, culminated in a cybersecurity-style response, a more robust protein screening system, and a modified approach to peer review with implications for how we think about and tackle AI risk more broadly. The work was recently published in Science.

Transcript

[MUSIC]

ERIC HORVITZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Eric Horvitz, Microsoft’s chief scientific officer, and in this series, we explore the technologies shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

Today, I’m excited to talk about the Paraphrase Project, an effort I co-led exploring how advances in AI tools for protein design might impact biosecurity. The results were reported in our recent paper, “Strengthening nucleic acid biosecurity screening against generative protein design tools,” published in Science on Oct. 2.

Joining me are three of the larger set of coauthors on that paper: Bruce Wittmann, senior applied scientist at Microsoft; James Diggans, vice president at Twist Bioscience and chair of the board for the International Gene Synthesis Consortium; and Tessa Alexanian, technical lead at the International Biosecurity and Biosafety Initiative for Science, also known as IBBIS.

Now, let’s rewind two years. Almost to the day, Bruce and I uncovered a vulnerability. While preparing a case study for a workshop on AI and biosecurity, we discovered that open-source AI protein design tools could be used to redesign toxic proteins in ways that could bypass biosecurity screening systems, systems set up to identify incoming orders of concern. 

Now in that work, we created an AI pipeline from open-source tools that could essentially “paraphrase” the amino acid sequences—reformulating them while working to preserve their structure and potentially their function. 

These paraphrased sequences could evade the screening systems used by major DNA synthesis companies, and these are the systems that scientists rely on to safely produce AI-designed proteins. 

Now, experts in the field described this finding as the first “zero day” for AI and biosecurity. And this marked the beginning of a deep, two-year collaborative effort to investigate and address this challenge. 

With the help of a strong cross-sector team—including James, Tessa, Bruce, and many others—we worked behind the scenes to build AI biosecurity red-teaming approaches, probe for vulnerabilities, and design practical fixes. These “patches,” akin to those in cybersecurity, have now been shared with organizations globally to strengthen biosecurity screening.

This has been one of the most fascinating projects I’ve had the privilege to work on, for its technical complexity, its ethical and policy dimensions, and the remarkable collaboration across industry, government, and nonprofit sectors. 

The project highlights that the same AI tools capable of incredible good can also be misused, requiring us to be vigilant, thoughtful, and creative so we continue to get the most benefit out of AI tools while working to ensure that we avoid costly misuses. 

With that, let me officially welcome our guests.

Bruce, James, Tessa, welcome to the podcast.

BRUCE WITTMANN: Thanks, Eric.

JAMES DIGGANS: Thanks for having us.

HORVITZ: It’s been such a pleasure working closely with each of you, not only for your expertise but also for your deep commitment and passion about public health and global safety.

Before we dive into the technical side of things, I’d like to ask each of you, how did you get into this field? What inspired you to become biologists and then pursue the implications of advances in AI for biosecurity? Bruce?

WITTMANN: Well, I’ve always liked building things. That’s where I would say I come from. You know, my hobbies when I’m not working on biology or AI things—as you know, Eric—is, like, building things around the house, right. Doing construction. That kind of stuff.

But my broader interests have always been biology, chemistry. So I originally got into organic chemistry. I found that was fascinating. From there, went to synthetic biology, particularly metabolic engineering, because that’s kind of like organic chemistry, but you’re wiring together different parts of an organism’s metabolism rather than different chemical reactions. And while I was working in that space, I, kind of, had the thought of there’s got to be an easier way to do this [LAUGHS] because it is really difficult to do any type of metabolic engineering. And that’s how I got into the AI space, trying to solve these very complicated biological problems, trying to build things that we don’t necessarily even understand using our understanding from data or deriving understanding from data.

So, you know, that’s the roundabout way of how I got to where I am—the abstract way of how I got to where I am.

HORVITZ: And, Tessa, what motivated you to jump into this area and zoom into biology and biosciences and helping us to avoid catastrophic outcomes?

ALEXANIAN: Yeah, I mean, probably the origin of me being really excited about biology is actually a book called [The] Lives of [a] Cell by Lewis Thomas, which is an extremely beautiful book of essays that made me be like, Oh, wow, life is just incredible. I think I read it when I was, you know, 12 or 13, and I was like, Life is incredible. I want to work on this. This is the most beautiful science, right. And then I, in university, I was studying engineering, and I heard there was this engineering team for engineering biology—this iGEM team—and I joined it, and I thought, Oh, this is so cool. I really got to go work in this field of synthetic biology.

And then I also tried doing the wet lab biology, and I was like, Oh, but I don’t like this part. I don’t actually, like, like babysitting microbes. [LAUGHTER] I think there’s a way … some people who are great wet lab biologists are made of really stern stuff. And they really enjoy figuring out how to redesign their negative controls so they can figure out whether it was contamination or whether it was, you know, temperature fluctuation. I’m not that, apparently.

And so I ended up becoming a lab automation engineer because I could help the science happen, but I … but my responsibilities were the robots and the computers rather than the microbes, which I find a little bit intransigent.

HORVITZ: Right. I was thinking of those tough souls; they also used their mouths to do pipetting and so on of these contaminated fluids …

WITTMANN: Not anymore.

ALEXANIAN: It’s true. [LAUGHTER]

DIGGANS: Not anymore. [LAUGHS]

ALEXANIAN: They used to be tougher. They used to be tougher.

HORVITZ: James.

DIGGANS: So I did my undergrad in computer science and microbiology, mostly because at the time, I couldn’t pick which of the two I liked more. I liked them both. And by the time I graduated, I was lucky enough that I realized that the intersection of the two could be a thing. And so I did a PhD in computational biology, and then I worked for five years at the MITRE Corporation. It’s a nonprofit. I got the chance to work with the US biodefense community and just found an incredible group of people working to protect forces and the population at large from biological threats and just learned a ton about both biology and also dual-use risk. And then so when Twist called me and asked if I wanted to join Twist and set up their biosecurity program, I leapt at the chance and have done that for the past 10 years.

HORVITZ: Well, thanks everyone.

I believe that AI-powered protein design in particular is one of the most exciting frontiers of modern science. It holds promise for breakthroughs in medicine, public health, even material science. We’re already seeing it lead to new vaccines, novel therapeutics, and—on the scientific front—powerful insights into the machinery of life.

So there’s much more ahead, especially in how AI can help us promote wellness, longevity, and the prevention of disease. But before we get too far ahead, while some of our listeners work in bioscience, many may not have a good understanding of some of the foundations.

So, Bruce, can you just give us a high-level overview of proteins? What are they? Why are they important? How do they figure into human-designed applications?

WITTMANN: Sure. Yeah. Fortunately, I used to TA a class on AI for protein design, so it’s right in my wheelhouse. [LAUGHS]

HORVITZ: Perfect, perfect background. [LAUGHS]

WITTMANN: It’s perfect. Yeah. I got to go back to all of that. Yeah, so from the very basic level, proteins are the workhorses of life.

Every chemical reaction that happens in our body—well, nearly every chemical reaction that happens in our body—most of the structure of our cells, you name it. Any life process, proteins are central to it.

Now proteins are encoded by what are known as … well, I shouldn’t say encoded. They are constructed from what are called amino acids—there are 20 of them—and depending on the combination and order in which you string these amino acids together, you get a different protein sequence. So that’s what we mean when we say protein sequence.

The sequence of a protein then determines what shape that protein folds into in a cell, and that shape determines what the protein does. So we will often say sequence determines structure, which determines function.

Now the challenge that we face in engineering proteins is just how many possibilities there are. For all practical purposes, it’s infinite. So we have 20 building blocks. There are on average around 300 amino acids in a protein. So that’s 20 to the power of 300 possible combinations. And a common reference point is that it’s estimated there are around 10 to the 80 particles in the observable universe. So beyond astronomical numbers of possible combinations that we could have, and the job of a protein engineer is to find that one or a few of the proteins within that space that do what we want it to do.
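The size comparison above can be checked directly. This sketch uses only the two figures quoted in the conversation (20 amino acids, a typical length of about 300) and counts sequences of exactly that length; real proteins vary in length, so this is an order-of-magnitude illustration, not a precise count.

```python
import math

NUM_AMINO_ACIDS = 20     # standard building blocks, as quoted above
TYPICAL_LENGTH = 300     # rough average protein length, as quoted above
PARTICLES_EXPONENT = 80  # ~10^80 particles in the observable universe

# Every position can hold any of the 20 amino acids, so there are 20**300
# distinct sequences of length 300. Python integers hold this value exactly.
sequence_space = NUM_AMINO_ACIDS ** TYPICAL_LENGTH

# log10(20**300) = 300 * log10(20), roughly 390.3
exponent = TYPICAL_LENGTH * math.log10(NUM_AMINO_ACIDS)

print(f"20^300 is about 10^{exponent:.0f}")
print(f"decimal digits in 20^300: {len(str(sequence_space))}")
print(f"ratio to particle count: about 10^{exponent - PARTICLES_EXPONENT:.0f}")
```

In other words, the sequence space is roughly 10^390, which exceeds the particle count by a factor of about 10^310.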

So when a human has an idea of, OK, here’s what I want a protein to do, we have various techniques of finding that desired protein, one of which is using artificial intelligence and trying to either sift through that milieu of potential proteins or, as we’ll talk about more in this podcast, physically generating them. So creating them in a way, sampling them out of some distribution of reasonable proteins.

HORVITZ: Great. So I wanted to throw it to James now to talk about how protein design goes from computer to reality—from in silico to test tubes. What role does Twist Bioscience play in transforming digital protein designs into synthesized proteins? And maybe we can also talk about what safeguards are in place at your company and why we need them?

DIGGANS: So all of these proteins that Bruce has described are encoded in DNA. So the language that our cells use to kind of store the information about how to make these proteins is all encoded in DNA. And so if you as an engineer have designed a protein and you want to test it to see if it does what you think it does, the first step is to have the DNA that encodes that protein manufactured, and companies like Twist carry out that role.
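To make "encoded in DNA" concrete, here is a minimal sketch of how a DNA coding sequence maps to a protein sequence under the standard genetic code, where each three-letter codon specifies one amino acid or a stop signal. It is a simplification: it reads in frame from the first base, ignores introns and alternative genetic codes, and raises `KeyError` on any character other than A, C, G, or T. The example sequence is made up.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon -> amino-acid lookup table from the ordering above.
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon from position 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCTTGGAAATAA"))  # made-up toy sequence -> MAWK
```

Because several codons map to the same amino acid, many different DNA sequences encode the same protein; the DNA a synthesis company receives is one of many possible encodings of a design.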

So we are cognizant also, however, that these are what are called dual-use technologies. So you can use DNA and proteins for an incredible variety of amazing applications. So drug development, agricultural improvements, bioindustrial manufacturing, all manner of incredible applications. But you could also potentially use those to cause harm: toxins or other, you know, sort of biological misuse.

And so the industry has since at least 2010 recognized that they have a responsibility to make sure that when we’re asked to make some sequence of DNA, we understand what that thing is encoding and who we’re giving it to. So we’re screening both the customer that’s coming to us and the sequence that they’re requesting.

And so Twist has long invested in a very, sort of, complicated system for essentially reverse engineering the constructs that we’re asked to make so that we understand what they are. And then a system where we engage with our customers and make sure that they’re going to use those for legitimate purposes and responsibly.

HORVITZ: And how does the emergence of these new generative AI tools influence how you think about risk?

DIGGANS: A lot of the power of these AI tools is they allow us to make proteins or design proteins that have never existed before in nature to carry out functions that don’t exist in the natural world. That’s an extremely powerful capability.

But the existing defensive tools that we use at DNA synthesis companies generally rely on what’s called homology, similarity to known naturally occurring sequences, to determine whether something might pose risk. And so AI tools kind of break the link between those two things.
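A toy example helps show what "homology" screening means. Production screening relies on alignment tools such as BLAST run against curated databases, followed by expert review; the sketch below is a deliberately simplified, hypothetical stand-in that scores how many short substrings (k-mers) a query shares with reference sequences, using Jaccard similarity, and flags queries above an assumed threshold. The function names, `k=3`, `threshold=0.5`, and the sequences (arbitrary strings, not real proteins) are all illustrative assumptions.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_for_review(query: str, references: list[str],
                    k: int = 3, threshold: float = 0.5) -> tuple[bool, float]:
    """Flag a query whose best similarity to any reference meets the threshold."""
    query_kmers = kmers(query, k)
    best = max((jaccard(query_kmers, kmers(ref, k)) for ref in references),
               default=0.0)
    return best >= threshold, best


# Hypothetical reference set and queries (arbitrary strings for illustration).
REFERENCES = ["ACDEFGHIKLMNPQRSTVWY"]
near = flag_for_review("ACDEFGHIKLANPQRSTVWY", REFERENCES)  # one substitution
unrelated = flag_for_review("WWWWWWWWWW", REFERENCES)
print(near, unrelated)
```

A single substitution still scores about 0.71 here, so near-copies of a reference are caught. But the score depends only on shared sequence, not on structure or function, which is the link the speakers say AI design tools can break.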

HORVITZ: Now you also serve as chair of the International Gene Synthesis Consortium. Can you tell us a little bit more about the IGSC, its mission, how it supports global biosecurity?

DIGGANS: Certainly. So the IGSC was founded in 2010[1] and right now has grown to more than 40 companies and organizations across 10 countries. And the IGSC is essentially a place where companies who might be diehard competitors in the market around nucleic acid synthesis come together and design and develop best practices around biosecurity screening to, kind of, support the shared interest we all have in making sure that these technologies are not subject to misuse.

HORVITZ: Thanks, James. Now, Tessa, your organization, IBBIS, is focused—it’s a beautiful mission—on advancing science while minimizing catastrophic risk, likelihood of catastrophic risk. When we say catastrophic risk, what do we really mean, Tessa, in the context of biology and AI? And how is that … do you view that risk landscape as evolving as AI capabilities are growing?

ALEXANIAN: I think the … to be honest, as a person who’s been in biosecurity for a while, I’ve been surprised by how much of the conversation about the risks from advances in artificial intelligence has centered on the risk of engineered biological weapons and engineered pandemics.

Even recently, there was a new discussion on introducing red lines for AI that came up at the UN General Assembly. And the very first item in their list of risks, if I’m not mistaken, was engineered pandemics, which is exactly the sort of thing that people fear could be done with these biological AI tools.

Now, I think that when we talk about catastrophic risk, we talk about, you know, something that has an impact on a large percentage of humanity. And I think the reason that we think that biotechnologies pose a catastrophic risk is that we believe, as we’ve seen with many historical pandemics, there’s a possibility for something to emerge or be created that is beyond our society’s ability to control.

You know, there were a few countries in COVID that managed to more or less successfully do a zero-COVID policy, but that was not most countries. That was not any of the countries that I lived in. And, you know, we saw millions of people die. And I think we believe that with something like the 1918 influenza, which had a much higher case fatality rate, far more people could die.

Now, why we think about this in the context of AI and where this connects to DNA synthesis is that, you know, there is a … these risks of both, sort of, public health risks, pandemic risks, and misuse risks—people deliberately trying to do harm with biology, as we’ve seen from the long history of biological weapons programs—you know, we think that those might be accelerated in a few different ways by AI technology, both the potential … and I say potential here because as everyone who has worked in a wet lab—which I think is everyone on this call—knows, engineering biology is really difficult. So there’s maybe a potential for it to become easier to develop biological technology for the purposes of doing harm, and there’s maybe also the potential to create novel threats.

And so I think people talk about both of those, and people have been looking hard for possible safeguards. And I think one safeguard that exists in this biosecurity world that, for example, doesn’t exist as cleanly in the cybersecurity world is that none of these biological threats can do harm until they are realized in physical reality, until you actually produce the protein or produce the virus or the microorganism that could do harm. And so I think at this point of production, both in DNA synthesis and elsewhere, we have a chance to introduce safeguards that can have a really large impact on the amount of risk that we’re facing—as long as we develop those safeguards in a way that keeps pace with AI.

HORVITZ: Well, thanks, Tessa. So, Bruce, our project began when I posed a challenge to you of the form: could current open-source AI tools be tasked with rewriting toxic protein sequences in a way that preserves their native structure, and might they evade today’s screening systems?

And I was preparing for a global workshop on AI and biosecurity that I’d been organizing with Frances Arnold, David Baker, and Lynda Stuart, and I wanted a concrete case study to challenge attendees. And what we found was interesting and deeply concerning.

So I wanted to dive in with you, Bruce, on the technical side. Can you describe a bit of the generative pipeline, how it works, and what you did to build what we might call an AI and biosecurity red-teaming pipeline for testing and securing biosecurity screening tools?

WITTMANN: Sure. Yeah. I think the best place to start with this is really by analogy.

An analogy I often use in this case is the type of image generation AI tools we’re all familiar with now, where I can tell the AI model, “Hey, give me a cartoonish picture of a dog playing fetch.” And it’ll do that, and it’ll give us back something that has likely never been seen before, right. That exact image is new, but the theme is still there. The theme is this dog.

And that’s kind of the same technology that we’re using in this red-teaming pipeline. Only rather than using plain language, English, we’re passing in what we would call conditioning information that is relevant to a protein.

So our AI models aren’t at the point yet where I can say, “Give me a protein that does x.” That would be the dream. We’re a long way from that. But what instead we do is we pass in things that match that theme that we’re interested in. So rather than saying, “Hey, give me back the theme on a dog,” we pass in information that we know will cause or at least push this generative model to create a protein that has the characteristics that we want.

So in the case of that example you just mentioned, Eric, it would be the protein structure. Like I mentioned earlier, we usually say structure determines function. There’s obviously a lot of nuance to that, but we can, at a first approximation, say structure determines function. So if I ask an AI model, “Hey, here’s this structure; give me a protein sequence that folds to this structure,” just like with that analogy with the dog, it’s going to give me something that matches that structure but that has likely still never been seen before. It’s going to be a new sequence.

So you can imagine taking this one step further. In the red-teaming pipeline, what we would do is take a protein that should normally be captured by DNA synthesis screening—that would be captured by DNA synthesis screening—find its structure, pass it through one of these models, and get variants on the theme of that structure so these new sequences, these synthetic homologs that you mentioned, paraphrased, reformulated, whatever phrase we want to use to describe them.

And they have a chance or a greater chance than not of maintaining the structure and so maintaining the function while being sufficiently different that they’re not detected by these tools anymore.

So that’s the nuts and bolts of how the red-teaming pipeline comes together. We use more tools than just structure. I think structure is the easiest one to understand. But we have a suite of tools in there, each passing different conditioning information that causes the model to generate sequences that are paraphrased versions of potential proteins of concern.

HORVITZ: But to get down to brass tacks, what Bruce did for the framing study was … we took the toxic, well-known toxic protein ricin, as we described in a framing paper that’s actually part of the appendix now to the Science publication, and we generated through this pipeline, composed of open-source tools, thousands of AI-rewritten versions of ricin.

And this brings us to the next step of our project, way back when, at the early … in the early days of this effort, where Twist Bioscience was one of the companies we approached with what must have seemed like an unusual question to your CEO, in fact, James: would you be open to testing whether current screening systems could detect thousands of AI-rewritten versions of ricin, a well-known toxic protein?

And your CEO quickly connected me with you, James. So, James, what were your first thoughts on hearing about this project, and how did you respond to our initial framing study?

DIGGANS: I think my first response was gratitude and excitement. So it was fantastic that Microsoft had really leaned forward on this set of ideas and had produced this dataset. But to have it, you know, show up on our doorstep in a very concrete way with a partner that was ready to, sort of, help us try and address that, I think was a really … a valuable opportunity. And so we really leapt at that.

HORVITZ: And the results were that both for you and another company, major producer IDT [Integrated DNA Technologies], those thousands of variants flew through … flew under the radar of the biosecurity screening software as we covered in that framing paper.

Now, after our initial findings on this, we quietly shared the paper with a few trusted contacts, including some in government. Through my work with the White House Office of Science and Technology Policy, or OSTP, we connected up with biosecurity leads there, and it was an OSTP biosecurity lead who described our results as the first zero day in AI and biosecurity. And now in cybersecurity, a zero day is a vulnerability unknown to defenders generally, meaning there’s no time to respond before it could be exploited should it be known.

In that vein, we took a cybersecurity approach. We stood up a CERT—C-E-R-T—a cybersecurity [computer] emergency response team approach used in responding to cybersecurity vulnerabilities, and we implemented this process to address what we saw as a vulnerability with AI-enabled challenges to biosecurity.

At one point down the line, it was so rewarding to hear you say, James, “I’m really glad Microsoft got here first.” I’m curious how you think about this kind of AI-enabled vulnerability compared to other biosecurity threats you’ve encountered, and I’d love to hear your perspective on how we handled the situation from the early discovery to the coordination and outreach.

DIGGANS: Yeah, I think in terms of comparison to known threats, the challenge here is really there is no good basis on which we can just, sort of, say, Oh, I’ll build a new tool to detect this concrete universe of things, right. This was more a pattern of I’m going to use tools—and I love the name “Paraphrase”; it’s a fantastic name—I can paraphrase anything that I would normally think of as biological … as posing biological risk, and now that thing is harder to detect for existing tools. And so that really was a very eye-opening experience, and I think the practice of forming this CERT response, putting together a group of people who were well versed not just in the threat landscape but also in the defensive technologies, and then figuring out how to mitigate that risk and broaden that study, was an incredibly valuable response for the entire synthesis industry.

HORVITZ: Yeah, and, Bruce, can you describe a little bit about the process by which we expanded the effort beyond our initial framing study to more toxins and then to a larger challenge set and then the results that we pursued and achieved?

WITTMANN: Yeah, of course. So, you know, using machine learning lingo, you don’t want to overfit to a single example. So early on with this, as part of the framing study, we were able to show or I should say James and coworkers across the screening field were able to show that this could be patched, right. We needed to just make some changes to the tools, and we could at the very least detect ricin or reformulated versions of ricin.

So the next step, of course, was then, OK, how generalizable are these patches? Can they detect other reformulated sequences, as well? So we had to expand the set of proteins that we had reformulated. We couldn’t just do tens of thousands of ricins. We had to do tens of thousands of name your other potentially hazardous …

HORVITZ: I think we had 72, was it?

WITTMANN: It was 72 in the end that we ended up at. I believe, James, it was you and maybe Jake, another one of the authors on the list … on the paper, who primarily put that list together …

HORVITZ: This is Jacob Beal … Jacob Beal at Raytheon BBN.

WITTMANN: I think James actually might be the better one to answer how this list was expanded.

DIGGANS: Initially the focus [was] on ricin as a toxin, so that list expanded to 62, sort of, commonly controlled toxins that are subject to an export control restriction or other concern. And then on top of that, we added 10 viral proteins. So we didn’t really just want to look at toxins. We also wanted to look at viral proteins, largely because those proteins tend to have multiple functions. They have highly constrained structures. And so if we could work in a toxin context, could Paraphrase also do the same for viral proteins?

HORVITZ: And, Bruce, can you describe a bit of how we characterized the updates and the, we’ll say, the boost in capabilities of the patched screening tools?

WITTMANN: So we had, like you said, Eric, 72 base proteins or template proteins. And for each of those, we had generated a few hundred to a couple thousand reformulated variants. The only way to really get any sense of the validity of those sequences was to predict their structures. So we predicted, I think it was, 70-ish thousand protein structures in the end and scored them using in silico metrics. So things like, how similar is this to that template, wild-type protein structure that we used as our conditioning information?

We put them on a big grid. So we have two axes. We have on the x-axis—and this is a figure in our paper—the quality of the prediction. It’s essentially a confidence metric: how realistic is this protein sequence? And on the other axis is, how similar is the predicted structure of this variant to the original? And ultimately, what we wanted to see was that the proteins that scored well on both of those metrics, so the ones that showed up in the top right of that diagram, were the ones primarily caught, because these are again the ones most likely, and I have to stress most likely, to retain the function of the original.

So when you compare the original tools—Tool Series A, right, the unpatched tools—what you’ll find is varying degrees of success in the top right. It varied by tool, but in some cases barely anything was flagged as potentially hazardous. And so the improvement is in the next series—Series B, the patched versions of the tools—where we have more flagged in that upper-right corner.
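Wittmann’s two-axis grid reduces to a simple calculation: of the variants in the top-right region, what fraction did a given tool flag? This is a minimal sketch under stated assumptions: each variant is assumed to carry a prediction-confidence score and a structural-similarity score, both scaled to [0, 1], plus whether the tool under test flagged it. The 0.8 cutoffs that define the “top right” are arbitrary placeholders, the names `Variant` and `top_right_flag_rate` are hypothetical, and the example values are invented, not the study’s data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    confidence: float  # prediction confidence ("how realistic is this sequence?"), assumed in [0, 1]
    similarity: float  # predicted-structure similarity to the template, assumed in [0, 1]
    flagged: bool      # whether the screening tool under test flagged this variant


def top_right_flag_rate(variants: list[Variant],
                        conf_cut: float = 0.8,
                        sim_cut: float = 0.8) -> Optional[float]:
    """Fraction of high-confidence, high-similarity variants that a tool flagged.

    Returns None when no variant falls in the top-right region.
    """
    top_right = [v for v in variants
                 if v.confidence >= conf_cut and v.similarity >= sim_cut]
    if not top_right:
        return None
    return sum(v.flagged for v in top_right) / len(top_right)


# Invented toy values (not the study's data): the same three variants screened
# by an unpatched tool (series A) and by a patched tool (series B).
series_a = [Variant(0.90, 0.90, False), Variant(0.95, 0.85, True), Variant(0.50, 0.90, False)]
series_b = [Variant(0.90, 0.90, True), Variant(0.95, 0.85, True), Variant(0.50, 0.90, False)]

print(top_right_flag_rate(series_a))  # 0.5: one of the two top-right variants flagged
print(top_right_flag_rate(series_b))  # 1.0: both top-right variants flagged
```

Focusing on the top-right region weights the comparison toward the variants judged most likely to retain function, which matches the emphasis Wittmann describes.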

HORVITZ: And we felt confident that we had a more AI-resilient screening solution across the companies, and, James, at this point, the whole team decided it was time to disclose the vulnerability as well as the patch details and pointers to where to go for the updated screening software and to communicate this to synthesis companies worldwide via the IGSC. This was probably July, I think, of 2024. What was that process like, and how did members respond?

DIGGANS: I think members were really grateful and excited. To present to that group, to say, hey, this activity (a) has gone on, (b) was successful, and (c) was kept close hold until we knew how to mitigate this, I think everyone was really gratified by that and comforted by the fact that now they had kind of off-the-shelf solutions that they could use to improve their resilience against any incoming heavily engineered protein designs.

HORVITZ: Thanks, James.

Now, I know that we all understand this particular effort to be important but only one piece of the biosecurity and AI problem. I’m just curious to … I’ll ask all three of you to just share some brief reflections.

I know, Bruce, you’ve been on … you’ve stayed on this, and we’ve—all of us on the original team—have other projects going on that are pushing on the frontiers ahead of where we were with this paper when we published it.

Let me start with Tessa in terms of, like, what new risks do you see emerging as AI accelerates and maybe couple that with thoughts about how do we proactively get ahead of them.

ALEXANIAN: Yeah, I think with the Paraphrase work, as Bruce explained so well, you know, I sometimes use the metaphor of the previous response that the IGSC, the synthesis screening community, had to make. It used to be that you could look for similarities to DNA sequences, and then everyone started doing synthetic biology with codon optimization so that proteins could express more efficiently in different host organisms. And now all of a sudden, well, you’ve scrambled your DNA sequence, and it doesn’t look very similar even though your protein sequence actually still looks, you know, very similar, or often the same, once it’s been translated from DNA to protein. Many in the industry were already screening both DNA and protein, but everybody had to start screening protein sequences, even just to do the similarity testing, as these codon optimization tools became universal.

I feel like we’re, kind of, in a similar transition phase with protein-design and protein-rephrasing tools, where, you know, these tools are still in many cases drawing from the natural distribution of proteins. You know, I think of some of the work we saw in, you know, designing novel CRISPR enzymes: you go, OK, yeah, it is novel; it’s very unlike any one CRISPR enzyme. But if you do a massive multiple sequence alignment of every CRISPR enzyme that we know about, you’re like, OK, this fits in the distribution of those enzymes. And so, you know, I think we’re having to do a more flexible form of screening, where we look for things that are kind of within the distribution of natural proteins.

But I feel like broadly, all of the screening tools were able to respond by doing something like that. And I think … I still feel like the clock is ticking down on that and that as the AI tools get better at predicting function and designing, sort of, novel sequences to pursue a particular function, you know—you have tools now that can go from Gene Ontology terms to a potential structure or potential sequence that may again be much farther out of the distribution of natural protein—I think all of us on the screening side are going to have to be responding to that, as well.

So I think I see this as a necessary ongoing engagement between people at the frontier of designing novel biology and people at the frontier of producing all of the materials that allow that novel biology to be tested in the lab. You know, I think this feels like the first, you know, detailed, comprehensive zero day disclosure and response. But I think that’s … I think we’re going to see more of those. And I think what I’m excited about doing at IBBIS is trying to encourage and set up more infrastructure so that you can, as an AI developer, disclose these new discoveries to the people who need to respond before the publication comes out.
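Alexanian’s codon-optimization analogy can be checked in a few lines. This sketch assumes the standard genetic code and two made-up 12-nucleotide coding sequences; it shows synonymous codons lowering DNA-level identity while the translated protein stays identical, which is why screening had to compare protein sequences too. The helper names are hypothetical, and `identity` is a naive position-by-position comparison, not an alignment.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon (a trailing partial codon is ignored)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences (no alignment)."""
    if len(a) != len(b) or not a:
        raise ValueError("this toy comparison needs non-empty, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two made-up coding sequences using different synonymous codons for the same residues.
dna_1 = "ATGAAACTGGGT"  # ATG AAA CTG GGT
dna_2 = "ATGAAGTTAGGC"  # ATG AAG TTA GGC

print(translate(dna_1), translate(dna_2))            # MKLG MKLG
print(round(identity(dna_1, dna_2), 2))              # 0.67
print(identity(translate(dna_1), translate(dna_2)))  # 1.0
```

Only 8 of 12 nucleotides match, yet the proteins are identical; a DNA-only similarity screen sees a weaker match than the biology warrants.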

HORVITZ: Thank you, Tessa.

The, the … Bruce, I mean, you and I are working on all sorts of dimensions. You’re leading up some efforts at Microsoft, for example, on the foundation model front and so on, among other directions. We’ve talked about new kinds of embedding models that might go beyond sequence and structure. Can you talk a little bit about just a few of the directions that just paint the larger constellation of the kinds of things that we talk about when we put our worry hats on?

WITTMANN: I feel like that could have its own dedicated podcast, as well. There’s a lot … [LAUGHTER] there’s a lot to talk about.

HORVITZ: Yeah. We want to make sure that we don’t tell the world that the whole problem is solved here.

WITTMANN: Right, right, right. I think Tessa said it really, really well in that most of what we’re doing right now, it’s a variant on a known theme. I have to know the structure that does something bad to be able to pass it in as context. I have to know some existing sequence that does something bad to pass it in.

And obviously the goal is to move away from that in benign applications, where when I’m designing something, I often want to design it because nothing exists [LAUGHS] that already does it. So we are going to be heading to this space where we don’t know what this protein does. It’s kind of a circular problem, right, where we’re going to need to be able to predict what some obscure protein sequence does in order to be able to still do our screening.

Now, the way that I think about this, I often think about it beyond just DNA synthesis screening. It’s one line of defense, and there needs to be many lines of defense that come into play here that go beyond just relying on this one roadblock. It’s a very powerful roadblock. It’s a very powerful barrier. But we need to be proactively thinking about how we broaden the scope of defenses. And there are lots of conversations that are ongoing. I won’t go into the details of them. Again, that would be its own podcast.

But primarily my big push—and I think this is emerging consensus in the field, though I don’t want to speak for everybody—is it needs to … any interventions we have need to come more at the systems level and less at the model level, primarily because this is such dual-use technology. If it can be used for good biological design, it can be used for bad biological design. Biology has no sense of morality. There is no bad protein. It’s just a protein.

So we need to think about this differently than how we would maybe think about looking at the outputs of that image generator model that I spoke about earlier, where I can physically look at an image and say, don’t want my model producing that, do want my model producing that. I don’t have that luxury in this space. So it’s a totally different problem. It’s an evolving problem. Conversations are happening about it, but the work is very much not done.

HORVITZ: And, James, I want to give you the same open question, but I’d like to build on what Bruce just said about the systems level, and, in the spirit of the kinds of things you’re very much involved with internationally, also get some comments on programs and policies that move beyond technical solutions: governance mechanisms, such as logging and auditing nucleic acid orders and transparency of various kinds, that might complement technical approaches like Paraphrase, and their status today.

DIGGANS: Yeah, I’m very gratified that Bruce said that we, the synthesis industry, should not be the sole bulwark against misuse. That is very comforting and correct.

Yeah, so the US government published a guidance document in 2023 that essentially said you, the entire biotech supply chain, have a responsibility to make sure that you’re evaluating your customers. You should know your customer; you should know that they’re legitimate. I think that’s an important practice.

Export controls are designed to minimize the movement of equipment and materials that can be used in support of these kinds of misuse activities. And then governments have really been quite active in trying to incentivize, you know, sort of what we would think of as positive behavior, so screening, for example, in DNA synthesis companies. The US government created a framework in 2024, and it’s under a rewrite now, to basically say US research dollars will only go to companies that make nucleic acid and do these good things. And so that is using, kind of, the government-funding carrot to, kind of, continue to build these layers of defense against potential misuse.

HORVITZ: Thanks. Now, discussing risk, especially when it involves AI and biosecurity, isn’t always easy. As we’ve all been suggesting, some worry about alarming the public or arming bad actors. Others advocate for openness as a principle of doing science with integrity.

A phase of our work as we prepared our paper was giving serious thought to both the benefits and the risks of transparency about what it was that we were doing. Some experts encouraged full disclosure as important for enhancing the science of biosecurity. Other experts, all experts, cautioned against what are called information hazards, the risk of sharing details that could enable malevolent actions using our findings or our approach.

So we faced a real question: how can we support open science while minimizing the risk of misuse? And we took all the input we got, even if it was contradictory, very seriously. We carefully deliberated about a good balance, and even then, once we chose our balance and submitted our manuscript to Science, the peer reviewers came back and said they wanted some of the more sensitive details that we withheld with explanations as to why.

So this provoked some thinking out of the box about a novel approach, and we came up with a perpetual gatekeeping strategy where requests for access to sensitive methods and data and even the software across different risk categories would be carefully reviewed by a committee and a process for access that would continue in perpetuity.

Now, we brought the proposal to Tessa and her team at IBBIS—this is a great nonprofit group; look at their mission—and we worked with Tessa and her colleagues to refine a workable solution that was accepted by Science magazine as a new approach to handling information hazards as first demonstrated by our paper.

So, Tessa, thank you again for helping us to navigate such a complex challenge. Can you share your perspective on information hazards? And then walk us through how our proposed system ensures responsible data and software sharing.

ALEXANIAN: Yeah. And thanks, Eric.

In all of the long discussions we had among the group of people on this podcast, the other authors on the paper, and many people we engaged, you know, technical experts, people in various governments, we heard a lot of contradictory advice.

And I think it showed us that there isn’t a consensus right now on how to handle information hazards in biotechnology. You know, I think … I don’t want to overstate how much of a consensus there is in cybersecurity either. If you go to DEF CON, you’ll hear people talk about how they’ve been mistreated in their attempts to do responsible disclosure for pacemakers and whatnot. But I think we have even less of a consensus when it comes to handling biological information.

You know, you have some people who say, oh, because the size of the consequences could be so catastrophic if someone, you know, releases an engineered flu or something, you know, we should just never share information about this. And then you have other people who say there’s no possibility of building defenses unless we share information about this. And we heard very strong voices with both of those perspectives in the process of conducting this study.

And I think what we landed on that I’m really excited about and really excited to get feedback on now that the paper is out, you know, if you go and compare our preprint, which came out in December of 2024, and this paper in October 2025, you’ll see a lot of information got added back in.

And I’m excited to see people’s reaction to that because even back in January 2025, talking with people who were signatories to the responsible biodesign commitments, they were really excited that this was such an empirically concrete paper because they’d maybe read a number of papers talking about biosecurity risks from AI that didn’t include a whole lot of data, you know, often, I think, because of concerns about information hazards. And they found the arguments in this paper much more convincing because we were able to share data.

So the process we underwent that I felt good about was trying to really clearly articulate, when we talk about an information hazard, what are we worried about being done with this data? And if we put this data in public, completely open source, does it shift the risk at all? You know, I think doing that kind of marginal contribution comparison is really important because it also let us make more things available publicly.

But there were a few tiers of data that after a lot of discussion amongst the authors of the paper, we thought, OK, potentially someone who wanted to do harm, if they got access to this data, it might make it easier for them. Again, not necessarily saying it, you know, it opens the floodgates, but it might make it easier for them. And when we thought about that, we thought, OK, you know, giving all of those paraphrased protein sequences, maybe, maybe that, you know, compared to having to set up the whole pipeline with the open-source tools yourself, just giving you those protein sequences, maybe that makes your life a bit easier if you’re trying to do harm.

And then we thought, OK, giving you those protein sequences plus whether or not they were successfully flagged, maybe that makes your life, you know, quite a bit easier. And then finally, we thought, OK, the code that we want to share with some people who might try to reproduce these results or might try to build new screening systems that are more robust, we want to share the code with them. But again, if you have that whole code pipeline just prepared for you, it might really help make your life easier if you’re trying to do harm.

And so we, sort of, sorted the data into these three tiers and then went through a process actually very inspired by the existing customer screening processes in nucleic acid synthesis about how to determine, you know, we tried to take an approach not of what gets you in but what gets you out. You know, for the most part, we think it should be possible to access this data.

You know, if you have an affiliation with a recognizable institution or some good explanation of why you don’t have one right now, you know, if you have a reason for accessing this data, it shouldn’t be too hard to meet those requirements, but we wanted to have some in place. And we wanted it to be possible to rule out some people from getting access to this data. And so we’ve tried to be extremely transparent about what those are. If you go through our data access process and for some reason you get rejected, you’ll get a list of, “Here’s the reasons we rejected you. If you don’t think that’s right, get back to us.”

So I’m really excited to pilot this in part because I think, you know, we’re already in conversations with some other people handling potential bio-AI information hazards about doing a similar process for their data of, you know, tiering it, determining which gates to put in which tiers, but I really hope a number of people do get access through the process or, if they try and they fail, they tell us why. Because I think as we move toward this world of potentially, you know, biology that is much easier to engineer, partly due to dual-use tools, you know, my dream is it’s, like, still hard to engineer harm with biology, even if it’s really easy to engineer biology. And I think these, kind of, new processes for managing access to things, this sort of like, you know, open but not completely public, I think those can be a big part of that layered defense.
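The tiered, “what gets you out” access process Alexanian describes can be sketched as a small rule check. This is a hypothetical illustration, not IBBIS’s actual system: the three tiers follow her description (paraphrased sequences; sequences plus screening outcomes; pipeline code), but the check names, their wording, and the choice to require a committee sign-off only for the code tier are assumptions made for the example. The design it mirrors is default-open review that returns every rejection reason, so a rejected requester can see why and push back.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class Tier(IntEnum):
    """The three data tiers, ordered by sensitivity."""
    SEQUENCES = 1             # paraphrased protein sequences
    SEQUENCES_WITH_FLAGS = 2  # sequences plus whether screening flagged them
    CODE = 3                  # the prepared pipeline code


@dataclass
class AccessRequest:
    tier: Tier
    stated_purpose: Optional[str] = None
    affiliation: Optional[str] = None
    no_affiliation_explanation: Optional[str] = None
    committee_signoff: bool = False  # hypothetical extra gate


# Each check returns a rejection reason, or None if the request passes it.
Check = Callable[[AccessRequest], Optional[str]]


def needs_purpose(r: AccessRequest) -> Optional[str]:
    return None if r.stated_purpose else "No reason given for accessing the data."


def needs_affiliation_or_explanation(r: AccessRequest) -> Optional[str]:
    if r.affiliation or r.no_affiliation_explanation:
        return None
    return "No recognizable affiliation and no explanation for its absence."


def needs_committee_signoff(r: AccessRequest) -> Optional[str]:
    return None if r.committee_signoff else "Awaiting committee sign-off."


# Hypothetical gate assignment: which checks apply at which tier.
GATES: dict[Tier, list[Check]] = {
    Tier.SEQUENCES: [needs_purpose, needs_affiliation_or_explanation],
    Tier.SEQUENCES_WITH_FLAGS: [needs_purpose, needs_affiliation_or_explanation],
    Tier.CODE: [needs_purpose, needs_affiliation_or_explanation, needs_committee_signoff],
}


def review(r: AccessRequest) -> tuple[bool, list[str]]:
    """Default-open review: approve unless a gate rejects, and return every reason."""
    reasons = [msg for check in GATES[r.tier] if (msg := check(r)) is not None]
    return (not reasons, reasons)


request = AccessRequest(Tier.CODE, stated_purpose="build a more robust screening tool",
                        affiliation="Example University")
print(review(request))  # (False, ['Awaiting committee sign-off.'])
```

Keeping the gates in one table makes it easy to publish exactly which criteria apply at each tier, in the spirit of the transparency Alexanian describes.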

HORVITZ: Thanks, Tessa. So we’re getting close to closing, and I just thought I would ask each of you to share some reflections on what we’ve learned, the process we’ve demonstrated, the tools, the policy work that we did, this idea of facing the dual-use dilemma even at the information hazard level, with sharing information versus withholding it. What do you think about how our whole end-to-end study, now reaching the two-year point, can help other fields facing dual-use dilemmas?

Tessa, Bruce, James … James, have you ever thought about that? And we’ll go to Bruce and then Tessa.

DIGGANS: Yeah, I think it was an excellent model. I would like to see a study like this repeated on a schedule, you know, every six months because from where I sit, you know, the tools that we used for this project are now two years old. And so capabilities have moved on. Is the picture the same in terms of defensive capability? And so using that model over time, I think, would be incredibly valuable. And then using the findings to chart, you know, how much should we be investing in alternative strategies for this kind of risk mitigation for AI tool … the products of AI tools?

HORVITZ: Bruce.

WITTMANN: Yeah, I think I would extend on what James said. The anecdote I like to point out about this project is, kind of, our schedule. We found the vulnerability and it was patched within a week, two weeks, on all major synthesis screening platforms. We wrote the paper within a month. We expanded on the paper within two months, and then we spent a year and a half to nearly two years [LAUGHS] trying to figure out what goes into the paper; how do we release this information; you know, how do we do this responsibly?

And my hope is similar to what James said. We’ve made it easier for others to do this type of work. Not this exact work; it doesn’t have to necessarily do with proteins. But to do this type of work where you are dealing with potential hazards but there is also value in sharing and that hopefully that year and a half we spent figuring out how to appropriately share and what to share will not be a year and a half for other teams because these systems are in place or at least there is an example to follow. So that’s my takeaway.

HORVITZ: Tessa, bring us home—bring us home! [LAUGHS]

ALEXANIAN: Bring us home! Let’s do it faster next time. [LAUGHTER] Come talk to any of us if you’re dealing with this kind of stuff. You know, I think IBBIS, especially, we want to be a partner for building those layers of defense and, you know, having ripped out our hair as a collective over the past year and a half about the right process to follow here, I think we all really hope it’ll be faster next time.

And I think, you know, the other thing I would encourage is if you’re an AI developer, I would encourage you to think about how your tool can strengthen screening and strengthen recognition of threats.

I know James and I have talked before about how, you know, our Google search alerts each week send us dozens of cool AI bio papers, and it’s more like once a year or maybe once every six months, if we’re lucky, that we get something that’s like applying AI bio to biosecurity. So, you know, if you’re interested in these threats, I think we’d love to see more work that’s directly applied to facing these threats using the most modern technology.

HORVITZ: Well said.

Well, Bruce, James, Tessa, thank you so much for joining me today and for representing the many collaborators, both coauthors and beyond, who made this project possible.

It’s been a true pleasure to work with you. I’m so excited about what we’ve accomplished, the processes and the models that we’re now sharing with the world. And I’m deeply grateful for the collective intelligence and dedication that really powered the effort from the very beginning. So thanks again.

[MUSIC]

WITTMANN: Thanks, Eric.

DIGGANS: Thank you.

ALEXANIAN: Thank you.

[MUSIC FADES]


[1] The original organization was founded in 2009 and became the International Gene Synthesis Consortium in 2010.

The post Ideas: More AI-resilient biosecurity with the Paraphrase Project appeared first on Microsoft Research.

]]>
Ideas: Accelerating Foundation Models Research: AI for all http://approjects.co.za/?big=en-us/research/podcast/ideas-accelerating-foundation-models-research-ai-for-all/ Mon, 31 Mar 2025 13:00:00 +0000 http://approjects.co.za/?big=en-us/research/?p=1134446 Innovative AI research often depends on access to resources. Microsoft wants to help. Technical Advisor Evelyne Viegas and distinguished faculty from two Minority Serving Institutions discuss the benefits of Microsoft’s Accelerating Foundation Models Research program in their lives and research.

The post Ideas: Accelerating Foundation Models Research: AI for all appeared first on Microsoft Research.

]]>
Microsoft Research Podcast | Ideas: Evelyne Viegas, Muhammed Idris, Cesar Torres

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets. 

In this episode, host Gretchen Huizinga talks with three researchers about Accelerating Foundation Models Research (AFMR), a global research network and resource platform that allows members of the larger academic community to push the boundaries of AI foundation models and explore exciting and unconventional collaborations across disciplines and institutions. Evelyne Viegas, a technical advisor at Microsoft Research, shares her vision for the program from the Microsoft perspective, while Cesar Torres, an assistant professor of computer science at the University of Texas at Arlington, and Muhammed Idris, an assistant professor in the departments of medicine and public health at the Morehouse School of Medicine, tell their stories of how access to state-of-the-art foundation models is helping creative practitioners find inspiration from both their physical and virtual environments and making cancer-related health information more accessible and culturally congruent. The three recount their research journeys, including both frustrations and aspirations, and relate how AFMR resources have provided game-changing opportunities for Minority Serving Institutions and the communities they serve. 

  


Learn more:

Accelerating Foundation Models Research
Collaboration homepage

The Hybrid Atelier
Homepage, The University of Texas at Arlington

Announcing recipients of the AFMR Minority Serving Institutions grant
Microsoft Research Blog, January 30, 2024

AI ‘for all’: How access to new models is advancing academic research, from astronomy to education
Microsoft Blog, March 12, 2024

The Morehouse Model: How One School of Medicine Revolutionized Community Engagement and Health Equity
Book, July 10, 2020 

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOG]  

EVELYNE VIEGAS: So AFMR is really a program which enabled us to provide access to foundation models, but it’s also a global network of researchers. And so for us, I think when we started that program, it was making sure that AI was made available to anyone and not just the few, right? And really important to hear from our academic colleagues, what they were discovering and covering and what were those questions that we’re not even really thinking about, right? So that’s how we started with AFMR.

CESAR TORRES: One of the things that the AFMR program has allowed me to see is this kind of ability to better visualize the terrain of creativity. And it’s a little bit of a double-edged sword because when we talk about disrupting creativity and we think about tools, it’s typically the case that the tool is making something easier for us. So my big idea is to actually think about tools that are purposely making us slower, that have friction, that have errors, that have failures. To say that maybe the easiest path is not the most advantageous, but the one that you can feel the most fulfillment or agency towards.

MUHAMMED IDRIS: For me, I think what programs like AFMR have enabled us to do is really start thinking outside the box as to how will these or how can these emerging technologies revolutionize public health? What truly would it take for an LLM to understand context? And really, I think for the first time, we can truly, truly achieve personalized, if you want to use that term, health communication. 

[TEASER ENDS] 

[MUSIC PLAYS] 

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and big ideas that propel them forward.

[MUSIC FADES] 

I’m excited to share the mic today with three guests to talk about a really cool program called Accelerating Foundation Models Research, or AFMR for short. With me is Cesar Torres, an assistant professor of computer science at the University of Texas, Arlington, and the director of a program called The Hybrid Atelier. More on that soon. I’m also joined by Muhammed Idris, an assistant professor of medicine at the Morehouse School of Medicine. And finally, I welcome Evelyne Viegas, a technical advisor at Microsoft Research. Cesar, Muhammed, Evelyne, welcome to Ideas! 

EVELYNE VIEGAS: Pleasure. 

CESAR TORRES: Thank you. 

MUHAMMED IDRIS: Thank you. 

HUIZINGA: So I like to start these episodes with what I’ve been calling the “research origin story,” and since there are three of you, I’d like you each to give us a brief overview of your work. And if there was one, what big idea or larger than life person inspired you to do what you’re doing today? Cesar, let’s start with you and then we’ll have Muhammed and Evelyne give their stories as well. 

CESAR TORRES: Sure, thanks for having me. So, I work at the frontier of creativity, especially thinking about how technology could support or augment the ways that we manipulate our world and our ideas. And I would say that the origin of why I happened into this space can really come back down to a “bring your kid to work” day. [LAUGHTER] My dad, who worked at a maquiladora, which is a factory on the border, took me over – he was an accountant – and so he first showed me the accountants and he’s like, look at the amazing work that these folks are doing. But the reality is that a lot of what they do is hidden behind spreadsheets and so it wasn’t necessarily the most engaging. Suffice to say I did not go into accounting like my dad! [LAUGHTER] But then he showed us the chemical engineer in the factory, and he would tell me, this chemical engineer holds the secret formula to the most important processes in the entire company. But again, it was this black box, right? And I got a little bit closer when I looked at this process engineer who was melting metal and pulling it out of a furnace making solder, and I thought, wow, that’s super engaging, but at the same time it’s like it was hidden behind machinery and heat and it was just unattainable. And so finally I saw my future career and it was a factory line worker who was opening boxes. And the way that she opened boxes was incredible. Every movement, every like shift of weight was so perfectly coordinated. And I thought, here is the peak of human ability. [LAUGHTER] This was a person who had just like found a way to leverage her surroundings, to leverage her body, the material she was working with. And I thought, this is what I want to study. I want to study how people acquire skills. And I realized … that moment, I realized just how important the environment and visibility were to being able to acquire skills. 
And so from that moment, everything that I’ve done to this point has been trying to develop technologies that could get everybody to develop a skill in the same way that I saw that factory line worker that day. 

HUIZINGA: Wow, well, we’ll get to the specifics on what you’re doing now and how that’s relevant in a bit. But thank you for that. So Muhammed, what’s the big idea behind your work and how did you get to where you are today? 

MUHAMMED IDRIS: Yeah, no. First off, Cesar, I think it’s a really cool story. I wish I had an origin story [LAUGHTER] from when I was a kid, and I knew exactly what my life’s work was going to be. Actually, my story, I figured out my “why” much later. Actually, my background was in finance. And I started my career in the hedge fund space at a company called BlackRock, a really large financial institution you might have heard of. Then I went off and I did a PhD at Penn State. And I fully intended on going back. I was going to basically be working in spreadsheets for the rest of my life. But actually, during my postdoc, when I was living in Montreal, I had distant relatives of mine who were coming to Montreal to apply for asylum and it was actually in helping them navigate the process, that it became clear to me, you know, the role, it was very obvious to me, the role that technology can play in helping people help themselves. And kind of the big idea that I realized is that, you know, oftentimes, you know, the world kind of provides a set of conditions, right, that strip away our rights and our dignity and our ability to really fend for ourselves. But it was so amazing to see, you know, 10-, 12-year-old kids who, just because they had a phone, were able to help their families navigate what shelter to go to, how to apply for school, and more importantly, how do they actually start the rest of their lives? And so actually at the time, I, you know, got together a few friends, and, you know, we started to think about, well, you know, all of this information is really sitting on a bulletin board somewhere. How can we digitize it? And so we put together a pretty, I would say, bad-ass team, interdisciplinary team, included developers and refugees, and we built a prototype over a weekend. And essentially what happened was we built this really cool platform called Atar. 
And in many ways, I would say that it was the first real solution that leveraged a lot of the natural language processing capabilities that everyone is using today to actually help people help themselves. And it did that in three really important ways. The first way is that people could essentially ask what they needed help with in natural language. And so we had some algorithms developed that would allow us to identify somebody’s intent. Taking that information then, we had a set of models that would then ask you a set of questions to understand your circumstances and determine your eligibility for resources. And then from that, we’d create a customized checklist for them with everything that they needed to know, where to go, what to bring, and who to talk to in order to accomplish that thing. And it was amazing to see how that very simple prototype that we developed over a weekend really became a lifeline for a lot of people. And so that’s really, I think, what motivated my work in terms of trying to combine data science, emerging technologies like AI and machine learning, with the sort of community-based research that I think is important for us to truly identify applications where, in my world right now, it’s really studying health disparities. 

HUIZINGA: Yeah. Evelyne, tell us how you got into doing what you’re doing as a technical advisor. What’s the big idea behind what you do and how you got here? 

EVELYNE VIEGAS: So as a technical advisor in Microsoft Research, I really look for ideas out there. So ideas can come from anywhere. And so think of it as scanning the horizon to look for some of those ideas out there and then figuring out, are there scientific hypotheses we should be looking at? And so the idea here is, once we have identified some of those ideas, the goal is really to help nurture a healthy pipeline for potential big bets. What I do is really about “subtle science and exact art” and we discover as we do and it involves a lot of discussions and conversations working with our researchers here, our scientists, but of course with the external research community. And how I got here … well first I will say that I am so excited to be alive in a moment where AI has made it to industry because I’ve looked and worked in AI for as long as I can remember with very different approaches. And actually, as important, importantly for me, it’s really natural languages that have enabled this big evolution. People sometimes also talk about revolution in AI, via the language models. Because when I started, so I was very fortunate growing up in an environment where my family, my extended family spoke different languages, but then it was interesting to see the different idioms in those natural languages. Just to give you an example, in English you say, it rains cats and dogs. Well, in France, in French it doesn’t mean anything, right? In French, actually, it rains ropes, right? Which probably doesn’t mean anything in English. [LAUGHTER] And so I was really curious about natural languages and communication. When I went to school, being good at math, I ended up doing math, realizing very quickly that I didn’t want to do a career in math. You know, proofs, all that. It’s good in high school, but doing a full career, it was not my thing, math. 
But there was that class I really, really enjoyed, which was mathematical logic. And so little by little, I started discovering people working in that field. And at the same time, I was still restless with natural languages. And so I also took some classes in linguistics at the humanities university in Toulouse in France. And I stumbled on those people who were actually working in … some in linguistics, some in computer science, and then there was this lab doing computational linguistics. And then that was it for me. I was like, that’s, you know, so that’s how I ended up doing my PhD in computational linguistics. And the last aspect I’ll talk about, because in my role today, the aspect of working with a network of people, with a global network, is still so important to me, and I think for science as a whole. At the time, there was this nascent field of computational lexical semantics. And for me, it was so important to bring people together because I realized that we all had different approaches, different theories, not even in France, but across the world, and actually, I worked with somebody else, and we co-edited the first book on computational lexical semantics, where we started exposing what it meant to do lexical semantics and the relationships between words within a larger context, with a larger context of conversations, discourse, and all those different approaches. And that’s an aspect which for me to this day is so important and that was also really important to keep as we develop what we’re going to talk about today, Accelerating Foundation Models Research program. 

HUIZINGA: Yeah, this is fascinating because I didn’t even know all of these stories. I just knew that there were stories here and this is the first time I’m hearing them. So it’s like this discovery process and the sort of pushing on a door and having it be, well, that’s not quite the door I want. [LAUGHTER] Let’s try door number two. Let’s try door number three. Well, let’s get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that. Evelyne, I want to stay with you on this for a minute because I’m curious as to how this initiative even came to exist and what it hopes to achieve. So, maybe start out with a breakdown of the title. It might be confusing for some people, Accelerating Foundation Models Research. What is it? 

VIEGAS: Yeah, thank you for the question. So I think I’m going to skip quickly on accelerate research. I think people can understand it’s just like to bring … 

HUIZINGA: Make it faster … 

VIEGAS: … well, faster and deeper advances. I mean, there are some nuances there, but I think the terms like foundation models, maybe that’s where I’ll start here. So when we talk about foundation models, just think about any model which has been trained on broad data, and which actually enables you to really do any task. That’s, I think, the simplest way to talk about it. And indeed, actually people talk a lot about large language models or language models. And so think of language models as just one part, right, for those foundation models. The term was actually coined at Stanford when people started looking at GPTs, the generative pre-trained transformers, this new architecture. And so that term was coined like to go not just talk about language models, but foundation models, because actually it’s not just language models, but there are also vision models. And so there are other types of models and modalities really. And so when we started with Accelerating Foundation Models Research and from now on, I will say AFMR if that’s okay. 

HUIZINGA: Yeah. Not to be confused with ASMR, which is that sort of tingly feeling you get in your head when you hear a good sound, but AFMR, yes. 

VIEGAS: So with the AFMR, so actually I need to come a little bit before that and just remind us that this is not new. The point I was making earlier about it’s so important to engage with the external research community in academia. So Microsoft Research has been doing it for as long as I’ve been at Microsoft, and I’ve been here 25 years, I just did 25 in January. 

HUIZINGA: Congrats! 

VIEGAS: And so, I … thank you! … and so, it’s really important for Microsoft Research, for Microsoft. And so we had some programs even before the GPT, ChatGPT moment where we had engaged with the external research community on a program called the Microsoft Turing Academic Program where we provided access to the Turing model, which was a smaller model than the one then developed by OpenAI. But at that time, it was very clear that we needed to be responsible, to look at safety, to look at trustworthiness of those models. And so we could not just drink our own Kool-Aid and so we really had to work with people externally. And so we were already doing that. But that was an effort which we couldn’t scale really because to scale an effort and having multiple people that can have access to the resources, you need more of a programmatic way to be able to do that and rely on some platform, like for instance, Azure, which has security and privacy, confidentiality which enables us to scale those types of efforts. And so what happened, as we were developing this program on the Turing model with a small set of academic people, then there was this ChatGPT moment in November 2022, which was the moment like the “aha moment,” I think, as I mentioned, for me, it’s like, wow, AI now has made it to industry. And so for us, it became very clear that we could not with this moment and the amount of resources needed on the compute side, access to actually OpenAI that new that GPT, at the beginning of GPT-3 and then 4 and then … So how could we build a program? First, should we, and was there interest? And academia responded “Yes! Please! Of course!” right? [LAUGHTER] I mean, what are you waiting for? So AFMR is really a program which enabled us to provide access to foundation models, but it’s also a global network of researchers. And so for us, I think when we started that program, it was making sure that AI was made available to anyone and not just the few, right? 
And really important to hear from our academic colleagues, what they were discovering and covering and what were those questions that we were not even really thinking about, right? So that’s how we started with AFMR. 

HUIZINGA: This is funny, again, on the podcast, you can’t see people shaking their heads, nodding in agreement, [LAUGHTER] but the two academic researchers are going, yep, that’s right. Well, Muhammed, let’s talk to you for a minute. I understand AFMR started a little more than a year ago with a pilot project that revolved around health applications, so this is a prime question for you. And since you’re in medicine, give us a little bit of a “how it started, how it’s going” from your perspective, and why it’s important for you at the Morehouse School of Medicine. 

IDRIS: For sure. You know, it’s something as we mentioned that really, I remember vividly is when I saw my first GPT-3 demo, and I was absolutely blown away. This was a little bit before the ChatGPT moment that Evelyne was mentioning, but just the possibilities, oh my God, were so exciting! And again, if I tie that back to the work that we were doing, where we were trying to kind of mimic what ChatGPT is today, there were so many models that we had to build, very complex architectures, edge cases that we didn’t even realize. So you could imagine when I saw that, I said, wow, this is amazing. It’s going to unlock so many possibilities. But at the same time, this demo was coming out, I actually saw a tweet about the inherent biases that were baked into these models. And I’ll never forget this. I think it was at the time he was a grad student at Stanford, and they were able to show that if you asked the model to complete a very simple sentence, a sort of joke, “Two Muslims walk into a bar …” what is it going to finish? And it was scary.  

HUIZINGA: Wow. 

IDRIS: Two thirds, it was about 66% of the time, the responses referenced some sort of violence, right? And that really was an “aha moment” for me personally, of course, not only because I’m Muslim, but beyond that, because there are all of these possibilities. At the same time, there’s a lot that we don’t know about how these models might operate in the real world. And of course, the first thing that this made me do as a researcher was wonder how do these emerging technologies, how may they unintentionally lead to greater health disparities? Maybe they do. Maybe they don’t. The reality is that we don’t know. 

HUIZINGA: Right. 

IDRIS: Now I tie that back to something that I’ve been fleshing out for myself, given my time here at Morehouse School of Medicine. And kind of what I believe is that, you know, the likely outcome, and I would say this is the case for really any sort of emerging technology, but let’s specifically talk about AI, machine learning, large language models, is that if we’re not intentional in interrogating how they perform, then what’s likely going to happen is that despite overall improvements in health, we’re going to see greater health disparities, right? It’s almost kind of that trickle-down economics type model, right? And it’s really this addressing of health disparities, which is at the core of the mission of Morehouse School of Medicine. It is literally the reason why I came here a few years ago. Now, the overarching goal of our program, without getting too specific, is really around evaluating the capabilities of foundation models. And those, of course, as Evelyne mentioned, are large language models. And we’re specifically working on facilitating accessible and culturally congruent cancer-related health information. And specifically, we need to understand that communities that are disproportionately impacted have specific challenges around trust. And all of these are kind of obstacles to taking advantage of things like cancer screenings, which we know significantly reduce the likelihood of mortality. And it’s going very well. We have a pretty amazing interdisciplinary team. And I think we’ve been able to develop a pretty cool research agenda, a few papers and a few grants. I’d be happy to share about a little bit later. 

HUIZINGA: Yeah, that’s awesome. And I will ask you about those because your project is really interesting. But I want Cesar to weigh in here on sort of the goals that are the underpinning of AFMR, which is aligning AI with human values, improving AI-human interaction, and accelerating scientific discovery. Cesar, how do these goals, writ large, align with the work you’re doing at UT Arlington and how has this program helped? 

TORRES: Yeah, I love this moment in time that everybody’s been talking about, that GPT or large language model exposure. Definitely when I experienced it, the first thing that came to my head was, I need to get this technology into the hands of my students because it is so nascent, there’s so many open research questions, there’s so many things that can go wrong, but there’s also so much potential, right? And so when I saw this research program by Microsoft, I was actually surprised. I saw that, hey, they are actually acknowledging the human element. And so the fact that there was this call for research that was looking at that human dimension was really refreshing. So like what Muhammed was saying, one of the most exciting things about these large language models is you don’t have to be a computer scientist in order to use them. And it reminded me of this moment in time within the arts when digital media started getting produced. And we had this crisis. There was this idea that we would lose all the skills that we have learned from working traditionally with physical materials and having to move into a digital canvas.  

HUIZINGA: Right. 

TORRES: And it’s kind of this, the birth of a new medium. And we’re kind of at this unique position to guide how this medium is produced and to make sure that people develop that virtuosity in being able to use that medium but also understand its limitations, right? And so one of the fun projects that we’ve done here has been around working with our glass shop. Specifically, we have this amazing neon-bending artists here at UTA, Jeremy Scidmore and Justin Ginsberg. We’ve been doing some collaborations with them, and we’ve been essentially monitoring how they bend glass. I run an undergraduate research program here and I’ve had undergrads try to tackle this problem of how do you transfer that skill of neon bending? And the fact is that because of AFMR, here is just kind of a way to structure that undergraduate research process so that people feel comfortable to ask those dumb questions exactly where they are. But what I think is even more exciting is that they start to see that questions like skill acquisition is still something that our AI is not able to do. And so it’s refreshing to see; it’s like the research problems have not all been solved. It just means that new ones have opened and ones that we previously thought were unattainable now have this groundwork, this foundation in order to be researched, to be investigated. And so it’s really fertile ground. And I really thank AFMR … the AFMR program for letting us have access to those grounds. 

HUIZINGA: Yeah. I’m really eager to get into both your projects because they’re both so cool. But Evelyne, I want you to just go on this “access” line of thought for a second because Microsoft has given grants in this program, AFMR, to several Minority Serving Institutions, or MSIs, as they’re called, including Historically Black Colleges and Universities and Hispanic Serving Institutions, so what do these grants involve? You’ve alluded to it already, but can you give us some more specifics on how Microsoft is uniquely positioned to give these and what they’re doing? 

VIEGAS: Yes. So the grant program, per se, is really access to resources, actually compute and API access to frontier models. So think about Azure, OpenAI … but also now actually as the program evolves, it’s also providing access to even our research models, so Phi, I mean if you … like smaller models … 

HUIZINGA: Yeah, P-H-I. 

VIEGAS: Yes, Phi! [LAUGHTER] OK! So, so it’s really about access to those resources. It’s also access to people. I was talking about this global research network and the importance of it. And I’ll come back to that specifically with the Minority Serving Institutions, what we did. But actually when we started, I think we started a bit in a naive way, thinking … we did an open call for proposals, a global one, and we got a great response. But actually at the beginning, we really had no participation from MSIs. [LAUGHTER] And then we thought, why? It’s open … it’s … and I think what we missed there, at the beginning, is like we really focused on the technology and some people who were already a part of the kind of, this global network, started approaching us, but actually a lot of people didn’t even know, didn’t think they could apply, right? And so we ended up doing a more targeted call where we provided not only access to the compute resources, access to the APIs to be able to develop applications or validate or expand the work which is being done with foundation models, but also we acknowledged that it was important, with MSIs, to also enable the students of the researchers like Cesar, Muhammed, and other professors who are part of the program so that they could actually spend the time working on those projects because there are some communities where the teaching load is really high compared to other communities or other colleges. So we already had a good sense that one size doesn’t fit all. And I think what came also with the MSIs and others, it’s like also one culture doesn’t fit all, right? So it’s about access. It’s about access to people, access to the resources and really co-designing so that we can really, really make more advances together. 

HUIZINGA: Yeah. Cesar, let’s go over to you because big general terms don’t tell a story as well as specific projects with specific people. So your project is called, and I’m going to read this, AI-Enhanced Bricolage: Augmenting Creative Decision Making in Creative Practices. That falls under the big umbrella of Creativity and Design. So tell our audience, and as you do, make sure to explain what bricolage is and why you work in a Hybrid Atelier, terms I’m sure are near and dear to Evelyne’s heart … the French language. Talk about that, Cesar. 

TORRES: So at UTA, I run a lab called The Hybrid Atelier. And I chose that name because “lab” is almost too siloed into thinking about scientific methods in order to solve problems. And I wanted something that really spoke to the ethos of the different communities of practice that generate knowledge. And so The Hybrid Atelier is a space, it’s a makerspace, and it’s filled with the tools and knowledge that you might find in creative practices like ceramics, glass working, textiles, polymer fabrication, 3D printing. And so every year I throw something new in there. And this last year, what I threw in there was GPT and large language models. And it has been exciting to see how it has transformed. But speaking to this specific project, I think the best way I can describe bricolage is to ask you a question: what would you do if you had a paperclip, duct tape, and a chewing gum wrapper? What could you make with that, right? [LAUGHTER] And so some of us have these MacGyver-type mentalities, and that is what Claude Lévi-Strauss kind of terms as the “bricoleur,” a person who is able to improvise solutions with the materials that they have at hand. But all too often, when we think about bricolage, it’s about the physical world. But the reality is that we very much live in a hybrid reality where we are behind our screens. And that does not mean that we cannot engage in these bricoleur activities. And so this project that I was looking at, it’s both a vice and an opportunity of the human psyche, and it’s known as “functional fixation.” And that is to say, for example, if I were to give you a hammer, you would see everything as a nail. And this helps kind of constrain creative thought and action to say, okay, if I have this tool, I’m going to use it in this particular way. 
At the same time, it limits the other potential solutions, the ways that you could use a hammer in unexpected ways, whether it’s to weigh something down or like jewelers to texturize a metal piece or, I don’t know, even to use it as a pendulum … But my point here is that this is where large language models can come in because they can, from a more unbiased perspective, not having the cognitive bias of functional fixation, say, hey, here is some tool, here’s some material, here’s some machine. Here are all the ways that I know people have used it. Here are other ways that it could be extended. And so we have been exploring, you know, how can we alter the physical and virtual environment in such a way so that this information just percolates into the creative practitioner’s mind in that moment when they’re trying to have that creative thought? And we’ve had some fun with it. I did a workshop at an event known as OurCS here at DFW. It’s a research weekend where we bring a couple of undergrads and expose them to research. And we found that it’s actually the case that it’s not AI that does better, and it’s also not the case that the practitioner does better! [LAUGHTER] It’s when they hybridize that you really kind of lock into the full kind of creative thought that could emerge. And so we’ve been steadily moving this project forward, expanding from our data sets, essentially, to look at the corpus of video tutorials that people have published all around the web to find the weird and quirky ways that they have extended and shaped new techniques and materials to advance creative thought. So … 

HUIZINGA: Wow.  

TORRES: … it’s been an exciting project to say the least. 

HUIZINGA: Okay, again, my face hurts because I’m grinning so hard for so long. I have to stop. No, I don’t because it’s amazing. You made me think of that movie Apollo 13 when they’re stuck up in space and this engineer comes in with a box of, we’ll call it bricolage, throws it down on the table and says, we need to make this fit into this using this, go. And they didn’t have AI models to help them figure it out, but they did a pretty good job. Okay, Cesar, that’s fabulous. I want Muhammed’s story now. I have to also calm down. It’s so much fun. [LAUGHTER] 

IDRIS: No, no, I love it. I love it and actually to bring it back to what Evelyne was mentioning earlier about just getting different perspectives in a room, I think this is a perfect example of it. Actually, Cesar, I never thought of myself as being a creative person but as soon as you said a paperclip and was it the gum wrapper … 

HUIZINGA: Duct tape. 

IDRIS: … duct tape or gum wrapper, I thought to myself, my first internship I was able to figure out how to make two paper clips and a rubber band into a … this was of course before AirPods, right? But something that I could wrap my wires around and it was perfect! [LAUGHTER] I almost started thinking to myself, how could I even scale this, or maybe get a patent on it, but it was a paper clip … yeah. Uh, so, no, no, I mean, this is really exciting stuff, yeah. 

HUIZINGA: Well, Muhammed, let me tee you up because I want to actually … I want to say your project out loud … 

IDRIS: Please. 

HUIZINGA: … because it’s called Advancing Culturally Congruent Cancer Communication with Foundation Models. You might just beat Cesar’s long title with yours. I don’t know. [LAUGHTER] You include alliteration, which as an English major, that makes my heart happy, but it’s positioned under the Cognition and Societal Benefits bucket, whereas Cesar’s was under Creativity and Design, but I see some crossover. Evelyne’s probably grinning too, because this is the whole thing about research is how do these things come together and help? Tell us, Muhammed, about this cultury … culturally … Tell us about your project! [LAUGHTER] 

IDRIS: So, you know, I think again, whenever I talk about our work, especially the mission and the “why” of Morehouse School of Medicine, everything really centers around health disparities, right? And if you think about it, health disparities usually come from one of many, but let’s focus on kind of three potential areas. You might not know you need help, right? If you know you need help, you might not know where to go. And if you end up there, you might not get the help that you need. And if you think about it, a lot of like the kind of the through line through all of these, it really comes down to health communication at the end of the day. It’s not just what people are saying, it’s how people are saying it as well. And so our project focuses right now on language and text, right? But we are, as I’ll talk about in a second, really exploring the kind of multimodal nature of communication more broadly and so, you know, I think another thing that’s important in terms of just background context is that for us, these models are more than just tools, right? We really do feel that if we’re intentional about it that they can be important facilitators for public health more broadly. And that’s where this idea of our project fitting under the bucket of benefiting society as a whole comes from. Now, you know, the context is that over the past couple of decades, how we’ve talked about cancer, how we’ve shared health information has just changed dramatically. And a lot of this has to do with the rise, of course, of digital technologies more broadly, social media, and now there’s AI. People have more access to health information than ever before. And despite all of these advancements, of course, as I keep saying over and over again, not everyone’s benefiting equally, especially when it comes to cancer screening. Now, breast and cervical cancer, that’s what we’re focusing on specifically, are two of the leading causes of cancer-related deaths in women worldwide. 
And actually, Black and Hispanic women in the US are at particular risk and disproportionately impacted by not just lower screening rates, but later diagnoses, and of course from that, higher mortality rates as well. Now again, an important part of the context here is COVID-19. I think there are, by some estimates, about 10 million cancer screenings that didn’t happen. And this is also happening within a context of just a massive amount of misinformation. It’s actually something that the WHO termed as an infodemic. And so our project is trying to kind of look for creative emerging technologies-based solutions for this. And I think we’re doing it in a few unique ways. Now the first way is that we’re looking at how foundation models like the GPTs but also open-source models and those that are, let’s say, specifically fine-tuned on medical texts, how do they perform in terms of their ability to generate health information? How accurate are they? How well is it written? And whether it’s actually useful for the communities that need it the most. We developed an evaluation framework, and we embedded within that some qualitative dimensions that are important to health communications. And we just wrapped up an analysis where we compared the general-purpose models, like a ChatGPT, with medical and more science-specific domain models and as you’d expect, the general-purpose models kind of produced information that was easier to understand, but that was of course at the risk of safety and more accurate responses that the medically tuned models were able to produce. Now a second aspect of our work, and I think this is really a unique part of not what I’ve called, but actually literally there’s a book called The Morehouse Model, is how is it that we could actually integrate communities into research? And specifically, my work is thinking about how do we integrate communities into the development and evaluation of language models? 
And that’s where we get the term “culturally congruent.” That these models are not just accurate, but they’re also aligned with the values, the beliefs, and even the communication styles of the communities that they’re meant to serve. One of the things that we’re thinking, you know, quite a bit about, right, is that these are not just tools to be published on and maybe put in a GitHub, you know, repo somewhere, right? That these are actually meant to drive the sort of interventions that we need within community. So of course, implementation is really key. And so for this, you know, not only do you need to understand the context within which these models will be deployed, the goal here really is to activate you and prepare you with information to be able to advocate for yourself once you actually see your doctor, right? So that again, I think is a good example of that. But you also have to keep in mind, Gretchen, that, you know, our goal here is, we don’t want to create greater disparities between those who have and those who don’t, right? And so for example, thinking about accessibility is a big thing and that’s been a part of our project as well. And so for example, we’re leveraging some of the Azure API services for speech-to-text and we’re even going as far as trying to leverage some of the text-to-image models to develop visuals that address health literacy barriers and try to leverage these tools to truly, truly benefit health. 

HUIZINGA: One of the most delightful and sometimes surprising benefits of programs like AFMR is that the technologies developed in conjunction with people in minority communities have a big impact for people in majority communities as well, often called the Curb Cut Effect. Evelyne, I wonder if you’ve seen any of this happen in the short time that AFMR has been going? 

VIEGAS: Yeah, so, I’m going to focus a bit more maybe on education and examples there where we’ve seen, as Cesar was also talking about it, you know for scaling and all that. But we’ve seen a few examples of professors working with their students where English is not the first language.  

HUIZINGA: Yeah … 

VIEGAS: Another one I would mention is in the context of domains. So for domains, what I mean here is application domains, like not just in CS, but we’ve been working with professors who are, for instance, astronomers, or lawyers, or musicians working in universities. So they started looking actually at these LLMs as more of the “super advisor” helping them. And so it’s another way of looking at it. And actually they started focusing on, can we actually build small astronomy models, right? And I’m thinking, okay, that could … maybe also we learn something which could be potentially applied to some other domain. So these are some of the things we are seeing. 

HUIZINGA: Yes. 

VIEGAS: But I will finish with something which may, for me, kind of challenge this Curb Cut Effect to a certain extent, if I understand the concept correctly, is that I think, with this technology and the way AI and foundation models work compared to previous technologies, I feel it’s kind of potentially the opposite. It’s kind of like the tail catching up with the head. But here I feel that with the foundation models, I think it’s a different way to find information and gain some knowledge. I think that actually when we look at that, these are really broad tools that now actually can be used to help customize your own curb, as it were! So kind of the other way around. 

HUIZINGA: Oh, interesting … 

VIEGAS: So I think it’s maybe there are two dimensions. It’s not just I work on something small, and it applies to everyone. I feel there is also a dimension of, this is broad, this is any tasks, and it enables many more people. I think Cesar and Muhammed made that point earlier, is you don’t have to be a CS expert or rocket scientist to start using those tools and make progress in your field. So I think that maybe there is this dimension of it. 

HUIZINGA: I love the way you guys are flipping my questions back on me. [LAUGHTER] So, and again, that is fascinating, you know, a custom curb, not a curb cut. Cesar, Muhammed, do you, either of you, have any examples of how perhaps this is being used in your work and you’re having accidental or serendipitous discoveries that sort of have a bigger impact than what you might’ve thought? 

TORRES: Well, one thing comes to mind. It’s a project that two PhD students in my lab, Adam Emerson and Shreyosi Endow, have been working on. It’s around this idea of communities of practice and that is to say, when we talk about how people develop skills as a group, it’s often through some sort of tiered structure. And I’m making a tree diagram with my hands here! [LAUGHTER] And so we often talk about what it’s like for an outsider to enter from outside of the community, and just how much effort it takes to get through that gate, to go through the different rungs, through the different rites of passage, to finally be a part of the inner circle, so to speak. And one of the projects that we’ve been doing, we started to examine these known communities of practice, where they exist. But in doing this analysis, we realized that there’s a couple of folks out there that exist on the periphery. And by really focusing on them, we could start to see where the field is starting to move. And these are folks that have said, I’m neither in this community nor another, I’m going to kind of pave my own way. While we’re still seeing those effects of that research go through, I think being able to monitor the communities at the fringe is a really telling sign of how we’re advancing as a society. I think shining some light into these fringe areas, it’s exactly how research develops, how it’s really just about expanding at some bleeding edge. And I think sometimes we just have to recontextualize that that bleeding edge is sometimes the group of people that we haven’t been necessarily paying attention to. 

HUIZINGA: Right. Love it. Muhammed, do you have a quick example … or, I mean, you don’t have to, but I just was curious. 

IDRIS: Yeah, maybe I’ll just give one quick example that I think keeps me excited, actually has to do with the idea of kind of small language models, right? And so, you know, I gave the example of GPT-3 and how it’s trained on the entirety of the internet and with that is kind of baked in some unfortunate biases, right? And so we asked ourselves the flip side of that question. Well, how is it that we can go about actually baking in some of the good bias, right? The cultural context that’s important to train these models on. And the reality is that we started off by saying, let’s just have focus groups. Let’s talk to people. But of course that takes time, it takes money, it takes effort. And what we quickly realized actually is there are literally generations of people who have done these focus groups specifically on breast and cervical cancer screening. And so what we actually have since done is leverage that real world data in order to actually start developing synthetic data sets that are … 

HUIZINGA: Ahhhh.  

IDRIS: … small enough but are of higher quality enough that allow us to address the specific concerns around bias that might not exist. And so for me, that’s a really like awesome thing that we came across that I think in trying to solve a problem for our kind of specific use case, I think this could actually be a method for developing more representative, context-aware, culturally sensitive models and I think overall this contributes to the overall safety and reliability of these large language models and hopefully can create a method for people to be able to do it as well. 

HUIZINGA: Yeah. Evelyne, I see why it’s so cool for you to be sitting at Microsoft Research and working with these guys … It’s about now that I pose the “what could possibly go wrong if you got everything right?” question on this podcast. And I’m really interested in how researchers are thinking about the potential downsides and consequences of their work. So, Evelyne, do you have any insights on things that you’ve discovered along the path that might make you take preemptive steps to mitigate? 

VIEGAS: Yeah, I think it’s coming back to actually what Muhammed was just talking about, I think Cesar, too, around data, the importance of data and the cultural value and the local value. I think an important piece of continuing to be positive for me [LAUGHTER] is to make sure that we fully understand that at the end of the day, data, which is so important to build those foundation models is, especially language models in particular, are just proxies to human beings. And I feel that it’s uh … we need to remember that it’s a proxy to humans and that we all have some different beliefs, values, goals, preferences. And so how do we take all that into account? And I think that beyond the data safety, provenance, I think there’s an aspect of “data caring.” I don’t know how to say it differently, [LAUGHTER] but it’s kind of in the same way that we care for people, how do we care for the data as a proxy to humans? And I’m thinking of, you know, when we talk about like in, especially in cases where there is no economic value, right? [LAUGHTER] And so, but there is local value for those communities. And I think actually there is cultural value across countries. So just wanted to say that there is also an aspect, I think we need to do more research on, as data as proxies to humans. And as complex humans we are, right? 

HUIZINGA: Right. Well, one of the other questions I like to ask on these Ideas episodes is, is about the idea of “blue sky” or “moonshot” research, kind of outrageous ideas. And sometimes they’re not so much outrageous as they are just living outside the box of traditional research, kind of the “what if” questions that make us excited. So just briefly, is there anything on your horizon, specifically Cesar and Muhammed, that you would say, in light of this program, AFMR, that you’ve had access to things that you think, boy, this now would enable me to ask those bigger questions or that bigger question. I don’t know what it is. Can you share anything on that line? 

TORRES: I guess from my end, one of the things that the AFMR program has allowed me to see is this kind of ability to better visualize the terrain of creativity. And it’s a little bit of a double-edged sword because when we talk about disrupting creativity and we think about tools, it’s typically the case that the tool is making something easier for us. But at the same time, if something’s easier, then some other thing is harder. And then we run into this really strange case where if everything is easy, then we are faced with the “blank canvas syndrome,” right? Like what do you even do if everything is just equally weighted with ease? And so my big idea is to actually think about tools that are purposely making us slower … 

HUIZINGA: Mmmmm … 

TORRES: … that have friction, that have errors, that have failures and really design how those moments can change our attitudes towards how we move around in space. To say that maybe the easiest path is not the most advantageous, but the one that you can feel the most fulfillment or agency towards. And so I really do think that this is hidden in the latent space of the data that we collect. And so we just need to be immersed in that data. We need to traverse it and really it becomes an infrastructure problem. And so the more that we expose people to these foundational models, the more that we’re going to be able to see how we can enable these new ways of walking through and exploring our environment. 

HUIZINGA: Yeah. I love this so much because I’ve actually been thinking some of the best experiences in our lives haven’t seemed like the best experiences when we went through them, right? The tough times are what make us grow. And this idea that AI makes everything accessible and easy and frictionless is what you’ve said. I’ve used that term too. I think of the people floating around in that movie WALL-E and all they have to do is pick whether I’m wearing red or blue today and which drink I want. I love this, Cesar. That’s something I hadn’t even expected you might say and boom, out of the park. Muhammed, do you have any sort of outrageous …? That was flipping it back! 

IDRIS: I was going to say, yeah, no, I listen, I don’t know how I could top that. But no, I mean, so it’s funny, Cesar, as you were mentioning that I was thinking about grad school, how at the time, it was the most, you know, friction-filled life experience. But in hindsight, I wouldn’t trade it in for the world. For me, you know, one of the things I’m often thinking about in my job is that, you know, what if we lived in a world where everyone had all the information that they needed, access to all the care they need? What would happen then? Would we magically all be the healthiest version of ourselves? I’m a little bit skeptical. I’m not going to lie, right? [LAUGHTER] But that’s something that I’m often thinking about. Now, bringing that back down to our project, one of the things that I find a little bit amusing is that I tend to ping-pong between, this is amazing, the capabilities are just, the possibilities are endless; and then there will be kind of one or two small things where it’s pretty obvious that there’s still a lot of research that needs to be done, right? So my whole, my big “what if” actually, I want to bring that back down to a kind of a technical thing which is, what if AI can truly understand culture, not just language, right? And so right now, right, an AI model can translate a public health message. It’s pretty straightforward from English to Spanish, right? But it doesn’t inherently understand why some Spanish-speaking countries may be more hesitant about certain medical interventions. It doesn’t inherently appreciate the historical context that shapes that hesitancy or what kinds of messaging would build trust rather than skepticism, right? So there’s literal like cultural nuances. That to me is what, when I say culturally congruent or cultural context, what it is that I mean. 
And I think for me, I think what programs like AFMR have enabled us to do is really start thinking outside the box as to how will these, or how can these, emerging technologies revolutionize public health? What truly would it take for an LLM to understand context? And really, I think for the first time, we can truly, truly achieve personalized, if you want to use that term, health communication. And so that’s what I would say for me is like, what would that world look like? 

HUIZINGA: Yeah, the big animating “what if?” I love this. Go ahead, Evelyne, you had something. Please. 

VIEGAS: Can I expand? I cannot talk. I’m going to do like Muhammed, I cannot talk! Like that friction and the cultural aspect, but can I expand? And as I was listening to Cesar on the education, I think I heard you talk about the educational rite of passage at some point, and Muhammed on those cultural nuances. So first, before talking about “what if?” I want to say that there is some work, again, when we talk about AFMR, is the technology is all the brain power of people thinking, having crazy ideas, very creative in the research being done. And there is some research where people are looking at what it means, actually, when you build those language models and how you can take into account different language and different culture or different languages within the same culture or between different cultures speaking the same language, or … So there is very interesting research. And so it made me think, expanding on what Muhammed and Cesar were talking about, so this educational rite of passage, I don’t know if you’re aware, so in Europe in the 17th, 18th century, there was this grand tour of Europe and that was reserved to just some people who had the funds to do that grand tour of Europe, [LAUGHTER] let’s be clear! But it was this educational rite of passage where actually they had to physically go to different countries to actually get familiar and experience, experiment, philosophy and different types of politics, and … So that was kind of this “passage obligé” we say in French. I don’t know if there is a translation in English, but kind of this rite of passage basically. And so I am like, wow, what if actually we could have, thanks to the AI looking at different nuances of cultures, of languages … not just language, but from a multimodal point of view, what if we could have this “citizen of the world” rite of passage, where we … before we are really citizens of the world, we need to understand other cultures, at least be exposed to them. 
So that would be my “what if?” How do we make AI do that? And so without … and for anyone, right, not just people who can afford it. 

HUIZINGA: Well, I don’t even want to close, but we have to. And I’d like each of you to reflect a bit. I think I want to frame this in a way you can sort of pick what you’d like to talk about. But I often have a little bit of vision casting in this section. But there are some specific things I’d like you to talk about. What learnings can you share from your experience with AFMR? Or/and what’s something that strikes you as important now that may not have seemed that way when you started? And you can also, I’m anticipating you people are going to flip that and say, what wasn’t important that is now? And also, how do you see yourself moving forward in light of this experience that you’ve had? So Muhammed, let’s go first with you, then Cesar, and then Evelyne, you can close the show. 

IDRIS: Awesome. One of the things that, that I’m often thinking about and one of the concepts I’m often reminded of, given the significance of the work that institutions like a Morehouse School of Medicine and UT Arlington and kind of Minority Serving Institutions, right, it almost feels like there is an onslaught of pushback to addressing some of these more systemic issues that we all struggle with, is what does it mean to strive for excellence, right? So in our tradition there’s a concept called Ihsan. Ihsan … you know there’s a lot of definitions of it but essentially to do more than just the bare minimum to truly strive for excellence and I think it was interesting, having spent time at Microsoft Research in Redmond as part of the AFMR program, meeting other folks who also participated in the program that, that I started to appreciate for myself the importance of this idea of the responsible design, development, and deployment of technologies if we truly are going to achieve the potential benefits. And I think this is one of the things that I could kind of throw out there as something to take away from this podcast, it’s really, don’t just think of what we’re developing as tools, but also think of them as how will they be applied in the real world? And when you’re thinking about the context within which something is going to be deployed, that brings up a lot of interesting constraints, opportunities, and just context that I think is important, again, to not just work on an interesting technology for the sake of an interesting technology, but to truly achieve that benefit for society. 

HUIZINGA: Hmm. Cesar. 

TORRES: I mean, echoing Muhammed, I think the community is really at the center of how we can move forward. I would say the one element that really struck a chord with me, and something that I very much undervalued, was the power of infrastructure and spending time laying down the proper scaffolds and steppingstones, not just for you to do what you’re trying to do, but to allow others to also find their own path. I was setting up Azure for one of my classes and it took time, it took effort, but the payoff has been incredible in … in so much the impact that I see now of students from my class sharing with their peers. And I think this culture of entrepreneurship really comes from taking ownership of where you’ve been and where you can go. But it really just, it all comes down to infrastructure. And so AFMR for me has been that infrastructure to kind of get my foot out the door and also have the ability to bring some folks along the journey with me, so … 

HUIZINGA: Yeah. Evelyne, how blessed are you to be working with people like this? Again, my face hurts from grinning so hard. Bring us home. What are your thoughts on this? 

VIEGAS: Yeah, so first of all, I mean, it’s so wonderful just here live, like listening to the feedback from Muhammed and Cesar of what AFMR brings and has the potential to bring. And first, let me acknowledge that to put together a program like AFMR, it takes a village. So I’m here, the face here, or well, not the face, the voice rather! [LAUGHTER] But it’s so many people who have, at Microsoft on the engineering side, we’re just talking about infrastructure, Cesar was talking about, you know, the pain and gain of leveraging an industry-grade infrastructure like Azure and Azure AI services. So, also our policy teams, of course, our researchers. But above all, the external research community … so grateful to see. It’s, as you said, I feel super blessed and fortunate to be working on this program and really listening what we need to do next. How can we together do better? There is one thing for me, I want to end on the community, right? Muhammed talked about this, Cesar too, the human aspect, right? The technology is super important but also understanding the human aspect. And I will say, actually, my “curb cut moment” for me [LAUGHTER] was really working with the MSIs and the cohort, including Muhammed and Cesar, when they came to Redmond, and really understanding some of the needs which were going beyond the infrastructure, beyond you know a small network, how we can put it bigger and deployments ideas too, coming from the community and that’s something which actually we also try to bring to the whole of AFMR moving forward. And I will finish on one note, which for me is really important moving forward. We heard from Muhammed talking about the real importance of interdisciplinarity, right, and let us not work in silo. And so, and I want to see AFMR go more international, internationality if the word exists … [LAUGHTER] 

HUIZINGA: It does now! 

VIEGAS: It does now! But it’s about making sure that when we have those collaborations … it’s really hard, actually, with time zones; you know, practically, it’s a nightmare! But I think there is definitely an opportunity here for all of us. 

HUIZINGA: Well, Cesar Torres, Muhammed Idris, Evelyne Viegas. This has been so fantastic. Thank you so much for coming on the show to share your insights on AFMR today. 

[MUSIC PLAYS] 

TORRES: It was a pleasure.  

IDRIS: Thank you so much. 

VIEGAS: Pleasure. 

The post Ideas: Accelerating Foundation Models Research: AI for all appeared first on Microsoft Research.

]]>
Ideas: Quantum computing redefined with Chetan Nayak http://approjects.co.za/?big=en-us/research/podcast/ideas-quantum-computing-redefined-with-chetan-nayak/ Wed, 19 Feb 2025 16:04:49 +0000 http://approjects.co.za/?big=en-us/research/?p=1130040 Microsoft announced the creation of the first topoconductor and first QPU architecture with a topological core. Dr. Chetan Nayak, a technical fellow of Quantum Hardware at the company, discusses how the breakthroughs are redefining the field of quantum computing.

The post Ideas: Quantum computing redefined with Chetan Nayak appeared first on Microsoft Research.

]]>

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Dr. Chetan Nayak, a technical fellow focused on quantum hardware at Microsoft. As a preteen, Nayak became engrossed in the world of scientific discovery, “accidentally exposed,” he says, to the theory of relativity, advanced mathematics, and the like while exploring the shelves of his local bookstores. In studying these big ideas, he began to develop his own understanding of the forces and phenomena at work around us and ultimately realized he could make his own unique contributions, which have since included advancing the field of quantum computing. Nayak examines the defining moments in the history of quantum computing; explains why we still need quantum computing, even with the rise of generative AI; and discusses how Microsoft Quantum is re-engineering the quantum computer with the creation of the world’s first topoconductor and first quantum processing unit (QPU) architecture with a topological core, called the Majorana 1.

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

CHETAN NAYAK: People sometimes say, well, quantum computers are just going to be like classical computers but faster. And that’s not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems which quantum computers are not just faster at than classical computers but quantum computers can solve and classical computers have no chance of solving.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest today is Dr. Chetan Nayak, a technical fellow of Quantum Hardware at Microsoft Quantum. Under Chetan’s leadership, the Microsoft Quantum team has published a paper that demonstrates a fundamental operation for a scalable topological quantum computer. The team also announced the creation of the world’s first topoconductor—more on that later—and first QPU architecture with a topological core, called the Majorana 1. Chetan Nayak, I can’t wait to find out what all of this is … welcome to Ideas!

CHETAN NAYAK: Thank you. Thanks for having me. And I’m excited to tell you about this stuff.

HUIZINGA: Well, you have a huge list of accomplishments, accolades, and awards—little alliteration there. But I want to start by getting to know a bit more about you and what got you there. So specifically, what’s your “research origin story,” as it were? What big idea inspired you to study the smallest parts of the universe?

NAYAK: It’s a great question. I think if I really have to go back to the origin story, it starts when I was a kid, you know, probably a preteen. And, you know, I’d go to bookstores to … I know, I guess many of the people listening to this may not know what that is, [LAUGHTER] but there used to be these brick-and-mortar storefronts where they would sell books, physical books, …

HUIZINGA: Right.

NAYAK: … and I’d go to bookstores to, you know, to buy books to read, you know, fiction. But I would browse through them, and there’d be a nonfiction section. And often there’d be used books, you know, sometimes used textbooks or used popular science books. And I remember, even though they were bookstores, not libraries, I would spend a lot of time there leafing through books and got exposed to—accidentally exposed to—a lot of ideas that I wouldn’t otherwise have encountered. You know, I maybe went there, you know, looking to pick up the next Lord of the Rings book, and while I was there, you know, wandered into a book that was explaining the theory of relativity to non-scientists. And I remember leafing through those books and actually reading about Einstein’s discoveries, you know, most famously E = mc², but actually a lot of those books were explaining these thought experiments that Einstein did where he was thinking about, you know, if he were on a train that was traveling at the speed of light, what would light look like to him? [LAUGHTER] Would he catch up to it? You know, and all these incredible thought experiments that he did to really play around with the basic laws of physics as they were then understood, and by, you know, stretching and pulling them and taking them to extreme situations, you could either find the flaws in them or in some cases see what the next steps were. And that was, you know, really inspirational to me. Around the same time, I also started leafing through various advanced math books and a little later picked up a book on calculus and started flipping through it, a used book with, like, you know, the cover falling apart and the pages starting to fall out. But there was a lot of, you know, accidental discovery of topics through wandering through bookstores, actually. 
I also, you know, went to this great magnet high school in New York City called Stuyvesant High School, where I was surrounded by people who were really interested in science and math and technology. So I think, you know, for me, that origin story really starts, you know, maybe even earlier, but at least in my preteen years when, you know, I went through a process of learning new things and trying to understand them in my own way. And the more you do that, eventually you find maybe you’re understanding things in a little different way than anybody else ever did. And then pretty soon, you know, you’re discovering things that no one’s ever discovered before. So that’s, sort of, how it started.

HUIZINGA: Yeah. Well, I want to drill in a little bit there because you’ve brought to mind a couple of images. One is from a Harry Potter movie, Harry Potter and the Half-Blood Prince, where Harry discovers the potions handbook, but it’s all torn up, and they were fighting about who didn’t get that book. And it turned out to be … so there’s you in a bookstore somewhere between the sci-fi and the non-fi, shall we call it. And you’re, kind of, melding the two together. And I love how you say, I was accidentally exposed. [LAUGHTER] Sounds kind of like radiation of some kind, and you’ve turned into a scientist. A little bit more on that. This idea of quantum, because you’ve mentioned Albert Einstein, there’s quantum physics, quantum mechanics, now quantum computing. Do these all go together? I mean, what came out of what in that initial, sort of, exploration with you? Where did you start getting interested in the quantum of things?

NAYAK: Yeah, so I definitely started with relativity, not quantum. That was the first thing I heard about. And I would say in a lot of ways, that’s the easier one. I mean, those are the two big revolutions in physics in the 20th century, relativity and quantum theory, and quantum mechanics is by far, at least for me and for many people, the harder one to get your head around because it is so counterintuitive. Quantum mechanics, or quantum theory, is in some sense many abstraction layers away from most of what we experience in the world. What I find amazing is that the people who created, you know, discovered quantum mechanics, they had nothing but the equations to guide them. You know, they didn’t really understand what they were doing. They knew that there were some holes or gaps in the fundamental theory, and they kind of stumbled into these equations, and they gave the right answers, and they just had to follow it. Just a few weeks ago, I was in Arosa, which is a small Swiss town in the Alps. That’s actually the town where Schrödinger discovered Schrödinger’s equation.

HUIZINGA: No!

NAYAK: Yeah, a hundred years ago, this summer …

HUIZINGA: Amazing!

NAYAK: So Schrödinger suffered from tuberculosis, which eventually killed him much later in his life. And so he went into the mountains …

HUIZINGA: … for the cure.

NAYAK: … for his health, yeah, to a sanatorium to recover from tuberculosis. And while he was there in Arosa, he discovered his equation. And it’s a remarkable story because, you know, he didn’t even know what the equation meant. He just knew, well, particles are waves, and waves have wave equations. That’s ultimately what Maxwell’s equations give you: you can derive wave equations for light waves and radio waves and microwaves, x-rays. And he said, you know, there has to be a wave equation for this thing, and this wave equation needs to somehow correctly predict the energy levels in hydrogen.

HUIZINGA: Oh, my gosh.

NAYAK: And he, you know, worked out this equation and then solved it, which for that time period was not entirely trivial. And he correctly got the energy levels of hydrogen, which people had measured through the spectra, the different wavelengths of light that hydrogen emits. And lo and behold, it works. He had no idea why. No idea what it even meant. But he knew that he was onto something. And then remarkably, other people were able to build on what he’d done, were able to say, no, there must be a grain of truth here, if not the whole story, and let’s build on this, and let’s make something that is richer and encompasses more and try to understand the connections between this and other things. And Heisenberg was, around the same time, developing what’s called matrix mechanics, a different way of thinking about quantum mechanics, and then people like Dirac realized the connections between those. So it’s a remarkable story how scientists took these things they understood, you know, imposed on them a certain level of mathematical consistency and a need for the math to predict things that you could observe, and once you had, sort of, the internal mathematical consistency and it was correctly explaining a couple of data points about the world, you could build this huge edifice based on that. And so that was really impressive to me as I learned that. And that’s 100 years ago! It was 1925.
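A minimal worked sketch of the hydrogen result described here, under stated assumptions: solving the Schrödinger equation for hydrogen gives energy levels E_n = −R/n², where R ≈ 13.6057 eV is the Rydberg energy (this ignores fine structure and the finite mass of the proton), and a photon emitted when the electron drops from level m to level n has wavelength λ = hc/(E_m − E_n). The constant and function names below are illustrative, not from the episode.

```python
# Approximate constants (assumptions: infinite nuclear mass, no fine structure)
RYDBERG_EV = 13.6057   # Rydberg energy, in eV
HC_EV_NM = 1239.84     # Planck constant times speed of light, in eV*nm


def energy_level(n: int) -> float:
    """Energy of hydrogen level n in eV, from the Schrödinger equation."""
    return -RYDBERG_EV / n**2


def emission_wavelength_nm(upper: int, lower: int) -> float:
    """Wavelength (nm) of the photon emitted when the electron drops upper -> lower."""
    delta_e = energy_level(upper) - energy_level(lower)  # positive for upper > lower
    return HC_EV_NM / delta_e


# Balmer series: transitions that end on level n = 2
for upper in range(3, 7):
    print(f"{upper} -> 2: {emission_wavelength_nm(upper, 2):.1f} nm")
```

The computed Balmer wavelengths land close to the observed lines near 656, 486, 434, and 410 nm, which is the kind of agreement with hydrogen’s spectrum that Nayak describes.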

HUIZINGA: Right. Well, let me …

NAYAK: And that’s quantum mechanics!

HUIZINGA: OK.

NAYAK: You’re probably going to say, well, how does quantum computing fit into this, you know? [LAUGHTER] Right? And that’s a much later development. People spent a long time just trying to understand quantum mechanics, extend it, use it to understand more things, to understand, you know, other particles. So it was initially introduced to understand the electron, but you could understand atoms, molecules, and subatomic things and quarks and positrons. So there was a rich, you know, decades of development and understanding, and then eventually it got combined with relativity, at least to some extent. So there was a lot to do there to really understand and build upon the early discoveries of quantum mechanics. One of those directions, which was kicked off by Feynman around, I think, 1982 and independently by a Russian mathematician named Yuri Manin, was, OK, great, you know, today’s computers, again, are many abstraction layers away from anything quantum mechanical, and in fact, they’re sort of separated from the quantum world by many classical abstraction layers. But what if we built a technology that didn’t do that? Like, that’s a choice. It was a choice. It was a choice that was partially forced on us just because of the scale of the things we could build. But as computers get smaller and smaller and the way Moore’s law is heading, you know, at some point, you’re going to get very close to that point at which you cannot abstract away quantum mechanics, [LAUGHTER] where you must deal with quantum mechanics, and it’s part and parcel of everything. You are not in the fortunate case where, out of quantum theory, has emerged the classical world that behaves the way we expect it to intuitively. And, you know, once we go past that, that potentially is really catastrophic and scary because, you know, you’re trying to make things smaller for the sake of, you know, Moore’s law and for making computers faster and potentially more energy efficient. 
But, you know, if you get down to this place where the momentum and position of things, of the electrons, you know, or of the currents that you’re relying on for computation, if they’re not simultaneously well-defined, how are you going to compute with that? It looks like this is all going to break down. And so it looks like a real crisis. But, you know, what they realized and what Feynman realized was actually it’s an opportunity. It’s actually not just a crisis. Because if you do it the right way, then actually it gives you way more computational power than you would otherwise have. And so rather than looking at it as a crisis, it’s an opportunity. And it’s an opportunity to do something that would be otherwise unimaginable.

HUIZINGA: Chetan, you mentioned a bunch of names there. I have to say I feel sorry for Dr. Schrödinger because most of what he’s known for to people outside your field is a cat, a mysterious cat in a box, meme after meme. But you’ve mentioned a number of really important scientists in the field of quantum everything. I wonder, who are your particular quantum heroes? Are there any particular, sort of, modern-day 21st-century or 20th-century people that have influenced you in such a way that it’s like, I really want to go deep here?

NAYAK: Well, definitely, you know, the one person I mentioned, Feynman, is later, so he’s the second wave, you could say, of, OK, so if the first wave is like Schrödinger and Heisenberg, and you could say Einstein was the leading edge of that first wave, and Planck. But … the second wave, I don’t know if Dirac is first or second wave. You might say Dirac is second wave and potentially Landau, a great Russian physicist, second wave. Then maybe Feynman’s the third wave, I guess? I’m not sure if he’s second or third wave, but anyway, he’s post-war and was really instrumental in the founding of quantum computing as a field. He gave a famous talk, you know, called “There’s Plenty of Room at the Bottom.” And, you know, what he was thinking about there was, you can go to these extreme conditions, like very low temperatures and in some cases very high magnetic fields, and new phenomena emerge when you go there, phenomena that you wouldn’t otherwise observe. And in a lot of ways, many of the early quantum theorists, to some extent, were extreme reductionists because, you know, they were really trying to understand smaller and smaller things and things that in some ways are more and more basic. At the same time, you know, some of them, if not all of them, held in their mind the idea that, you know, actually, more complex behaviors emerge out of simple constituents. Einstein famously, in his miracle year of 1905, one of the things he did was he proposed the theory of Brownian motion, which is an emergent behavior that relies on underlying atomic theory, but it is several layers of abstraction away from the underlying atoms and molecules, and it’s a macroscopic thing. Schrödinger, famously, among other things, is the person who came up with the concept of entanglement …

HUIZINGA: Yes.

NAYAK: … in understanding his theory. And for that matter, Schrödinger’s cat is a way to understand the paradoxes that occur when the classical world emerges from quantum mechanics. So they were thinking a lot about how these really incredible, complicated things arise or emerge from very simple constituents. And I think Feynman is one of those people who really bridged that as a post-war scientist, because he was thinking a lot about quantum electrodynamics and the basic underlying theory of electrons and photons and how they interact. But he also thought a lot about liquid helium and ultimately about quantum computing. The motivation for him in quantum computing was, you have these complex systems with many underlying constituents, and it’s really hard to solve the equations. The equations are basically unsolvable.

HUIZINGA: Right.

NAYAK: They’re complicated equations. You can’t just, sort of, solve them analytically. Schrödinger was able to do that with his equation because it was one electron, one proton, OK. But for a typical solid, you’ll have Avogadro’s number of electrons and ions inside something like that, and there’s no way you’re going to solve that. And what Feynman recognized, as others did, really, coming back to Schrödinger’s observation on entanglement, is you actually can’t even put it on a computer and solve a problem like that. And in fact, it’s not just that with Avogadro’s number you can’t; you can’t put it on a computer and solve it with a thousand, you know, [LAUGHTER] atoms, right? And actually, you aren’t even going to be able to do it with a hundred, right. And when I say you can’t do that on a computer, it’s not that, well, datacenters are getting bigger, and we’re going to have gigawatt datacenters, and then that’s the point at which we’ll be able to see—no, the fact is the amazing thing about quantum theory is if, you know, let’s say you’re trying to solve a problem with 1,000 atoms in it, and you go to 1,001, you’re doubling the size of the problem. Just to store the answer on a classical computer, say in the cloud, you’d have to double the storage. So there’s no chance of getting to 100, even with all the buildout of datacenters that’s happening at this amazing pace, which is fantastic and is driving all these amazing advances in AI; that buildout is never going to lead to a classical computer that can even store the answer to a difficult quantum mechanical problem.
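The doubling described above can be made concrete with a back-of-the-envelope sketch. It assumes each particle contributes a single two-level degree of freedom (a qubit), so a full quantum state of n particles needs 2**n complex amplitudes, and it assumes each amplitude is stored as a 16-byte complex number. Real atoms have many more states each, so under these assumptions the numbers are a lower bound. The function name is hypothetical.

```python
# Assumption: one complex128 value (two 64-bit floats) per amplitude
BYTES_PER_AMPLITUDE = 16


def state_vector_bytes(n_particles: int) -> int:
    """Bytes needed to store every amplitude of an n-particle quantum state,
    assuming each particle is a two-level system (2**n amplitudes in total)."""
    return (2 ** n_particles) * BYTES_PER_AMPLITUDE


# Each added particle doubles the storage requirement.
for n in (10, 30, 50, 100):
    print(f"{n:>3} particles: {state_vector_bytes(n):.3e} bytes")
```

Under these assumptions, 50 particles already need about 1.8e16 bytes (roughly 18 petabytes), and 100 particles need about 2e31 bytes, far beyond any datacenter.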

HUIZINGA: Yeah, so basically in answer to the “who are your quantum heroes,” you’ve kind of given us a little history of quantum computing, kind of, the leadup and the questions that prompted it. So we’ll get back to that in one second, because I want you to go a little bit further on where we are today. But before we do that, you’ve also alluded to something that’s super interesting to me, which is, in light of all the recent advances and claims in AI, especially generative AI, like claims that we’ll be able to shorten the timeline on scientific discovery and things like that, why, then, do we need quantum computing? Why do we need it?

NAYAK: Great question, so at least AI is … AI and machine learning, at least so far, is only as good as the training data that you have for it. So if you train AI on all the data we have, and if you train AI on problems we can solve, which at some level are classical, you will be able to solve classical problems. Now, protein folding is one of those problems where the solution is basically classical, very complicated and difficult to predict but basically classical, and there was a lot of data on it, right. And so it was clearly a big data problem that’s basically classical. As far as we know, there’s no classical way to simulate or mimic quantum systems at scale; there’s a clean separation between the classical and quantum worlds. And so, you know, quantum theory is the fundamental theory of the world, and there is no hidden classical model that is lurking [LAUGHTER] in the background behind it. People sometimes call these things hidden variable theories, you know, and Einstein actually was really hoping, late in his life, that there was one. That there was, hiding behind quantum mechanics, some hidden classical theory that was just obscured from our view. We didn’t know enough about it, and the quantum thing was just our best approximation. If that’s true, then, yeah, maybe an AI can actually discover that classical theory that’s hiding behind the quantum world and therefore be able to answer the problems we need to answer. But that’s almost certainly not the case. You know, there’s just so much experimental evidence about the correctness of quantum mechanics and quantum theory, and so many experiments that really, kind of, rule out many aspects of such a classical theory, that I think we’re fairly confident there isn’t going to be some classical approximation or underlying theory hiding behind quantum mechanics. 
And therefore, an AI model, which at the end of the day is some kind of very large matrix—you know, a neural network is some very large classical model obeying some very classical rules about, you take inputs and you produce outputs through many layers—that that’s not going to produce, you know, a quantum theory. Now, on the other hand, if you have a quantum computer and you can use that quantum computer to train an AI model, then the AI model is learning—you’re teaching it quantum mechanics—and at least within a certain realm of quantum problems, it can interpolate what we’ve learned about quantum mechanics and quantum problems to solve new problems that, you know, you hadn’t already solved. Actually, you know, like I said, in the early days, I was reading these books and flipping through these bookstores, and I’d sometimes figure out my own ways to solve problems different from how it was in the books. And then eventually I ended up solving problems that hadn’t been solved. Well, that’s sort of what an AI does, right? It trains off of the internet or off of playing chess against itself many times. You know, it learns and then takes that and eventually by learning its own way to do things, you know, it learns things that we as humans haven’t discovered yet.

HUIZINGA: Yeah.

NAYAK: And it could probably do that with quantum mechanics if it were trained on quantum data. So, but without that, you know, the world is ultimately quantum mechanical. It’s not classical. And so something classical is not going to be a general-purpose substitute for quantum theory.

HUIZINGA: OK, Chetan, this is fascinating. And as you’ve talked about pretty well everything so far, that’s given us a really good, sort of, background on quantum history as we know it in our time. Talk a little bit about where we are now, particularly—and we’re going to get into topology in a minute, topological stuff—but I want to know where you feel like the science is now, and be as concise as you can because I really want to get to your cool work that we’re going to talk about. And this question includes, what’s a Majorana and why is it important?

NAYAK: Yeah. So … OK, unfortunately, it won’t be that concise an answer. OK, so, you know, in the early ’80s, ideas about quantum computing were put forward. But I think most people thought, A, this is going to be very difficult, you know, to do. And I think, B, it wasn’t clear that there was enough motivation. You know, I think Feynman said, yes, if you really want to simulate quantum systems, you need a quantum computer. And I think at that point, people weren’t really sure, is that the most pressing thing in the world? You know, simulating quantum systems? It’s great to understand more about physics, understand more about materials, understand more about chemistry, but I think we weren’t even at the stage where, hey, that’s what’s limiting progress for society. And then, secondly, there was also this feeling that, you know, what you’re really doing is some kind of analog computing. You know, this doesn’t feel digital, and if it doesn’t feel digital, there’s this question about error correction and how reliable it’s going to be. So Peter Shor actually, you know, did two amazing things in the mid-’90s, one of which is a little more famous with the general public and one of which is probably more important technically. He first came up with Shor’s algorithm, where he said, if you have a quantum computer, yeah, great for simulating quantum systems, but actually you can also factor large numbers. You can find the prime factors of large numbers, and the difficulty of that problem is the underlying security feature of RSA [encryption], and many of these public key cryptography systems rely on certain types of problems that are really hard. It’s easy to multiply two large primes together and get the output, and you can use that to encrypt data. But to decrypt it, you need to know those two numbers, and it’s hard to find those factors. 
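The asymmetry behind RSA described above (multiplying two primes is easy, recovering them is hard) can be seen in a toy sketch. Trial division is used here only because it is the simplest classical factoring method; practical classical attacks use much faster algorithms such as the general number field sieve, and none of this is Shor’s algorithm. The primes and the function name are illustrative choices.

```python
import math


def trial_division_factor(n: int) -> tuple[int, int]:
    """Return (p, q) with p * q == n and p the smallest factor > 1.

    Naive classical method: it tries odd divisors up to sqrt(n), so its
    worst-case running time grows exponentially in the number of digits of n.
    """
    if n % 2 == 0:
        return 2, n // 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return d, n // d
    raise ValueError(f"{n} is prime")


p, q = 1_000_003, 1_000_033  # two small primes, chosen for illustration
n = p * q                    # the "easy" direction: a single multiplication
print(n)
print(trial_division_factor(n))  # the "hard" direction: ~500,000 trial divisions
```

Because trial division needs up to about √N/2 divisions, every two extra decimal digits in N multiply the worst-case work by roughly 10, and real RSA moduli have hundreds of digits. Shor’s algorithm, by contrast, would take time polynomial in the number of digits on an ideal quantum computer.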
What Peter Shor discovered is that ideally, a quantum computer, an ideal quantum computer, would be really good at this, OK. So that was the first discovery. And at that point, the seemingly academic problem of simulating quantum systems, which in Feynman’s vision was what quantum computers are for, all of a sudden, you know, had company: it turns out there’s this other application of a quantum computer that is very important financially, economically, and for national security. And a lot of people sat up and took notice at that point. So that’s huge. But then there’s a second thing that he, you know, discovered, which was quantum error correction. Because when he first discovered his algorithm, everyone said, sure, ideally that’s how a quantum computer works. But, you know, this thing sounds like an analog system. How are you going to correct errors? This thing will never work because it’ll never operate perfectly. Schrödinger’s problem with the cat is going to happen, in that you’re going to have entanglement. The thing is going to just end up being basically classical, and you’ll lose all the supposed gains you’re getting from quantum mechanics. And quantum error correction, that second discovery of Peter Shor’s, really, you know, suddenly made it look like, OK, at least in principle, this thing can happen. And people built on that. Peter Shor’s original quantum error correction, I would say, was based on a lot of ideas from classical error correction, because you have the same problem with classical communication and classical computing. Alexei Kitaev then came up with, you know, a new set of quantum error correction procedures, which really don’t rely in the same way on classical error correction. Or if they do, it’s more indirect, and in many ways they rely on ideas in topology and physics. 
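The classical idea that Shor’s first quantum codes drew on can be illustrated with the simplest classical error-correcting code: a three-bit repetition code decoded by majority vote. This is a classical sketch only. Quantum codes cannot simply copy an unknown state (the no-cloning theorem) and must also protect against phase errors, which is why Shor’s nine-qubit code combines quantum versions of this repetition idea in a more elaborate way. The bit-flip probability p and the function names are assumptions for illustration.

```python
import random


def encode(bit: int) -> list[int]:
    """Three-bit repetition code: store three copies of the logical bit."""
    return [bit] * 3


def noisy_channel(codeword: list[int], p: float, rng: random.Random) -> list[int]:
    """Flip each bit independently with probability p (assumed noise model)."""
    return [b ^ 1 if rng.random() < p else b for b in codeword]


def decode(codeword: list[int]) -> int:
    """Majority vote: corrects any single bit flip."""
    return 1 if sum(codeword) >= 2 else 0


def logical_error_rate(p: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often decoding returns the wrong bit."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        bit = rng.randint(0, 1)
        if decode(noisy_channel(encode(bit), p, rng)) != bit:
            failures += 1
    return failures / trials


def analytic_error_rate(p: float) -> float:
    """Decoding fails only when two or three of the bits flip."""
    return 3 * p**2 * (1 - p) + p**3


for p in (0.01, 0.05, 0.1):
    print(f"p={p}: simulated {logical_error_rate(p):.4f}, "
          f"analytic {analytic_error_rate(p):.4f}")
```

With p = 0.1, the logical error rate falls to 3p²(1 − p) + p³ = 0.028, so the encoded bit is more reliable than a bare one whenever p is small enough. Making that improvement controllable, rather than eaten up by overhead, is the concern that motivates the hardware-level protection discussed next.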
And, you know, those ideas, which lead to quantum error correcting codes, but also ideas about what kind of underlying physical systems would have built-in hardware error protection, led to what we now call topological quantum computing and topological qubits, because it’s this idea that, you know, just like people went from the early days of computers from vacuum tubes to silicon, actually, initially germanium transistors and then silicon transistors, that similarly that you had to have the right underlying material in order to make qubits.

HUIZINGA: OK.

NAYAK: And that the right underlying material platform, just as for classical computing, where it’s been silicon for decades and decades, was going to be one of these so-called topological states of matter. And that these would be states of matter whose defining feature, in a sense, would be that they protect quantum information from errors, at least to some extent. Nothing’s perfect, but, you know, in a controllable way so that you can make it better as needed, and good enough that any subsequent error correction, what you might call software-level error correction, would not be so cumbersome and introduce so much overhead as to make a quantum computer impractical. I would say, you know, the field had a reboot or a rebirth in the mid-1990s, and pretty quickly those ideas, in addition to the applications and algorithms, you know, coalesced around error correction and what’s called fault tolerance. And many of those ideas, you know, freely interchanged with ideas in topology and the physics of what are called topological phases and, you know, gave birth to the set of ideas on which Microsoft’s program has been based, which is to look for the right material … create the right material and qubits based on it so that you can get to a quantum computer at scale. Because there’s a number of constraints there. And the work that we’re really excited about right now is about getting the right material and harnessing that material for qubits.

HUIZINGA: Well, let’s talk about that in the context of this paper that you’re publishing and some pretty big news in topology. You just published a paper in Nature that demonstrates—with receipts—a fundamental operation for a scalable topological quantum computer relying on, as I referred to before, Majorana zero modes. That’s super important. So tell us about this and why it’s important.

NAYAK: Yeah, great. So building on what I was just saying about having the right material, what we’re relying on, to an extent, is superconductivity. So that’s one of the, you know, really cool, amazing things about the physical world. Many metals, including aluminum, for instance, when you cool them down, are able to carry electricity with no dissipation, OK. No energy loss associated with that. And what underlies that remarkable property is that the electrons form up into pairs. These things are called Cooper pairs. And those Cooper pairs, their wave functions kind of lock up and go in lockstep, and as a result, actually the number of them fluctuates wildly, you know, in any place locally. And that enables them to, you know, move easily and carry current. But also, a fundamental feature, because they form pairs, is that there’s a big difference between an even and odd number of electrons. Because if there’s an odd electron, then actually there’s some electron that’s unpaired somewhere, and there’s an energy penalty, an energy cost, associated with that. It turns out that that’s not always true. There’s actually a subclass of superconductors called topological superconductors, or topoconductors, as we call them, and topoconductors have this amazing property that actually they’re perfectly OK with an odd number of electrons! In fact, when there’s an odd number of electrons, there isn’t any unpaired electron floating around. An ordinary superconductor would have one, but topological superconductors don’t have that. That’s the remarkable thing about it. I’ve been warned not to say what I’m about to say, but I’ll just go ahead [LAUGHTER] and say it anyway. I guess that’s a bad way to introduce something …

HUIZINGA: No, it’s actually really exciting!

NAYAK: OK, but since you brought up, you know, Harry Potter and the Half-Blood Prince, you know, Voldemort famously split his soul into seven or, I guess, technically eight, accidentally. [LAUGHTER] He split his soul into seven Horcruxes, so in some sense, there was no place where you could say, well, that’s where his soul is.

HUIZINGA: Oh, my gosh!

NAYAK: So Majorana zero modes do kind of the same thing! Like, there’s this unpaired electron potentially in the system, but you can’t find it anywhere. Because to an extent, you’ve actually figured out a way to split it and put it … you know, sometimes we say like you put it at the two ends of the system, but that’s sort of a mathematical construct. The reality is there is no place where that unpaired electron is!

HUIZINGA: That’s crazy. Tell me, before you go on, we’re talking about Majorana. I had to look it up. That’s a guy’s name, right? So do a little dive into what this whole Majorana zero mode is.

NAYAK: Yeah, so Majorana was an Italian physicist, or maybe technically Sicilian physicist. He was very active in the ’20s and ’30s and then just disappeared mysteriously around 1937, ’38, around that time. So no one knows exactly what happened to him. You know, but in one of his last works, which I think may have only been published after he disappeared, he proposed this equation called the Majorana equation. And he was actually thinking about neutrinos at the time and particles, subatomic particles that carry no charge. And so, you know, he was thinking about something very, very different from quantum computing, actually, right. So Majorana—didn’t know anything about quantum computing, didn’t know anything about topological superconductors, maybe even didn’t know much about superconductivity at all—was thinking about subatomic particles, but he wrote down this equation for neutral objects, things that don’t carry any charge. And so when people started, you know, in the ’90s and 2000s looking at topological superconductors, they realized that there are these things called Majorana zero modes. So, as I said, and let me explain how they enter the story, so Majorana zero modes are … I just said that topological superconductors, there’s no place you can find that even or odd number of electrons. There’s no penalty. Now superconductors, they do have a penalty—and it’s called the energy gap—for breaking a pair. Even topological superconductors. You take a pair, a Cooper pair, you break it, you have to pay that energy cost, OK. And it’s, like, double the energy, in a sense, of having an unpaired electron because you’ve created two unpaired electrons when you break that pair. Now, somehow a topological superconductor has to accommodate that unpaired electron. It turns out the way it accommodates it is it can absorb or emit one of these at the ends of the wire. 
If you have a topological superconductor, a topoconductor wire, at the ends, it can absorb or emit one of these things. And once it goes into one end, then it’s totally delocalized over the system, and you can’t find it anywhere. You can say, oh, it got absorbed at this end, and you can look and there’s nothing you can tell. Nothing has changed about the other end. It’s now a global property of the whole thing that you actually need to somehow figure out, and I’ll come to this, somehow figure out how to connect the two ends and actually measure the whole thing collectively to see if there’s an even or odd number of electrons. Which is why it’s so great as a qubit, because the reason it’s hard for Schrödinger’s cat to be both dead and alive is that you’re going to look at it, and when you look at it, photons are going to bounce off it and you’re going to know if it’s dead or alive. And the thing is, the thing that was slightly paradoxical is actually a person doesn’t have to perceive it. If there’s anything in the environment that, you know, if a photon bounces off, it’s sort of like if a tree falls in the forest …

HUIZINGA: I was just going to say that!

NAYAK: … it still makes a sound. I know! It still makes a sound in the sense that Schrödinger’s cat is still going to be dead or alive once a photon or an air molecule bounces off it because of the fact that it’s gotten entangled with, effectively, the rest of the universe … you know, many other parts of the universe at that point. And so the fact that there is no place where you can go and point to that unpaired electron means it [hides] that "even or oddness," which we call parity; whether something’s even or odd is parity. And, you know, these are wires with, you know, 100 million electrons in them. And it’s a difference between 100 million and 100 million and one, you know, because one’s an even number and the other is odd. And that difference, the environment can’t detect it. So it doesn’t get entangled with anything, and so it can actually be dead and alive at the same time, you know, unlike Schrödinger’s cat, and that’s what you need to make a qubit, is to create those superpositions. And so Majorana zero modes are these features of the system that don’t actually carry an electrical charge. But they are a place where a single unpaired electron can enter the system and then disappear. And so they are this remarkable thing where you can hide stuff. [LAUGHS]
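As a rough counting illustration only (it ignores all of the quantum mechanics), pairing electrons two at a time leaves either zero or one electron over, and that leftover bit is the parity. The sketch assumes the round figure of 100 million electrons quoted above; the function names are hypothetical.

```python
def pair_up(n_electrons: int) -> tuple[int, int]:
    """Split an electron count into Cooper pairs and leftover electrons (toy counting model)."""
    pairs, unpaired = divmod(n_electrons, 2)
    return pairs, unpaired


def parity(n_electrons: int) -> str:
    """Parity is simply whether the count is even or odd."""
    return "even" if n_electrons % 2 == 0 else "odd"


N = 100_000_000  # round number of electrons in a wire, as quoted in the conversation

for n in (N, N + 1):
    pairs, unpaired = pair_up(n)
    print(f"{n:,} electrons -> {pairs:,} pairs, {unpaired} unpaired, parity {parity(n)}")
```

Both counts give the same number of pairs, so the whole difference sits in a single bit. In a topoconductor, as described in the conversation, that bit is not stored at any particular location, which this toy model cannot capture.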

HUIZINGA: So how does that relate to your paper and the discoveries that you’ve made here?

NAYAK: Yeah, so in an earlier paper … so now the difficulty is you have to actually make this thing. So, you know, you put a lot of problems up front, in that you’re saying, OK, the solution to our problem is we need this new material and we need to harness it for qubits, right. Great. Well, where are we going to get this material from, right? You might discover it in nature. Nature may hand it to you. But in many cases, it doesn’t. And that’s … this is one of those cases where we actually had to engineer the material. And engineering the material, it turns out, is a challenge. People had ideas early on that they could put together some combination of semiconductors and superconductors. But, you know, for us to really make progress, we realized that, you know, it’s a very particular combination. And we had to develop—and we did develop—simulation capabilities, classical. Unfortunately, we don’t have a quantum computer, so we had to do this classically with classical computers. We had to classically simulate various kinds of material combinations to find one, or find a class, that would get us into the topological phase. And it turned out lots of details mattered there, OK. It involves a semiconductor, which is indium arsenide. It’s not silicon, and it’s not the second most common semiconductor, which is gallium nitride, which is used in LED lights. It’s something called indium arsenide. It has some uses as an infrared detector, but it’s a different semiconductor. And we’re using it in a nonstandard way, putting it into contact with aluminum and getting, kind of, the best of both worlds of a superconductor and a semiconductor so that we can control it and get into this topological phase. And that’s a previously published paper in an American Physical [Society] journal. But that’s great. So that enables … that shows that you can create this state of matter. 
Now we need to then build on it; we have to harness it, and we have to, as I said, we have to make one of these wires or, in many cases, multiple wires, qubits, et cetera, complex devices, and we need to figure out, how do we measure whether we have 100 million or 100 million and one electrons in one of these wires? And that was the problem we solved, which is we made a device where we took something called a quantum dot—you should think of [it] as a tiny little capacitor—and that quantum dot is coupled to the wire in such a way that the coupling … that an electron—it’s kind of remarkable—an electron can quantum mechanically tunnel from … you know, this is like an electron, you don’t know where it is at any given time. You know, its momentum and its position aren’t well defined. So it’s, you know, an electron whose, let’s say, energy is well defined … actually, there is some probability amplitude that it’s on the wire and not on the dot. Even though it should be on the dot, it actually can, kind of, leak out or quantum mechanically end up on the wire and come back. And because of that fact—the simple fact that its quantum mechanical wave function can actually have it be on the wire—it actually becomes sensitive to that even or oddness.

HUIZINGA: Interesting.

NAYAK: And that causes a small change in the capacitance of this tiny little parallel-plate capacitor, effectively, that we have. And that tiny little change in capacitance, just to put it into numbers, is on the order of a femtofarad, OK. So that’s a decimal point followed by, you know, 15 zeros and a one … 14 zeros and a one. So that’s how tiny it is. That tiny change in the capacitance, if we put it into a larger resonant circuit, then that larger resonant circuit shows a small shift in its resonant frequency, which we can detect. And so what we demonstrated is we can detect the difference, that one-electron difference, that even or oddness, which is, again, not a local property of anywhere in the wire, but we can nevertheless detect it. And that’s, kind of, the fundamental thing you have to have if you want to be able to use these things for quantum information processing, you know, this parity, you have to be able to measure what that parity is, right. That’s a fundamental thing. Because ultimately, the information you need is classical information. You’re going to want to know the answer to some problem. It’s going to be a string of zeros and ones. You have to measure that. But moreover, the particular architecture we’re using, the basic operations for us are measurements of this type, which is a … it’s a very digital process. The process … I mentioned, sort of, how quantum computing looks a little analog in some ways, but it’s not really analog. Well, that’s very manifestly true in our architecture, that our operations are a succession of measurements that we turn on and off, but different kinds of measurements. And so what the paper shows is that we can do these measurements. We can do them fast. We can do them accurately.
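The step from a femtofarad-scale capacitance change to a detectable frequency shift can be sketched with the textbook formula for an ideal LC resonator, f = 1/(2π√(LC)). The inductance and capacitance below are hypothetical round numbers, not parameters of the device discussed here, and a real readout circuit is more involved than an ideal LC circuit.

```python
import math


def resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency (Hz) of an ideal LC circuit: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical circuit values, chosen only for illustration.
L = 10e-9        # 10 nH
C = 1e-12        # 1 pF
delta_C = 1e-15  # 1 fF, the scale of change mentioned in the conversation

# One femtofarad written out: a decimal point, 14 zeros, then a one.
print(f"{delta_C:.15f} F")

f0 = resonant_frequency(L, C)
f1 = resonant_frequency(L, C + delta_C)
exact_shift = f1 - f0
# First-order estimate for a small change: df / f ≈ -dC / (2C)
approx_shift = -f0 * delta_C / (2 * C)

print(f"f0 = {f0 / 1e9:.4f} GHz")
print(f"exact shift     = {exact_shift / 1e3:.1f} kHz")
print(f"linear estimate = {approx_shift / 1e3:.1f} kHz")
```

With these assumed values the resonance sits in the microwave range and a 1 fF change moves it by several hundred kilohertz, which is why a tiny capacitance change can be read out as a frequency shift.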

HUIZINGA: OK.

NAYAK: And the additional, you know, announcements that we’re making, you know, right now are work that we’ve done extending and building on that with showing additional types of measurements, a scalable qubit design, and then building on that to multi-qubit arrays.

HUIZINGA: Right.

NAYAK: So that really unlocked our ability to do a number of things. And I think you can see the acceleration now with the announcements we have right now.

HUIZINGA: So, Chetan, you’ve just talked about the idea of living in a classical world and having to simulate quantum stuff.

NAYAK: Yup.

HUIZINGA: Tell us about the full stack here and how, in your mind, we go from quantum computing at the bottom all the way to the top.

NAYAK: OK, so one thing to keep in mind is quantum computers are not a general-purpose accelerator for every problem. You know, so people sometimes say, well, quantum computers are just going to be like classical computers but faster. And that’s not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems that quantum computers are not just faster at than classical computers; they’re problems that quantum computers can solve and classical computers have no chance of solving. On the other hand, there are lots of things that classical computers are good at that quantum computers aren’t going to be good at, because they’re not going to give you any big scale-up, like a lot of big data problems where you have lots of classical data. You know, a quantum computer with, let’s say, let’s call it 1,000 qubits, and here I mean 1,000 logical qubits, and we’ll come back to what that means, but 1,000 error-corrected qubits can solve problems that you have no chance of solving with a classical computer, even with all the world’s computing. In fact, if it were 1,000 qubits, you would have to take every single atom in the entire universe, OK, and turn that into a transistor, and it still wouldn’t be big enough. You don’t have enough bytes, even if every single atom in the universe were a byte. So that’s how big these quantum problems are when you try to store them on a classical computer, just to store the answer, let’s say.
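One standard way to see the scale described here: writing down a general state of n qubits takes 2^n complex amplitudes. The comparison below uses the commonly quoted order-of-magnitude estimate of about 10^80 atoms in the observable universe; both figures are rough, and the code only compares their sizes.

```python
n_qubits = 1_000
amplitudes = 2 ** n_qubits   # complex amplitudes needed to write down a general n-qubit state
atoms_estimate = 10 ** 80    # commonly quoted order-of-magnitude estimate for the observable universe

# Python integers are exact, so this counts the decimal digits of 2**1000 precisely.
digits = len(str(amplitudes))
print(f"2**{n_qubits} has {digits} decimal digits, i.e. roughly 10**{digits - 1}")
print(f"That exceeds the atom estimate by a factor of roughly 10**{digits - 1 - 80}")
```

Even at one byte per amplitude, far less than a complex number actually needs, there would not be enough atoms to hold the state.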

HUIZINGA: Yeah.

NAYAK: But conversely, if you have a lot of classical data, like all the data on the internet, which we train, you know, our AI models with, you can’t store that on 1,000 qubits, right. You actually can’t really store more than 1,000 bits of classical information on 1,000 qubits. So for many things where we have big data classically, we don’t have the ability to really, truly store that data within a quantum computer in a way that you can do anything with it. So we should definitely not view quantum computers as replacing classical computers. There are lots of things that classical computers are already good at, and we’re not trying to do those things. But there are many things that classical computers are not good at at all. We should think of a quantum computer as a complementary thing, an accelerator for those types of problems. It will have to work in collaboration with a classical computer that is going to do the classical steps, and the quantum computer will do the quantum steps. So that’s one thing to just keep in mind. When we talk about a quantum computer, it is part of a larger computing, you know, framework where there are many classical elements. It might be CPUs, it might be GPUs, might be custom ASICs for certain things, and then a quantum computer, you know, a quantum processor, as well. So …
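The limit mentioned above (no more than 1,000 bits of classical information on 1,000 qubits) is usually formalized as Holevo’s theorem: the classical information accessible by measuring n qubits is bounded by the Holevo quantity χ, which cannot exceed the base-2 logarithm of the state-space dimension. So although a general 1,000-qubit state is described by 2^1000 amplitudes, at most 1,000 bits can be read back out.

```latex
I_{\mathrm{acc}} \;\le\; \chi \;\le\; \log_2 \dim \mathcal{H} \;=\; \log_2 2^{n} \;=\; n \ \text{bits},
\qquad n = 1000 \;\Rightarrow\; I_{\mathrm{acc}} \le 1000 \ \text{bits}.
```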

HUIZINGA: Is that called a QPU?

NAYAK: A QPU is the quantum processing unit, exactly! So we’ll have CPUs, GPUs, and QPUs. And so, you know, at the lowest layer of that stack is the underlying substrate, the physical substrate. That’s our topoconductor. It’s the material from which we build our QPUs. That’s the quantum processing unit. The quantum processing unit includes all of the qubits that we have in our architecture on a single chip. And that’s, kind of, one of the big key features, key design features: that the qubits be small and manufacturable on a single wafer. And then the QPU also has to enable that quantum world to talk to the classical world …

HUIZINGA: Right.

NAYAK: … because you have to send it, you know, instructions and you have to get back answers. And for us, that is turning on and off measurements because our instructions are a sequence of measurements. And then, we ultimately have to get back a string of zeros and ones. But that initially is these measurements where we’re getting, you know, phase shifts on microwaves, and … which are in turn telling us about small capacitance shifts, which are in turn telling us the parity of electrons in a wire.
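To illustrate how an analog readout becomes the string of zeros and ones described above, here is a toy sketch that averages repeated frequency-shift readings and assigns whichever of two calibrated reference values is nearer. The calibration numbers, sample readings, function names, and the even-to-0 / odd-to-1 convention are all hypothetical; this is not Microsoft’s readout pipeline, and real parity measurements involve microwave phase shifts and much more careful signal processing.

```python
from statistics import mean

# Hypothetical calibration: resonator frequency shift (Hz) observed for each parity state.
EVEN_REF_HZ = 0.0
ODD_REF_HZ = -800e3


def classify_parity(samples_hz: list[float]) -> str:
    """Average repeated shift readings and return the nearer calibrated parity."""
    avg = mean(samples_hz)
    if abs(avg - EVEN_REF_HZ) <= abs(avg - ODD_REF_HZ):
        return "even"
    return "odd"


def to_bit(parity: str) -> int:
    """Map parity to a classical bit using one possible convention: even -> 0, odd -> 1."""
    return 0 if parity == "even" else 1


readings = [-790e3, -812e3, -805e3]  # made-up noisy readings
p = classify_parity(readings)
print(p, to_bit(p))
```

Averaging several readings before deciding is one simple way to suppress noise; the nearest-reference rule is equivalent to a threshold halfway between the two calibrated values.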

HUIZINGA: Right.

NAYAK: So really, this is a quantum machine in which, you know, you have the qubits that are built on the quantum plane. You’ve then got this quantum-classical interface where the classical information is going in and out of the quantum processor. And then there’s a lot of classical processing that has to happen, both to enable error correction and to enable computations. And the whole thing has to be inside of a cryogenic environment. So it’s a very special environment in which we … in which, A, it’s kept cold because that’s what you need in order to have a topoconductor, and that’s also what you need, just in general, for the qubits to be very stable. So when we talk about the full stack, just on the hardware side, there are many layers to this. And then of course, you know, there is the classical firmware that takes instructions and turns them into the physical things that need to happen. And then, of course, we have algorithms and then ultimately applications.

HUIZINGA: Yeah, so I would say, Chetan, that people can probably go do their own little research on how you go from temperatures that are lower than deep space to the room you’re working in. And we don’t have time to unpack that on this show. And also, I was going to ask you what could possibly go wrong if you indeed got everything right. And you mentioned earlier, you know, what happens in an AI world if we get everything right. If you put quantum and AI together, it’s an interesting question, what that world looks like. Can you just take a brief second to tell us how you’re thinking about what could happen to cryptography, to, you know, just all kinds of things that we might be wondering about in a post-quantum world?

NAYAK: Great question. So, you know, first of all, you know, one of the things I want to, kind of, emphasize is, ultimately, a lot of, you know, when we think about the potential for technology, often the limit comes down to physics. There are physics limits. You know, if you think about, like, interstellar travel and things like that, well, the speed of light is kind of a hard cutoff, [LAUGHTER] and actually, you’re not going to be able to go faster than the speed of light, and you have to bake that in. Ultimately, you know, if you think of a datacenter, ultimately, like, there’s a certain amount of energy, and there’s a certain amount of cooling power you have. And you can say, well, this datacenter is 100 megawatts, and then in the future, we’ll have a gigawatt to use. But ultimately, that energy has to come from somewhere, and you’ve got some hard physical constraints. So similarly, you could ask, you know, with quantum computers, what are the hard physical constraints? What are the things that just … because you can’t make a perpetual motion machine; you can’t violate, I think, the laws of quantum mechanics. And I think in the early days, there was this concern that, you know, this idea relies on violating something. You’re doing something that’s not going to work. You know, I’d say the theory of quantum error correction, the theory of fault tolerance, you know, many of the algorithms that have been developed, they really do show that there is no fundamental physical constraint saying that this isn’t going to happen, you know. That, you know, that somehow you would need to have either more power than you can really generate or you would need to go much colder than you can actually get. That, you know, there’s no physical, you know, no-go result. So that’s an important thing to keep in mind. 
Now, the thing is, some people might then be tempted to say, well, OK, now it’s just an engineering problem because we know this in principle can work, and we just have to figure out how to make it work. But the truth is, there isn’t any such, like, hard barrier where you say, well, oh, up until here, it’s fundamental physics, and then beyond this, it’s just an engineering problem. The reality is, you know, new difficulties and challenges arise every step along the way. And one person might call it an engineering or an implementation challenge, and another person may call it a fundamental, you know, barrier or obstruction, and I think people will probably, you know, agree to disagree on, like, where that line goes. I think for us, it was really crucial to look out at the scale at which quantum computers are really going to make an impact. We’re going to need thousands, you know, hundreds to thousands of logical qubits. That is, error-corrected qubits. And when you look at what that means, that means really a million physical qubits. That is a very large scale in a world in which people have mostly learned what we know about these things from 10 to 100 qubits. To project out from that to a million, you know, it would surprise me if the solutions that are optimal for 10 to 100 qubits are the same solutions that are optimal for a million qubits, right.
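The jump from logical to physical qubit counts is a multiplication by an error-correction overhead. The figure of about 1,000 physical qubits per logical qubit used below is an assumption inferred from the round numbers in the conversation (thousands of logical qubits, a million physical ones), not a stated parameter; real overheads depend on physical error rates and on the error-correcting code used. The function name is hypothetical.

```python
def physical_qubits_needed(logical_qubits: int, physical_per_logical: int) -> int:
    """Total physical qubits for a fixed, assumed error-correction overhead per logical qubit."""
    return logical_qubits * physical_per_logical


ASSUMED_OVERHEAD = 1_000  # illustrative assumption, not a figure given in the conversation

for logical in (100, 1_000):
    total = physical_qubits_needed(logical, ASSUMED_OVERHEAD)
    print(f"{logical:,} logical qubits -> {total:,} physical qubits")

# Scale-up relative to the 10-100 qubit experiments mentioned in the conversation.
print(f"Going from 100 to 1,000,000 qubits is a factor of {1_000_000 // 100:,}")
```

The four-orders-of-magnitude gap in the last line is the extrapolation Nayak is wary of: design choices tuned for 100 qubits need not carry over to a million.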

HUIZINGA: Yeah.

NAYAK: And that has been a motivation for us: let’s try to think, based on what we now know, of things that at least have a chance to work at a million qubits. Let’s not do anything that looks like it’s going to clearly hit a dead end before then.

HUIZINGA: Right.

NAYAK: Now, obviously in science, nothing is certain, and you learn new things along the way, but we didn’t want to start out with things that looked like they were not going to, you know, work for a million qubits. That was the reason that we developed this new material, that we created this, engineered this new material, you know, these topoconductors, precisely because we said we need to have a material that can give us something where we can operate it fast and make it small and be able to control these things. So, you know, I think that’s one key thing. And, you know, what we’ve demonstrated now is that we can harness this; that we’ve got a qubit. And that’s why we have a lot of confidence that, you know, these are things that aren’t going to be decades away. That these things are going to be years away. And that was the basis for our interaction with DARPA [Defense Advanced Research Projects Agency]. We’ve just signed a contract with DARPA to go into the next phase of the DARPA US2QC program. And, you know, DARPA, the US government, wants to see a fault-tolerant quantum computer. And … because they do not want any surprises.

HUIZINGA: Right?!? [LAUGHS]

NAYAK: And, you know, there are people out there who said, you know, quantum computers are decades away; don’t worry about it. But I think the US government realizes they might be years, not decades away, and they want to get ahead of that. And so that’s why they’ve entered into this agreement with us and the contract with us.

HUIZINGA: Yeah.

NAYAK: And so that is, you know, the thing I just want to make sure that, you know, listeners to the podcast understand that we are, you know, the reason that we fundamentally re-engineered, re-architected, what we think a quantum computer should look like and what the qubit should be and even … going all the way down to the underlying materials was … which is high risk, right? I mean, there was no guarantee … there’s no guarantee that any of this is going to work, A. And, B, there was no guarantee we would even be able to do the things we’ve done so far. I mean, you know, that’s the nature of it. If you’re going to try to do something really different, you’re going to have to take risks. And we did take risks by really starting at, you know, the ground floor and trying to redesign and re-engineer these things. So that was a necessary part of this journey and the story, was for us to re-engineer these things in a high-risk way. What that leads to is, you know, potentially changing that timeline. And so in that context, it’s really important to make this transition to post-quantum crypto because, you know, the [public-key] cryptography systems in use up until now are things that are not safe from quantum attacks if you have a utility-scale quantum computer. We do know that there are crypto systems which, at least as far as we know, appear to be safe from quantum attacks. That’s what’s called post-quantum cryptography. You know, they rely on different types of hard math problems, which quantum computers probably aren’t good at. And so, you know, and changing over to a new crypto standard isn’t something that happens at the flip of a switch.

HUIZINGA: No.

NAYAK: It’s something that takes time. You know, first, you know, the early part of that was based around the National Institute of Standards and Technology aligning around one or a few standard systems that people would implement, which they certified would be quantum safe and, you know, those processes have occurred. And so now is the time to switch over. Given that we know that we can do this and that it won’t happen overnight, now’s the time to make that switch.

HUIZINGA: And we’ve had several cryptographers on the show who’ve been working on this for years. It’s not like they’re just starting. They saw this coming even before you had some solidity in your work. But listen, I would love to talk to you for hours, but we’re coming to a close here. And as we close, I want to refer to a conversation you had with distinguished university professor Sankar Das Sarma. He suggested that with the emergence of Majorana zero modes, you had reached the end of the beginning and that you were now sort of embarking on the beginning of the end in this work. Well, maybe that’s a sort of romanticized vision of what it is. But could you give us a little bit of a hint on what are the next milestones on your road to a scalable, reliable quantum computer, and what’s on your research roadmap to reach them?

NAYAK: Yeah, so interestingly, we actually just also posted on the arXiv a paper that shows some aspects of our roadmap, kind of the more scientific aspects of our roadmap. And that roadmap is, kind of, continuously going from the scientific discovery phase through the engineering phase, OK. Again, as I said, it’s a matter of debate and even taste of what exactly you want to call scientific discovery versus engineering, but—which will be hotly debated, I’m sure—but it is definitely a continuum that’s going more towards … from one towards the other. And I would say, you know, at a high level, logical qubits, you know, error-corrected, reliable qubits, are, you know, the basis of quantum computation at scale, and developing, demonstrating, and building those logical qubits, and logical qubits at scale, is kind of a big thing that—for us and for the whole industry—is, I would say, sort of, the next level of quantum computing. Jason Zander wrote this blog where he talked about level one, level two, level three, where level one was this NISQ—noisy intermediate-scale quantum—era; level two is foundations of, you know, reliable and logical qubits; and level three is the, you know, at-scale logical qubits. I think we’re heading towards level two, and so in my mind, that’s sort of, you know, the next North Star is really around that. I think there will be a lot of very interesting and important things that are more technical and maybe are not as accessible to a big audience. But I’d say that’s, kind of, the … I would say, if you’re, you know, a thing to keep in mind as a big exciting thing happening in the field.

HUIZINGA: Yeah. Well, Chetan Nayak, what a ride this show has been. I’m going to be watching this space—and the timelines thereof because they keep getting adjusted!

[MUSIC]

Thank you for taking time to share your important work with us today.

NAYAK: Thank you very much, my pleasure!

[MUSIC FADES]

The post Ideas: Quantum computing redefined with Chetan Nayak appeared first on Microsoft Research.

]]>
Ideas: Building AI for population-scale systems with Akshay Nambi http://approjects.co.za/?big=en-us/research/podcast/ideas-building-ai-for-population-scale-systems-with-akshay-nambi/ Tue, 11 Feb 2025 04:26:10 +0000 http://approjects.co.za/?big=en-us/research/?p=1127448 Advances in AI are driving meaningful real-world impact. Principal Researcher Akshay Nambi shares how his passion for tackling real-world challenges across various domains fuels his work in building reliable and robust AI systems.

The post Ideas: Building AI for population-scale systems with Akshay Nambi appeared first on Microsoft Research.

]]>

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, guest host Chris Stetkiewicz talks with Microsoft Principal Researcher Akshay Nambi about his focus on developing AI-driven technology that addresses real-world challenges at scale. Drawing on firsthand experiences, Nambi combines his expertise in electronics and computer science to create systems that enhance road safety, agriculture, and energy infrastructure. He’s currently working on AI-powered tools to improve education, including a digital assistant that can help teachers work more efficiently and create effective lesson plans and solutions to help improve the accuracy of models underpinning AI tutors.

Learn more:

Teachers in India help Microsoft Research design AI tool for creating great classroom content
Microsoft Research Blog, October 2023

HAMS: Harnessing AutoMobiles for Safety
Project homepage

Microsoft Research AI project automates driver’s license tests in India
Microsoft Source Asia Blog

InSight: Monitoring the State of the Driver in Low-Light Using Smartphones
Publication, September 2020

Chanakya: Learning Runtime Decisions for Adaptive Real-Time Perception
Publication, December 2023

ALT: Towards Automating Driver License Testing using Smartphones
Publication, November 2019

Dependable IoT
Project homepage

Vasudha
Project homepage

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

AKSHAY NAMBI: For me, research is not just about pushing the boundaries of knowledge. It’s about ensuring that these advancements translate to meaningful impact on the ground. So, yes, the big goal that guides most of my work is twofold. One, how do we build technology that scales to benefit large populations? And two, at the same time, I’m motivated by the challenge of tackling complex problems. That provides an opportunity to explore, learn, and also create something new, and that’s what keeps me excited.

[TEASER ENDS]

CHRIS STETKIEWICZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I’m your guest host, Chris Stetkiewicz. Today, I’m talking to Akshay Nambi. Akshay is a principal researcher at Microsoft Research. His work lies at the intersection of systems, AI, and machine learning with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems. Akshay’s research extends across education, agriculture, transportation, and energy. He is currently working on enhancing the quality and reliability of AI systems by addressing critical challenges such as reasoning, grounding, and managing complex queries.

Akshay, welcome to the podcast.

AKSHAY NAMBI: Thanks for having me.

STETKIEWICZ: I’d like to begin by asking you to tell us your origin story. How did you get started on your path? Was there a big idea or experience that captured your imagination or motivated you to do what you’re doing today?

NAMBI: If I look back, my journey into research wasn’t a straight line. It was more about discovering my passion through some unexpected opportunities and also finding purpose along the way. So before I started with my undergrad studies, I was very interested in electronics and systems. My passion for electronics, kind of, started when I was in school. I was more like an average student, not a nerd or not too curious, but I was always tinkering around, doing things, building stuff, and playing with gadgets and that, kind of, made me very keen on electronics and putting things together, and that was my passion. But sometimes things don’t go as planned. So I didn’t get into the college which I had hoped to join for electronics, so I ended up pursuing computer science, which wasn’t too bad either. So during my final year of bachelor’s, I had to do a final semester project, which turned out to be a very pivotal moment. And that’s when I got to know this institute called Indian Institute of Science (IISc), which is a top research institute in India and also globally. And I had a chance to work on a project there. And it was my first real exposure to open-ended research, right, so I remember … where we were trying to build a solution that helped to efficiently construct an ontology for a specific domain, which simply means that we were building systems to help users uncover relationships in the data and allow them to query it more efficiently, right. And it was super exciting for me to design and build something new. And that experience made me realize that I wanted to pursue research further. And right after that project, I decided to explore research opportunities, which led me to join Indian Institute of Science again as a research assistant.

STETKIEWICZ: So what made you want to take the skills you were developing and apply them to a research career?

NAMBI: So interestingly, when I joined IISc, the professor I worked with specialized in electronics, so things came full circle to something I had always been passionate about. And I was the only computer science graduate in the lab at that time, with the others being electronics engineers, and I didn’t even know how to solder. But the lab environment was super encouraging, collaborative, so I, kind of, caught up very quickly. In that lab, basically, I worked on several projects in the emerging fields of embedded devices and energy harvesting systems. Specifically, we were designing systems that could harvest energy from sources like the sun, hydro, and even RF (radio frequency) signals. And my role was kind of twofold. One, I designed circuits and systems to make energy harvesting more efficient so that you can store this energy. And then I also wrote programs, software, to ensure that the harvested energy can be used efficiently. For instance, as we harvest some of this energy, you want to have your programs run very quickly so that you are able to sense the data and send it to the server in an efficient way. And one of the most exciting projects I worked on during that time was in data-driven agriculture. So this was back in 2008, 2009, right, where we developed an embedded system device with sensors to monitor agricultural fields, collecting data like soil moisture and soil temperature. And that was sent to agronomists, who were able to analyze this data and provide feedback to farmers. In many remote areas, access to power is still a huge challenge. So we used many of the technologies we were developing in the lab, specifically energy harvesting techniques, to power these sensors and devices on the rural farms, and that’s when I really got to see firsthand how technology could improve people’s lives, particularly in rural settings. And what, kind of, stood out in my experience at IISc, right, was the end-to-end nature of the work.
And it was not just writing code or designing circuits. It was about identifying real-world problems, solving them efficiently, and deploying solutions in the field. And this cemented my passion for creating technology that solves real-world problems, and that’s what keeps me going even today.

STETKIEWICZ: And as you’re thinking about those problems that you want to try and solve, where did you look for, for inspiration? It sounds like some of these are happening right there in your home.

NAMBI: That’s right. Growing up and living in India, I’ve been surrounded by these, kind of, many challenges. And these are not distant problems. These are right in front of us. And some of them are quite literally outside the door. So being here in India provides a unique opportunity to tackle some of the pressing real-world challenges in agriculture, education, or in road safety, where even small advancements can create significant impact.

STETKIEWICZ: So how would you describe your research philosophy? Do you have some big goals that guide you?

NAMBI: Right, as I mentioned, right, my research philosophy is mainly rooted in solving real-world problems through end-to-end innovation. For me, research is not just about pushing the boundaries of knowledge. It’s about ensuring that these advancements translate to meaningful impact on the ground, right. So, yes, the big goals that guide most of my work are twofold. One, how do we build technology that scales to benefit large populations? And two, at the same time, I’m motivated by the challenge of tackling complex problems. That provides an opportunity to explore, learn, and also create something new. And that’s what keeps me excited.

STETKIEWICZ: So let’s talk a little bit about your journey at Microsoft Research. I know you began as an intern, and some of the initial work you did was focused on computer vision, road safety, energy efficiency. Tell us about some of those projects.

NAMBI: As I was nearing the completion of my PhD, I was eager to look for opportunities in industrial labs, and Microsoft Research obviously stood out as an exciting opportunity. And additionally, the fact that Microsoft Research India was in my hometown, Bangalore, made it even more appealing. So when I joined as an intern, I worked together with Venkat Padmanabhan, who now leads the lab, and we started this project called HAMS, which stands for Harnessing Automobiles for Safety. As you know, road safety is a major public health issue globally, responsible for almost 1.35 million fatalities annually, with the situation being even more severe in countries like India. For instance, there are estimates that a life is lost on the road every four minutes in India. When analyzing the factors which affect road safety, we saw mainly three elements. One, the vehicle. Two, the infrastructure. And three, the driver. Among these, the driver plays the most critical role in many incidents, whether it’s over-speeding, driving without seat belts, drowsiness, fatigue, any of these, right. And this realization motivated us to focus on driver monitoring, which led to the development of HAMS. In a nutshell, HAMS is basically a smartphone-based system where you mount your smartphone on the windshield of a vehicle to monitor both the driver and the driving in real time, with the goal of improving road safety. Basically, it observes key aspects such as where the driver is looking and whether they are distracted or fatigued[1], while also considering the external driving environment, because we truly believe that to improve road safety, we need to understand not just the driver’s actions but also the context in which they are driving. For example, if the smartphone’s accelerometer detects sharp braking, the system would automatically check the distance to the vehicle in front using the rear camera and whether the driver was distracted or fatigued using the front camera.
And this holistic approach ensures a more accurate and comprehensive assessment of driving behavior, enabling more meaningful feedback.
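As an illustration of the event-triggered check described above, here is a minimal Python sketch. It is not HAMS code: the thresholds, the `Snapshot` fields, and the `assess` function are hypothetical, and the sketch assumes the accelerometer, the rear (road-facing) camera, and the front (driver-facing) camera have already been reduced to the numbers and flags shown.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the transcript gives no values.
HARD_BRAKE_MPS2 = 4.0   # deceleration (m/s^2) treated as "sharp braking"
MIN_GAP_SECONDS = 2.0   # time gap to the vehicle ahead treated as safe


@dataclass
class Snapshot:
    """One moment of (already processed) phone sensor output."""
    decel_mps2: float         # deceleration from the accelerometer
    speed_mps: float          # vehicle speed
    gap_m: float              # distance to the vehicle in front (rear camera)
    driver_distracted: bool   # from the driver-facing front camera
    driver_fatigued: bool     # from the driver-facing front camera


def assess(s: Snapshot) -> list[str]:
    """Return findings for one moment; camera-derived checks run only on sharp braking."""
    if s.decel_mps2 < HARD_BRAKE_MPS2:
        return []
    findings = ["sharp braking"]
    # Time gap = distance / speed; skip the check if the vehicle is stationary.
    if s.speed_mps > 0 and s.gap_m / s.speed_mps < MIN_GAP_SECONDS:
        findings.append("following too closely")
    if s.driver_distracted:
        findings.append("driver distracted")
    if s.driver_fatigued:
        findings.append("driver fatigued")
    return findings


# Braking hard at 15 m/s with a 20 m gap (about 1.3 s) while looking away:
print(assess(Snapshot(5.0, 15.0, 20.0, True, False)))
# ['sharp braking', 'following too closely', 'driver distracted']
```

One plausible reading of the design, under these assumptions, is that the accelerometer acts as a trigger, so the camera-based context is consulted around specific events such as sharp braking rather than judged in isolation.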

STETKIEWICZ: So that sounds like a system that’s got several moving parts to it. And I imagine you had some technical challenges you had to deal with there. Can you talk about that?

NAMBI: One of our guiding principles in HAMS was to use commodity, off-the-shelf smartphone devices, right. These should be affordable, in the range of $100 to $200, so that you can just take regular smartphones and enable this driver and driving monitoring. And that meant handling several technical challenges. For instance, we had to develop efficient computer vision algorithms that could run locally on the device with cheap smartphone processing units while still performing very well in low-light conditions. We wrote multiple papers and developed many novel algorithms, which we implemented on very low-cost smartphones. And once we had such a monitoring system, right, you can imagine there are several deployment opportunities, ranging from fleet monitoring to even training new drivers, right. However, one application we hadn’t originally envisioned, but which turned out to be its most impactful use case even today, is automated driver’s license testing. As you know, before you get a license, a driver is supposed to pass a test, but what happens in many places, including India, is that licenses are issued with minimal or no actual testing, leading to unsafe and untrained drivers on the road. At the same time as we were working on HAMS, the Indian government was looking at introducing technology to make testing more transparent and also automated. So we worked with the right set of partners, and we demonstrated to the government that HAMS could actually completely automate the entire license testing process. So we first deployed this system at the Dehradun RTO (Regional Transport Office), which is the equivalent of a DMV in the US, in 2019, working very closely with RTO officials to define what some of the evaluation criteria should be, right. Some of these would be very simple, like, oh, is the candidate taking the test the same person who actually registered for it, right? And are they wearing seat belts?
Did they scan their mirrors before taking a left turn, and how well did they perform in tasks like reverse parking and things like that?

STETKIEWICZ: So what’s been the government response to that? Have they embraced it or deployed it in a wider extent?

NAMBI: Yes, yes. So after the deployment in Dehradun in 2019, we actually open sourced the entire HAMS technology, and our partners are now working with several state governments and have scaled HAMS to several states in India. And as of today, we have around 28 RTOs where HAMS is actually being deployed, and the pass rate of such license tests is just 60%, as compared to 90-plus percent with manual testing. That’s the extensive rigor the system brings in. And what excites me now is that, nearly five years later, we are taking the next step in this project, where we are evaluating the long-term impact of this intervention on driving behavior and road safety. So we are collaborating with Professor Michael Kremer, who is a Nobel laureate and professor at the University of Chicago, and his team to study how this technology has influenced driving patterns and accident rates over time. So this focus on closing the loop and moving beyond just deployment in the field to actually measuring the real impact, right, is something that truly excites me and that makes research at Microsoft very unique. And that is actually one of the reasons why I joined Microsoft Research full-time after my internship: this flexibility to work on real-world problems, develop novel research ideas, and actually collaborate with partners both internally and externally to deploy at scale is something that is very unique here.

STETKIEWICZ: So have you actually received any evidence that the project is working? Is driving getting safer?

NAMBI: Yes, this is very early analysis, and we are getting very positive insights from it. Soon we will be releasing a white paper on our study of this long-term impact.

STETKIEWICZ: That’s great. I look forward to that one. So you’ve also done some interesting work involving the Internet of Things, with an emphasis on making it more reliable and practical. So for those in our audience who may not know, the Internet of Things, or IoT, is a network that includes billions of devices and sensors in things like smart thermostats and fitness trackers. So talk a little bit about your work in this area.

NAMBI: Right, so IoT, as you know, is already transforming several industries, with billions of sensors being deployed in areas like industrial monitoring, manufacturing, agriculture, smart buildings, and also air pollution monitoring. And if you think about it, these sensors provide critical data that businesses rely on for decision making. However, a fundamental challenge is ensuring that the data collected from these sensors is actually reliable. If the data is faulty, it can lead to poor decisions and inefficiencies. And the challenge is that these sensor failures are not always obvious. What I mean by that is when a sensor stops working, it doesn’t always stop sending data, but it often continues to send some data which appears to be normal. And that’s one of the biggest problems, right. So detecting these errors is non-trivial because faulty sensors can mimic real-world working data, and traditional solutions like deploying redundant sensors or even manually inspecting them are very expensive, labor intensive, and also sometimes infeasible, especially for remote deployments. Our goal in this work was to develop a simple and efficient way to remotely monitor the health of IoT sensors. So what we did was we hypothesized that most sensor failures occur due to electronic malfunctions. These could be due to short circuits or component degradation or due to environmental factors such as heat, humidity, or pollution. Since these failures originate within the sensor hardware itself, we saw an opportunity to leverage some of the basic electronic principles to create a novel solution. The core idea was to develop a way to automatically generate a fingerprint for each sensor. And by fingerprint, I mean the unique electrical characteristic exhibited by a properly working sensor.
We built a system that could devise these fingerprints for different types of sensors, allowing us to detect failures purely based on the sensor’s internal characteristics, that is, the fingerprint, even without looking at the data it produces. Essentially, what it means is that we were able to tag each piece of sensor data with a reliability score, ensuring verifiability.
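The transcript does not say how the fingerprints are computed or compared, so the following Python sketch rests on stated assumptions: a fingerprint is modeled as a short, equally sampled electrical response curve recorded while the sensor is known to be healthy, and the reliability score is a hypothetical linear mapping from the root-mean-square distance between that reference and a fresh measurement. All names and values are illustrative. As in the description above, the score never looks at the value the sensor reports.

```python
import math


def rms_distance(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equally sampled curves."""
    if len(a) != len(b) or not a:
        raise ValueError("curves must be non-empty and the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def reliability_score(reference: list[float], measured: list[float],
                      tolerance: float) -> float:
    """Map distance from the healthy fingerprint to a score in [0, 1].

    1.0 means the measured curve matches the reference exactly; the score
    falls linearly and bottoms out at 0.0 once the RMS distance reaches
    `tolerance` (a hypothetical, per-sensor-type parameter).
    """
    d = rms_distance(reference, measured)
    return max(0.0, 1.0 - d / tolerance)


def tag_reading(value: float, reference: list[float], measured: list[float],
                tolerance: float = 0.1) -> dict:
    """Attach a reliability score to a reading without inspecting the value."""
    return {"value": value,
            "reliability": reliability_score(reference, measured, tolerance)}


# Hypothetical fingerprint: voltages sampled from a healthy sensor's response.
healthy = [3.3, 2.1, 1.2, 0.6, 0.2]
stuck = [3.3, 3.3, 3.3, 3.3, 3.3]          # a hypothetical stuck-high fault

print(tag_reading(21.5, healthy, healthy))  # reliability 1.0
print(tag_reading(21.5, healthy, stuck))    # reliability 0.0
```

The same reading (21.5) gets opposite scores depending only on the electrical measurement. A real system would still have to decide what response to measure and how to set `tolerance` per sensor type; the transcript leaves both open.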

STETKIEWICZ: So how does that technology get deployed in the real world? Is there an application where it’s being put to work today?

NAMBI: Yes, we worked together with Azure IoT and open-sourced this technology, and there were several opportunities: several companies took the solution into their systems, in areas including air pollution monitoring, smart buildings, and industrial monitoring. The one I would like to talk about today is air pollution monitoring. As you know, air pollution is a major challenge in many parts of the world, especially in India. And traditionally, air quality monitoring relies on these expensive fixed sensors, which provide limited coverage. On the other hand, there is a rich body of work on low-cost sensors, which can offer wider deployment. Like, you can put these sensors on a bus or a vehicle and have it move around the entire city, where you can get a much more fine-grained, accurate picture on the ground. But because these are low-cost sensors, they often have reliability issues. So we collaborated with several startups that were developing these low-cost air pollution sensors and were finding it very challenging to gain trust, because one of the main concerns was the accuracy of the data from low-cost sensors. So our solution integrated seamlessly with these sensors, which enabled verification of the quality of the data coming out of these low-cost air pollution sensors. So this bridged the trust gap, allowing government agencies to initiate large-scale pilots using low-cost sensors for fine-grained air-quality monitoring.

STETKIEWICZ: So as we’re talking about evolving technology, large language models, or LLMs, are also enabling big changes, and they’re not theoretical. They’re happening today. And you’ve been working on LLMs and their applicability to real-world problems. Can you talk about your work there and some of the latest releases?

NAMBI: So when ChatGPT was first released, I, like many people, was very skeptical. However, I was also curious both about how it worked and, more importantly, whether it could accelerate solutions to real-world problems. That led to the exploration of LLMs in education, where we fundamentally asked this question: can AI help improve educational outcomes? And this was one of the key questions which led to the development of Shiksha copilot, which is a genAI-powered assistant designed to support teachers in their daily work, from helping them create personalized learning experiences to designing assignments, generating hands-on activities, and even more. Teachers today universally face several challenges, from time management to lesson planning. And our goal with Shiksha was to empower them to significantly reduce the time spent on these tasks. For instance, lesson planning, which traditionally took about 60 minutes, can now be completed in just five minutes using the Shiksha copilot. And what makes Shiksha unique is that it’s completely grounded in the local curriculum and the learning objectives, ensuring that the AI-generated content aligns very well with pedagogical best practices. The system actually supports multilingual interactions, multimodal capabilities, and also integration with external knowledge bases, making it highly adaptable to different curriculums. Initially, many teachers were skeptical. Some feared this would limit their creativity. However, as they started using Shiksha, they realized that it didn’t replace their expertise, but rather amplified it, enabling them to do their work faster and more efficiently.

STETKIEWICZ: So, Akshay, the last time you and I talked about Shiksha copilot, it was very much in the pilot phase and the teachers were just getting their hands on it. So it sounds like, though, you’ve gotten some pretty good feedback from them since then.

NAMBI: Yes, so when we last spoke, we were doing this six-month pilot with 50-plus teachers, where we gathered overwhelmingly positive feedback on how the technology was helping teachers reduce the time spent on lesson planning. And in fact, they were using the system a lot; they really enjoyed working with Shiksha copilot, where they were able to do more things in much less time, right. And based on a lot of feedback from teachers, we have improved Shiksha copilot over the past few months. And starting this academic year, we have already deployed Shiksha to 1,000-plus teachers in Karnataka. This is in close collaboration with our partners in … with the Sikshana Foundation and also with the government of Karnataka. And the response has already been incredibly encouraging. And looking ahead, we are actually focusing, again, on closing this loop, right, and measuring the impact on the ground, where we are doing a lot of studies with the teachers to understand not just improvements in the teachers’ efficiency but also how AI-generated content enriched by teachers is actually enhancing student learning. So that’s the study we are conducting, which hopefully will close this loop and answer our original question: can AI actually help improve educational outcomes?

STETKIEWICZ: And is the deployment primarily in rural areas, or does it include urban centers, or what’s the target?

NAMBI: So the current deployment with 1,000 teachers is a combination of both rural and urban public schools. These are covering both English medium and Kannada medium teaching schools with grades from Class 5 to Class 10.

STETKIEWICZ: Great. So Shiksha was focused on helping teachers and making their jobs easier, but I understand you’re also working on some opportunities to use AI to help students succeed. Can you talk about that?

NAMBI: So as you know, LLMs are still evolving and inherently they are fragile, and deploying them in real-world settings, especially in education, presents a lot of challenges. With Shiksha, if you think about it, teachers remain in control throughout the interaction, making the final decision on whether to use the AI-generated content in the classroom or not. However, when it comes to AI tutors for students, the stakes are slightly higher, since we need to ensure the AI doesn’t produce incorrect answers, misrepresent concepts, or give misleading explanations. Currently, we are developing solutions to enhance the accuracy and also the reasoning capabilities of these foundation models, particularly for solving math problems. This represents a major step towards building AI systems that act as much more holistic personal tutors, which support students’ understanding and create more engaging, effective learning experiences.

STETKIEWICZ: So you’ve talked about working in computer vision and IoT and LLMs. What do those areas have in common? Is there some thread that weaves through the work that you’re doing?

NAMBI: That’s a great question. As a systems researcher, I’m quite interested in this end-to-end systems development, which means that my focus is not just on improving a particular algorithm but also on thinking about the end-to-end system, which means that I, kind of, think about computer vision, IoT, and even LLMs as tools, where we would want to improve them for a particular application. It could be agriculture, education, or road safety. And then how do you think about this holistically to come up with the most efficient system that can be deployed at population scale, right. I think that’s the connecting story here: how do you have this systemic thinking which kind of takes the existing tools, improves them, makes them more efficient, and takes them out from the lab to the real world.

STETKIEWICZ: So you’re working on some very powerful technology that is creating tangible benefits for society, which is your goal. At the same time, we’re still in the very early stages of the development of AI and machine learning. Have you ever thought about unintended consequences? Are there some things that could go wrong, even if we get the technology right? And does that kind of thinking ever influence the development process?

NAMBI: Absolutely. Unintended consequences are something I think about deeply. Even the most well-designed technology can have these ripple effects that we may not fully anticipate, especially when we are deploying it at population scale. For me, being proactive is one of the key aspects. This means not only designing the technology in the lab but actually also carefully deploying it in the real world, measuring its impact, and working with the stakeholders to minimize harm. In most of my work, I try to work very closely with the partner team on the ground to monitor and analyze how the technology is being used, what some of the risks are, and how we can eliminate them. At the same time, I also remain very optimistic. It’s also about responsibility. If we are able to embed societal values and ethics into the design of the system and involve diverse perspectives, especially from people on the ground, we can remain vigilant as the technology evolves and we can create systems that can truly deliver immense societal benefits while addressing many of the potential risks.

STETKIEWICZ: So we’ve heard a lot of great examples today about building technology to solve real-world problems and your motivation to keep doing that. So as you look ahead, where do you see your research going next? How will people be better off because of the technology you develop and the advances that they support?

NAMBI: Yeah, I’m deeply interested in advancing AI systems that can truly assist anyone in their daily tasks, whether it’s providing personalized guidance to a farmer in a rural village, helping a student get instant 24 by 7 support for their learning doubts, or even empowering professionals to work more efficiently. And to achieve this, my research is focusing on tackling some of the fundamental challenges in AI with respect to reasoning and reliability and also making sure that AI is more context aware and responsive to evolving user needs. And looking ahead, I envision AI as not just an assistant but also as an intelligent and equitable copilot seamlessly integrated into our everyday life, empowering individuals across various domains.

STETKIEWICZ: Great. Well, Akshay, thank you for joining us on Ideas. It’s been a pleasure.

[MUSIC]

NAMBI: Yeah, I really enjoyed talking to you, Chris. Thank you.

STETKIEWICZ: Till next time.

[MUSIC FADES]


[1] To ensure data privacy, all processing is done locally on the smartphone. This approach ensures that driving behavior insights remain private and secure with no personal data stored or shared.

The post Ideas: Building AI for population-scale systems with Akshay Nambi appeared first on Microsoft Research.

Ideas: Bug hunting with Shan Lu http://approjects.co.za/?big=en-us/research/podcast/ideas-bug-hunting-with-shan-lu/ Thu, 23 Jan 2025 17:07:54 +0000 http://approjects.co.za/?big=en-us/research/?p=1122786 Struggles with programming languages helped research manager Shan Lu find her calling as a bug hunter. She discusses one bug that really haunted her, the thousands she’s identified since, and how she’s turning to LLMs to help make software more reliable.

The post Ideas: Bug hunting with Shan Lu appeared first on Microsoft Research.

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Shan Lu, a senior principal research manager at Microsoft. As a college student studying computer science, Lu saw classmates seemingly learn and navigate one new programming language after another with ease while she struggled. She felt like she just wasn’t meant to be a programmer. But this perceived lack of skill turned out to be, as an early mentor pointed out when she began grad school, what made Lu an ideal bug hunter. It’s a path she’s pursued since. After studying bugs in concurrent systems for more than 15 years—she and her coauthors built a tool that identified over a thousand in a 2019 award-winning paper—Lu is focusing on other types of code defects. Recently, Lu and collaborators combined traditional program analysis and large language models in the search for retry bugs, and she’s now exploring the potential role of LLMs in verifying the correctness of large software systems.

Learn more:

If at First You Don’t Succeed, Try, Try, Again…? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems
Publication, November 2024

Abstracts: November 4, 2024
Microsoft Research Podcast, November 2024

Automated Proof Generation for Rust Code via Self-Evolution 
Publication, October 2024

AutoVerus: Automated Proof Generation for Rust Code
Publication, September 2024

Efficient and Scalable Thread-Safety Violation Detection – Finding thousands of concurrency bugs during testing
Publication, October 2019

Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics
Publication, March 2008 

Verus: A Practical Foundation for Systems Verification
Publication, November 2024

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

SHAN LU: I remember, you know, those older days myself, right. That is really, like, I had this struggle where I felt like I could do better. I felt like I had ideas to contribute. But just for whatever reason, right, it took me forever to learn something which I feel is a very mechanical thing, but it just took me forever to learn, right. And then now actually, I see this hope, right, with AI. You know, a lot of mechanical things can actually now be done in a much more automated way, you know, by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever, you know, they are good at, their creativity, it’ll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

Today I’m talking to Shan Lu, a senior principal research manager at Microsoft Research and a computer science professor at the University of Chicago. Part of the Systems Research Group, Shan and her colleagues are working to make our computer systems, and I quote, “secure, scalable, fault tolerant, manageable, fast, and efficient.” That’s no small order, so I’m excited to explore the big ideas behind Shan’s influential research and find out more about her reputation as a bug bounty hunter. Shan Lu, welcome to Ideas!

SHAN LU: Thank you.

HUIZINGA: So I like to start these episodes with what I’ve been calling the “research origin story,” and you have a unique, almost counterintuitive, story about what got you started in the field of systems research. Would you share that story with our listeners?

LU: Sure, sure. Yeah. I grew up fascinated with the idea that I would become a mathematician. I think I was good at math, and at some point, actually, until, I think, I entered college, I was still, you know, thinking about, should I do math? Should I do computer science? For whatever reason, I think someone told me, you know, doing computer science will help you; it’s easier to get a job. And I reluctantly picked a computer science major. And then for a few years in college, I had a really difficult time with programming. And I also remember that there was, like, I spent a lot of time learning one language (we started with Pascal), and I felt like I finally knew what to do, and then there was yet another language, C, and another class, Java. And I remember, like, the teacher would ask us to do a programming project, and there were times I didn’t even, I just didn’t know how to get started. And I remember, at that time, in my class, I think there were … we only had like four girls taking this class that required programming in Java, and none of us had learned Java before. And when we asked our classmates, when we asked the boys, they just naturally knew what to do. It was really, really humiliating. Embarrassing. I had the feeling, I felt like I just wasn’t born to be a programmer. And then, I came to graduate school. I was thinking about, you know, what research direction I should pursue. And I was thinking that, oh, maybe I should do theory research, like, you know, complexity theory or something. You know, after a lot of back and forth, I met my eventual adviser. She was a great, great mentor to me, and she told me that, hey, Shan, you know, my group is doing research about finding bugs in software. And she said her group was doing systems research, and she said a lot of the current team members were all great programmers, and as a result, they were not really motivated [LAUGHS] by finding bugs in software!

HUIZINGA: Interesting.

LU: And then she said, you are really motivated, right, by, you know, helping developers find bugs in their software, so maybe that’s the research project for you. So that’s how I got started.

HUIZINGA: Well, let’s go a little bit further on this mentor and mentors in general. As Dr. Seuss might say, every “what” has a “who.” So by that I mean an inspirational person or people behind every successful researcher’s career. And most often, they’re kind of big names and meaningful relationships, but you have another unique story on who has influenced you in your career, so why don’t you tell us about the spectrum of people who’ve been influential in your life and your career?

LU: Mm-hmm. Yeah, I mean, I think I mentioned my adviser, and she’s just so supportive. And I remember, when I started doing research, I just felt like I seemed to be so far behind everyone else. You know, I felt like, how come everybody else knows how to ask, you know, insightful questions? And they, like, they know how to program really fast, bug free. And my adviser really encouraged me, saying, you know, there is background knowledge that you can pick up; you just need to be patient. But then there is also, like, knowing how to do research, knowing how to think about things, problem solving. And she encouraged me, saying, Shan, you’re good at that!

HUIZINGA: Interesting!

LU: Well, I don’t know how she found out, and anyway, so she was super, super helpful.

HUIZINGA: OK, so go a little further on this because I know you have others that have influenced you, as well.

LU: Yes. Yes, yes. And I think those, to be honest, I’m a very emotional, sensitive person. I would just, you know, move the timeline to be, kind of, more recent. So I joined Microsoft Research as a manager, and there’s something called Connect where, you know, people write twice every year about what it is they’ve been doing. So I was just checking, you know, the members of my team to see what they had been doing over the years, just to get myself familiar with them. And I remember I read several of them. I felt like I almost had tears in my eyes! Like, I realized, wow, like … And just to give an example, for Chris, Chris Hawblitzel, I read his Connect, and I saw that he’s working on something called program verification. It’s a very, very difficult problem, and [as an] outsider, you know, I’ve read many of his papers, but when I read, you know, his own writing, I realized, wow, you know, it’s been almost two decades, right. Like, he just keeps doing these very difficult things. And I read his words about, you know, how his old approach has problems, how he’s thinking about how to address that problem. Oh, I have an idea, right. And then he spends multiple years implementing that idea and getting improvement; finds a new problem and then just finds new solutions. And I really feel like, wow, I’m really, really, like, I feel like this is, kind of, like a, you know, there’s, how to say, a hero-ish story behind this, you know, this kind of goal, and you’re willing to spend many years to keep tackling this challenging problem. And I just feel like, wow, I’m so honored, you know, to be part of a group of fighters, you know, determined to tackle difficult research problems.

HUIZINGA: Yeah. And I think when you talk about it, it’s like this is a person that was working for you, a direct report. [LAUGHTER] And often, we think about our heroes as being the ones who mentored us, who taught us, who managed us, but yours is kind of 360! It’s like …

LU: True!

HUIZINGA: … your heroes [are] above, beside and below.

LU: Right. And I would just say that I have many other, you know, direct reports in my group, for example, a couple of my colleagues, my direct reports, Dan Ports and Jacob Nelson. And again, their story really inspired me. Like, they had, again, spent five or six years on something, and it looked like, oh, it’s close to a successful tech transfer, and then something out of their control happened. It happened because Intel decided to stop manufacturing a chip that their research relied on. And it was, kind of, like the end of the world to them, …

HUIZINGA: Yeah.

LU: … and then they did not give up. And then, you know, like, one year later, they found a solution, you know, together with their product team collaborators.

HUIZINGA: Wow.

LU: And I still feel like, wow, you know, I feel so … I feel like I’m inspired every day! Like, I’m so happy to be working together with, you know, all these great people, great researchers in my team.

HUIZINGA: Yeah. Wow. So much of your work centers on this idea of concurrent systems, and I want you to talk about some specific examples of this work next, but I think it warrants a little explication upfront for those people in the audience who don’t spend all their time working on concurrent systems themselves. So give us a short “101” on concurrent systems and explain why the work you do matters to both the people who make it and the people who use it.

LU: Sure. Yeah. So I think a lot of people may not realize … actually, almost all the software we use every day is concurrent. Concurrent means that you have multiple threads of execution going on at the same time, in parallel. And then, when we go to a web browser, right, it’s not just one rendering that is going on. There are actually multiple concurrent renderings going on. So for software developers developing this type of concurrent system, a challenge is the timing. Because you have multiple concurrent things going on, it’s very difficult to manage and reason about, you know, what may happen first, what may happen second. And also, there’s an inherent non-determinism in it. What happened first this time may happen second next time. As a result, a lot of bugs are introduced by this. And it was a very challenging problem because, I would say, about 20 years ago, there was a shift. Like, in the older days, most of our software was written in a sequential way instead of a concurrent way. So, you know, a lot of developers also had a difficult time shifting their mindset from the sequential way of reasoning to this concurrent way of reasoning.
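To make the timing problem concrete, here is a minimal, hypothetical Python sketch (not taken from Lu’s research). It assumes two threads that each increment a shared counter in two separate steps, read and then write back the value plus one, and it enumerates every order in which those steps can interleave. Running real threads would make the result depend on the scheduler; enumerating the schedules shows the non-determinism deterministically. All names here are illustrative.

```python
from itertools import combinations

# Each thread increments a shared counter in two separate steps:
# first it reads the counter, then it writes back (value it read) + 1.
STEPS = ("read", "write")


def interleavings():
    """Yield every schedule of threads A and B that keeps each thread's own step order."""
    total = 2 * len(STEPS)
    for a_slots in combinations(range(total), len(STEPS)):
        next_step = {"A": 0, "B": 0}
        schedule = []
        for slot in range(total):
            thread = "A" if slot in a_slots else "B"
            schedule.append((thread, STEPS[next_step[thread]]))
            next_step[thread] += 1
        yield schedule


def run(schedule):
    """Execute one schedule and return the final counter value."""
    counter = 0
    seen = {}  # the value each thread read
    for thread, step in schedule:
        if step == "read":
            seen[thread] = counter
        else:
            counter = seen[thread] + 1
    return counter


results = sorted({run(s) for s in interleavings()})
lost_updates = sum(1 for s in interleavings() if run(s) == 1)
print(results)       # [1, 2]
print(lost_updates)  # 4 of the 6 schedules lose one increment
```

Only the two schedules in which one thread finishes both steps before the other starts produce 2; the rest silently lose an update. Making read-and-write a single atomic step (for example, holding a lock around it) removes the bad schedules.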

HUIZINGA: Right. Well, and I think, from a user’s perspective, all you experience is what I like to call the spinning beachball of doom. It’s like I’ve asked something, and it doesn’t want to give, so [LAUGHS] … And this is, like, behind the scenes from a reasoning perspective of, how do we keep that from happening to our users? How do we identify the bugs? Which we’ll get to in a second. Umm. Thanks for that. Your research now revolves around what I would call the big idea of learning from mistakes. And in fact, it all seems to have started with a paper that you published way back in 2008 called “Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics,” and you say this strongly influenced your research style and approach. And by the way, I’ll note that this paper received the Most Influential Paper Award in 2022 from ASPLOS, which is the Architectural Support for Programming Languages and Operating Systems. Huge mouthful. And it also has more than a thousand citations, so I dare say it’s influenced other researchers’ approach to research, as well. Talk about the big idea behind this paper and exactly how it informed your research style and approach today.

LU: Mm-hmm. Yeah. So I think this, like, again, goes back to my PhD days, when I started working with my adviser, you know, YY (Yuanyuan Zhou). So at that time, there had been a lot of people working on bug finding, but now when I think about it, people would just magically say, hey, I want to look at this type of bug. Just magically, oh, I want to look at that type of bug. And then, my adviser at that time suggested to me, saying, hey, maybe, you know, actually take a look, right. At that time, as I mentioned, software was kind of shifting from sequential software to concurrent software, and my adviser was saying, hey, just take a look at the bug databases of those real systems, and see what types of concurrency bugs are actually there. You know, instead of just randomly saying, oh, I want to work on this type of bug.

HUIZINGA: Oh, yeah.

LU: And then also, of course, it’s not just look at it. It’s not just like you read a novel or something, right. [LAUGHTER] And again, my adviser said, hey, Shan, right, you have this, you have a connection, natural connection, you know, with bugs and the developers who commit …

HUIZINGA: Who make them …

LU: Who make them! [LAUGHTER] So she said, you know, try to think about the patterns behind them, right. Try to think about whether you can generalize some …

HUIZINGA: Interesting …

LU: … characteristics, and use that to guide people’s research in this domain. And at that time, we actually didn’t know whether, you know, we could write a paper about it because traditionally you publish a paper that says, oh, I have a new tool, right, which can do this and that. At that time, in systems conferences, people rarely, you know, just said, here’s a study, right. But we studied it, and indeed, you know, I had this thought: hey, why do I make a lot of mistakes? And the more bugs I studied, the more I felt, you know, there’s a reason behind it, right. It’s like I’m not the only dumb person in the world, right? [LAUGHTER] There’s a reason: you know, some part of this language is difficult to use, right, and a certain type of concurrent reasoning is just not natural to many people, right. So because of that, there are patterns behind these bugs. And so at that time, we were surprised that the paper was actually accepted, because I was just happy with what I learned. But after this paper was accepted, in the next, I would say, many years, more and more people realized, hey, before we actually, you know, do bug-finding things, let’s first do a study, right, to understand, and then this paper was … yeah … I was very happy that it was cited many, many times.

HUIZINGA: Yeah. And then gets the most influential paper many years later.

LU: Many years later. Yes.

HUIZINGA: Yeah, I feel like there are a lot of things going through my head right now, one of which is that AI is a pattern detector, and you were doing that before AI even came on the scene. Which goes to show you that humans are pretty good at pattern detection also. We might not do it as fast as …

LU: True.

HUIZINGA: … as an AI but … so this idea of learning from mistakes is a broad theme. Another theme that I see coming through your papers and your work is persistence. [LAUGHTER] And you mentioned this about your team, right. I was like, these people are people who don’t give up. So we covered this idea in an Abstracts podcast recently talking about a paper which really brings this to light: “If at First You Don’t Succeed, Try, Try Again.” That’s the name of the paper. And we didn’t have time to discuss it in depth at the time because the Abstracts show is so quick. But we do now. So I’d like you to expand a little bit on this big idea of persistence and how large language models are not only changing the way programming and verification happens but also providing insights into detecting retry bugs.

LU: Yes. So I guess maybe I will, since you mentioned this persistence, you know, after that “Learning from Mistakes” paper—so that was in 2008—and in the next 10 years, a little bit more than 10 years, in terms of persistence, right, so we have continued, me and my students, my collaborators, we have continued working on, you know, finding concurrency bugs …

HUIZINGA: Yeah.

LU: … which is related to, kind of related to, why I’m here at Microsoft Research. And we kept doing it, doing it, and then I feel like a high point was that I had a collaboration with my now colleagues here, Madan Musuvathi and Suman Nath. So we built a tool to detect concurrency bugs, and after more than 15 years of effort on this, we were able to find more than 1,000 concurrency bugs. It was built into a tool called Torch that was deployed in the company, and it won the Best Paper Award at the top systems conference, SOSP, and it was actually a bittersweet moment. This paper seemed to, you know, put an end …

HUIZINGA: Oh, interesting!

LU: … to our research. And also one of the findings from that paper is that we used to do very sophisticated program analysis to reason about the timing. And in that paper, we realized that, actually, sometimes, if you’re a little bit fuzzy and don’t aim to do perfect analysis, the resulting tool is actually more effective. So after that paper, Madan, Suman, and I, we kind of, you know, shifted our focus to looking at other types of bugs. And at the same time, the three of us realized the traditional, very precise program analysis may not be needed for some of the bug finding. So then, for this paper, these retry bugs, after we shifted our focus away from concurrency bugs, we realized, oh, there are many other types of important bugs, such as, in this case, retry, right, what happens when your software goes wrong. Another thing we learned is that it looks like you can never eliminate all bugs, so something will go wrong, [LAUGHTER] and so that’s why you need something like retry, right. So if something goes wrong, at least you won’t give up immediately.

HUIZINGA: Right.

LU: The software will retry. And another thing that started from this earlier effort is we started using large language models because we realized, yeah, you know, traditional program analysis sometimes can give you a very strong guarantee, but in some other cases, like in this retry case, some kind of fuzzy analysis, you know, not so precise, offered by large language models is sometimes even more beneficial. Yeah. So that’s kind of, you know, the story behind this paper.
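For readers unfamiliar with retry logic, here is a minimal, hypothetical Python sketch of the kind of code that retry bugs live in. It is not the LLM-based detection approach from the paper, and `TransientError` and `retry` are illustrative names. It guards against a few commonly cited pitfalls: retrying forever, retrying errors that will never succeed, and retrying immediately with no delay.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying (e.g., a timeout)."""


def retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying only TransientError, with a bounded number of
    attempts and exponential backoff between them. Any other exception
    propagates immediately instead of being retried."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: an unbounded loop here would be a retry bug
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: an operation that fails twice, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("timed out")
    return "ok"


delays = []
print(retry(flaky, sleep=delays.append))  # ok
print(delays)  # [0.1, 0.2]
```

Injecting `sleep` as a parameter is a design choice for testability: the usage example records the backoff delays instead of actually waiting.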

HUIZINGA: Yeah, yeah, yeah, yeah. So, Shan, we’re hearing a lot about how large language models are writing code nowadays. In fact, NVIDIA’s CEO says, mamas, don’t let your babies grow up to be coders because AI’s going to do that. I don’t know if he’s right, but one of the projects you’re most excited about right now is called Verus, and your colleague Jay Lorch recently said that he sees a lot of synergy between AI and verification, where each discipline brings something to the other, and Rafah Hosn has referred to this as “co-innovation” or “bidirectional enrichment.” I don’t know if that’s exactly what is going on here, but it seems like it is. Tell us more about this project, Verus, and how AI and software verification are helping each other out.

LU: Yes, yes, yes, yes. I’m very excited about this project now! So first of all, starting from Verus. So Verus is a tool that helps you verify the correctness of Rust code. So this is a … it’s a relatively new tool, but it’s creating a lot of, you know, excitement in the research community, and it’s created by my colleague Chris Hawblitzel and his collaborators outside Microsoft Research.

HUIZINGA: Interesting.

LU: And as I mentioned, right, this is a part that, you know, really inspired me. So traditionally, to verify, right, that your program is correct requires a lot of expertise. You actually have to write your proof, typically in a special language. And, you know, a lot of people, including me, right, are so eager to get rid of bugs in our software, but people told me that just to learn that language—they were referring to a language called Coq—just to learn that language, they said, takes one or two years. And then once you learn that language, right, you have to learn how to write proofs in that special language. So people, particularly in the bug-finding community, know that, oh, in theory, you can verify it, but in reality, people don’t do that. OK, so now going back to this Verus tool, why it’s exciting … it actually allows people to write proofs in Rust. So Rust is an increasingly popular language, and more and more people are picking up Rust. It was the first time I heard that, oh, you can, you know, write proofs in a popular language. And also, another thing is that in the past, you could not verify an implementation directly. You could only verify something written in a special language. The proof proves something that is in a special language, and then finally, that special language is maybe transformed into an implementation. So there are just too many special languages there.

HUIZINGA: A lot of layers.

LU: A lot of layers. So now this Verus tool allows you to write a proof in Rust to prove an implementation that is in Rust. So it’s very direct. I just feel like I’m just not good at learning a new language.

HUIZINGA: Interesting.

LU: So when I came here, you know, and learned about this Verus tool, you know, by Chris and his collaborators, I feel like, oh, looks like maybe I can give it a try. And surprisingly, I realized, oh, wow! I can actually write proofs using this Verus tool.

HUIZINGA: Right.

LU: And then, of course, you know, I was told, if you really want to, right, write proofs for large systems, it still takes a lot of effort. And then this idea came to me that, hey, maybe, you know, these days, like, large language models can write code, then why not let large language models write proofs, right? And of course, you know, other people actually had this idea, as well, but there’s a doubt that, you know, can large language models really write proofs, right? And also, people have this feeling that, you know, large language models seem not very disciplined, you know, by nature. But, you know, that’s what intrigued me, right. And also, I used to be a doubter for, say, GitHub Copilot. USED to! Because I feel like, yes, it can generate a lot of code, but who knows [LAUGHS] …

HUIZINGA: Whether it’s right …

LU: What, what is … whether it’s right?

HUIZINGA: Yeah.

LU: Right, so I feel like, wow, you know, this could be a game-changer, right? Like, if AI can write not only code but also proofs. Yeah, so that’s what I have been doing. I’ve been working on this for one year, and I gradually got more collaborators, you know, both people in Microsoft Research Asia and experts here, like Chris and Jay Lorch. They all help me a lot. So we have actually made a lot of progress.

HUIZINGA: Yeah.

LU: Like, now we’ve tried, for example, some small programs, benchmarks, and we see that large language models can actually correctly prove the majority of the benchmarks that we throw at it. Yeah. It’s very, very exciting.

HUIZINGA: Well, and so … and we’re going to talk a little bit more about some of those doubts and some of those interesting concerns in a bit. I do want you to address what I think Jay was getting at, which is that somehow the two help each other. The verification improves the AI. The AI improves the verification.

LU: Yes, yes.

HUIZINGA: How?

LU: Yes. My feeling is that a lot of people, if they’re concerned with using AI, it’s because they feel like there’s no guarantee for the content generated by AI, right. And then we also all heard about, you know, hallucination. And I tried myself. Like, I remember, at some point, if I ask AI, say, you know, which is bigger: is it three times three or eight? And the AI will tell me eight is bigger. And … [LAUGHTER]

HUIZINGA: Like, what?

LU: So I feel like verification can really help AI …

HUIZINGA: Get better …

LU: … because now you can, you know, kind of add mathematical rigor to whatever is generated by AI, right. And I say it would help AI. It will also help people who use AI, right, so that they know what can be trusted, right.

HUIZINGA: Right.

LU: What is guaranteed by this content generated by AI?

HUIZINGA: Yeah, yeah, yeah.

LU: Yeah, and now of course AI can help verification because, you know, verification, you know, it’s hard. There is a lot of mathematical reasoning behind it. [LAUGHS] And so now with AI, it will enable verification to be picked up by more and more developers so that we can get higher-quality software.

HUIZINGA: Yeah.

LU: Yeah.

HUIZINGA: Yeah. And we’ll get to that, too, about what I would call the democratization of things. But before that, I want to, again, say an observation that I had based on your work and my conversations with you is that you’ve basically dedicated your career to hunting bugs.

LU: Yes.

HUIZINGA: And maybe that’s partly due to a personal story about how a tiny mistake became a bug that haunted you for years. Tell us the story.

LU: Yes.

HUIZINGA: And explain why and how it launched a lifelong quest to understand, detect, and expose bugs of all kinds.

LU: Yes. So before I came here, I had already interacted with Microsoft Research multiple times, you know. I was a summer intern at Microsoft Research Redmond almost 20 years ago.

HUIZINGA: Oh, wow!

LU: I think it was in the summer of 2005. And I remember I came here, you know, full of ambition. And I thought, OK, you know, I will implement some smart algorithm. I will deliver some useful tools. At that time, I had just finished two years of my PhD, so I had, kind of, just started my research on bug finding and so on. And I remember I came here, and I was told that I needed to program in C#. And, you know, I just naturally have a fear of learning a new language. But anyway, I remember I thought, oh, the task I was assigned was very straightforward. And I think I got ahead of myself. I was thinking, oh, I want to quickly finish this, and I want to do something more novel, you know, something more creative. But then this simple task I was assigned, I ended up spending the whole summer on it. The tool that I wrote was supposed to process huge logs. And then the problem was my software … like, initially, I could only run it for 10 minutes because my software used so much memory and it would crash. And then I spent a lot of time … I was thinking, oh, my software is just using too much memory. Let me optimize it, right. And so I, you know, I tried to make sure to use memory in a very efficient way, but then, as a result, instead of crashing every 10 minutes, it would just crash after one hour. And I knew there was a bug at that time. There’s a type of bug called a memory leak. I knew there was a bug in my code, and I spent a lot of time, and there was an engineer helping me check my code. We spent a lot of time. We were just not able to find that bug. And in the end, we … the solution was that I just sat in front of my computer waiting for my program to crash and restart. [LAUGHTER] And at that time, because there were very few remote working options, in order to finish processing all those logs, it’s like, you know, after dinner, I …

HUIZINGA: You have to stay all night!

LU: I have to stay all night! And all my intern friends, they were saying, oh, Shan, you work really hard! And I’m just feeling like, you know what I’m doing is just sitting in front of my computer waiting [LAUGHTER] for my program to crash so that I can restart it! And near the end of my internship, I finally found the bug. It turns out that I missed a pair of brackets in one line of code.

HUIZINGA: That’s it.

LU: That’s it.

HUIZINGA: Oh, my goodness.

LU: And it turns out, because I was used to C, and in C, when you want to free, which means deallocate, an array, you just say “free array.” And if I remember correctly, in this language, C#, you have to say, “free this array name” and you put a bracket behind it. Otherwise, it will only free the first element. And I … it was a nightmare. And I also felt like, the most frustrating thing is, if it’s a clever bug, right … [LAUGHS]

HUIZINGA: Sure.

LU: … then you feel like at least I’m defeated by something complicated …

HUIZINGA: Smart.

LU: Something smart. And then it’s like, you know, also all this ambition I had about, you know, doing creative work, right, with all these smart researchers in MSR (Microsoft Research), I feel like I ended up achieving very little in my summer internship.

HUIZINGA: But maybe the humility of making a stupid mistake is the kind of thing that somebody who’s good at hunting bugs … It’s like missing an error in the headline of an article, because the print is so big [LAUGHTER] that you’re looking for the little things in the … I know that’s a journalist’s problem. I actually love that story. And it, kind of, presents a big picture of you, Shan, as a person who has a realistic self-awareness and humility, which I think is rare at times in the software world. So thanks for sharing that. So moving on. When we talked before, you mentioned the large variety of programming languages and how that can be a barrier to entry or at least a big hurdle to overcome in software programming and verification. But you also talked about, as we just mentioned, how LLMs have been a democratizing force …

LU: Yes.

HUIZINGA: … in this field. So going back to when you first started …

LU: Yes.

HUIZINGA: … and what you see now with the advent of tools like GitHub Copilot, …

LU: Yes.

HUIZINGA: … what … what’s changed?

LU: Oh, so much has changed. Well, I don’t even know how to start. Like, I used to be really scared about programming. You know, when I tell this story, a lot of people say, no, I don’t believe you. And I feel like it’s a trauma, you know.

HUIZINGA: Sure.

LU: I almost feel like it’s like, you know, the college-day me, right, who was scared of starting any programming project. Somehow, I felt humiliated when asking those very, I feel like, stupid questions to my classmates. It almost changed my personality! It’s like … for a long time, whenever someone introduced me to a new software tool, my first reaction is, uh, I probably will not be able to successfully even install it. Like whenever, you know, there’s a new language, my first reaction is, uh, no, I’m not good at it. And then, like, for example, this GitHub Copilot thing, actually, I did not try it until I joined Microsoft. And then I, actually, I haven’t programmed for a long time. And then I started collaborating with people in Microsoft Research Asia, and he writes programs in Python, right. And I have never written a single line of Python code before. And also, this Verus tool. It helps you to verify code in Rust, but I have never learned Rust before. So I thought, OK, maybe let me just try GitHub Copilot. And wow! You know, it’s like I realized, wow! Like … [LAUGHS]

HUIZINGA: I can do this!

LU: I can do this! And, of course, sometimes I feel like my colleagues may sometimes be surprised because on one hand it looks like I’m able to just finish, you know, write a Rust function. But on some other days, I ask very basic questions, [LAUGHTER] and I have those questions because, you know, the GitHub Copilot just helps me finish! [LAUGHS]

HUIZINGA: Right.

LU: You know, I just write something to start it, and then it just helps me finish. And I feel like, if there had been GitHub Copilot when I started college, you know, my mindset towards programming and towards computer science might have been different. So it does make me feel very positive, you know, about, you know, what future we have, you know, with AI, with computer science.

HUIZINGA: OK, usually, I ask researchers at this time, what could possibly go wrong if you got everything right? And I was thinking about this question in a different way until just this minute. I want to ask you … what do you think that it means to have a tool that can do things for you that you don’t have to struggle with? And maybe, is there anything good about the struggle? Because you’re framing it as it sapped your confidence.

LU: [LAUGHS] Yes.

HUIZINGA: And at the same time, I see a woman who emerged stronger because of this struggle with an amazing career, a huge list of publications, influential papers, citations, leadership role. [LAUGHTER] So in light of that …

LU: Right.

HUIZINGA: … what do you see as the tension between struggling to learn a new language versus having this tool that can just do it that makes you look amazing? And maybe the truth of it is you don’t know!

LU: Yeah. That’s a very good point. I guess you need some kind of balance. And on one hand, yes, I feel like, again, right, this goes back to like my internship. I left with the frustration that I felt like I have so much creativity to contribute, and yet I could not because of this language barrier. You know, I feel positive in the sense that just from GitHub Copilot, right, how it has enabled me to just bravely try something new. I feel like this goes beyond just computer science, right. I can imagine it’ll help people to truly unleash their creativity, not being bothered by some challenges in learning the tool. But on the other hand, you made a very good point. My adviser told me she feels like, you know, I write code slowly, but I tend to make fewer mistakes. And the difficulty of learning, right, and all these nightmares I had definitely made me more … more cautious? I pay more respect to the task that is given to me, so there is definitely the other side of AI, right, which is, you feel like everything is easy and maybe you do not have the experience of those bugs, right, that a software can bring to you and you have overreliance, right, on this tool.

HUIZINGA: Yeah!

LU: So hopefully, you know, some of the things we’re doing now, right, like for example, say verification, right, like bringing this mathematical rigor to AI, hopefully that can help.

HUIZINGA: Yeah. You know, even as you unpack the nuances there, it strikes me that both are good. Both having to struggle, learning languages and understanding …

LU: Yeah.

HUIZINGA: … the core of it and the idea that in natural language, you could just say, here’s what I want to happen, and the AI does the code, the verification, etc. That said, do we trust it? And this was where I was going with the first “what could possibly go wrong?” question. How do we know that it is really as clever as it appears to be? [LAUGHS]

LU: Yeah, I think I would just use the research problem we are working on now, right. Like, I think on one hand, I can use AI to generate a proof, right, to prove the code generated by AI is correct. But having said that, even if we’re wildly successful, you know, in this thing, human beings’ expertise is still needed because just take this as an example. What do you mean by “correct,” right?

HUIZINGA: Sure.

LU: And so someone first has to define what correctness means. And then so far, the experience shows that you can’t just define it using natural language because our natural language is inherently imprecise.

HUIZINGA: Sure.

LU: So you still need to translate it into a formal specification in a programming language. It could be in a popular language like Rust, right, which is what Verus is aiming at. And then, for example, some of the research we do is showing that, yes, you know, I can also use AI to do this translation from natural language to specification. But again, then, who verifies that, right? So at the end of the day, I think we still do need to have humans in the loop. But what we can do is to lower the burden and make the interface not so complicated, right, so that it’ll be easy for human beings to check what AI has been doing.
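Lu’s point that someone must first define what “correct” means can be illustrated with a small, hypothetical Python sketch. It is not Verus syntax and not from her research. The informal request “sort this list” does not say whether duplicates must be kept; the predicate below pins that down precisely, and a plausible-looking implementation turns out to violate it. Note that this only checks one input at run time, whereas a verifier such as Verus aims to prove that a specification holds for every input.

```python
from collections import Counter


def meets_sort_spec(inp, out):
    """A precise specification for 'sort this list': the output must be in
    non-decreasing order AND contain exactly the same elements as the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    same_elements = Counter(inp) == Counter(out)
    return ordered and same_elements


def dedup_sort(xs):
    # Plausible-looking but wrong under this spec: silently drops duplicates.
    return sorted(set(xs))


data = [3, 1, 3, 2]
print(meets_sort_spec(data, sorted(data)))      # True
print(meets_sort_spec(data, dedup_sort(data)))  # False: ordered, but lost a 3
```

The specification has two clauses because either one alone is too weak: “ordered” accepts an empty output, and “same elements” accepts the unsorted input unchanged.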

HUIZINGA: Yeah. You know, everything we’re talking about just reinforces this idea that we’re living in a time where the advances in computer science that seemed unrealistic or impossible, unattainable even a few years ago are now so common that we take them for granted. And they don’t even seem outrageous, but they are. So I’m interested to know what, if anything, you would classify now as “blue sky” research in your field. Maybe something in systems research today that looks like a moonshot. You’ve actually anchored this in the fact that you, kind of, have, you know, blinders on for the work you’re doing—head down in the work you’re doing—but even as you peek up from the work that might be outrageous, is there anything else? I just like to get this out there that, you know, what’s going on 10 years down the line?

LU: You know, sometimes I feel like I’m just now so much into my own work, but, you know, occasionally, like, say, when I had a chat with my daughter and I explained to her, you know, oh, I’m working on, you know, not only having AI to generate code but also having AI to prove, right, the code is correct. And she would feel, wow, that sounds amazing! [LAUGHS] So I don’t know whether that is, you know, a moonshot thing, but that’s a thing that I’m super excited about …

HUIZINGA: Yeah.

LU: … about the potential. And then there are also, you know, my colleagues; we spend a lot of time building systems, and it’s not just about correctness, right. Like, the verification thing I’m doing now is about automatically verifying that it’s correct. But also, you need to do a lot of performance tuning, right, just so that your system can react fast, right, and have good utilization of computer resources. And my colleagues are also working on using AI, right, to automatically do performance tuning. And I know what they are doing, so I don’t particularly feel that’s a moonshot, but I guess …

HUIZINGA: I feel like, because you are so immersed, [LAUGHTER] that you just don’t see how much we think …

LU: Yeah!

HUIZINGA: … it’s amazing. Well, I’m just delighted to talk to you today, Shan. As we close … and you’ve sort of just done a little vision casting, but let’s take your daughter, my daughter, [LAUGHTER] all of our daughters …

LU: Yes!

HUIZINGA: How does what we believe about the future in terms of these things that we could accomplish influence the work we do today as sort of a vision casting for the next “Shan Lu” who’s struggling in undergrad/grad school?

LU: Yes, yes, yes. Oh, thank you for asking that question. Yeah, I have to say, you know, I think we’re in a very interesting time, right, with all this AI thing.

HUIZINGA: Isn’t that a curse in China? “May you live in interesting times!”

LU: And I think there were times, actually, you know, before I myself fully embraced AI, I was … indeed I had my daughter in mind. I was worried when she grows up, what would happen? There will be no job for her because everything will be done by AI!

HUIZINGA: Oh, interesting.

LU: But then now, now that I have, you know, kind of fully embraced AI myself, actually, I see this more and more positively. Like you said, I remember, you know, those older days myself, right. I had this struggle where I felt like I could do better. I felt like I had ideas to contribute, but just for whatever reason, right, it took me forever to learn something which I feel like is a very mechanical thing, but it just took me forever to learn, right. And then now actually, I see this hope, right, with AI, you know, a lot of mechanical things can actually now be done in a much more automated way by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever you know, they are good at, their creativity, it’ll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about. Hopefully, they don’t have to, you know, go through what I went through, right, to finally be able to contribute. But then, of course, you know, at the same time, I do feel this responsibility of me, my colleagues, MSR, we have the capability and also the responsibility, right, of building AI tools in a responsible way so that it will be used in a positive way by the next generation.

HUIZINGA: Yeah. Shan Lu, thank you so much for coming on the show today. [MUSIC] It’s been absolutely delightful, instructive, informative, wonderful.

LU: Thank you. My pleasure.

The post Ideas: Bug hunting with Shan Lu appeared first on Microsoft Research.

]]>
Ideas: AI for materials discovery with Tian Xie and Ziheng Lu http://approjects.co.za/?big=en-us/research/podcast/ideas-ai-for-materials-discovery-with-tian-xie-and-ziheng-lu/ Thu, 16 Jan 2025 10:12:46 +0000 http://approjects.co.za/?big=en-us/research/?p=1120956 How do you generate and test materials that don’t exist yet? Researchers Tian Xie and Ziheng Lu share the story behind MatterGen and MatterSim, AI tools poised to transform materials discovery and help drive advances in energy, manufacturing, and sustainability.

The post Ideas: AI for materials discovery with Tian Xie and Ziheng Lu appeared first on Microsoft Research.

]]>

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets. 

In this episode, guest host Lindsay Kalter talks with Principal Research Manager Tian Xie and Principal Researcher Ziheng Lu about their groundbreaking AI tools for materials discovery. Xie introduces MatterGen, which can generate new materials tailored to the specific needs of an application, such as materials with powerful magnetic properties or those that efficiently conduct lithium ions for better batteries. Lu explains how MatterSim accelerates simulations to validate and refine these discoveries. Together, these tools act as a “copilot” for scientists, proposing creative hypotheses and exploring vast material spaces far beyond traditional methods. The conversation highlights the challenges of bridging AI and experimental science and the potential of these tools to drive advancements in energy, manufacturing, and sustainability. At the cutting edge of AI research, Xie and Lu share their vision for the future of materials design and how these technologies could transform the field.

Learn more:

MatterSim: A deep-learning model for materials under real-world conditions 
Microsoft Research blog, May 2024 

MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures 
Publication, March 2024 

MatterSim
GitHub repo 

A generative model for inorganic materials design
Publication, January 2025 

MatterGen: A Generative Model for Materials Design 
Video, Microsoft Research Forum, June 2024 

MatterGen: Property-guided materials design 
Microsoft Research blog, December 2023 

MatterGen
GitHub repo 

Crystal Diffusion Variational Autoencoder for Periodic Material Generation 
Publication, October 2021

Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties
Publication, April 2018

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOGUE] 

TIAN XIE: Yeah, so the problem of generating materials from properties is actually a pretty old one. I still remember back in 2018, when I was giving a talk about property-prediction models, right, one of the first questions people asked was, instead of going from material structure to properties, can you, kind of, inversely generate the materials directly from their property conditions? So in a way, this is, kind of, like a dream for materials scientists because, like, the end goal is really about finding materials [whose] properties, right, will satisfy your application. 

ZIHENG LU: Previously, a lot of people were using atomistic simulators and generative models alone. But if you think about it, now that we have these two foundation models together, it really can make things different, right. You have a very good idea generator. And you have a very good goalkeeper. And you put them together. They form a loop. And now you can use this loop to design materials really quickly. 

[TEASER ENDS] 

LINDSAY KALTER: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES] 

I’m your guest host, Lindsay Kalter. Today I’m talking to Microsoft Principal Research Manager Tian Xie and Microsoft Principal Researcher Ziheng Lu. Tian is doing fascinating work with MatterGen, an AI tool for generating new materials guided by specific design requirements. Ziheng is one of the visionaries behind MatterSim, which puts those new materials to the test through advanced simulations. Together, they’re redefining what’s possible in materials science. Tian and Ziheng, welcome to the podcast. 

TIAN XIE: Very excited to be here. 

ZIHENG LU: Thanks, Lindsay, very excited. 

KALTER: Before we dig into the specifics of MatterGen and MatterSim, let’s give our audience a sense of how you, as researchers, arrived at this moment. Materials science, especially at the intersection of computer science, is such a cutting-edge and transformative field. What first drew each of you to this space? And what, if any, moment or experience made you realize this was where you wanted to innovate? Tian, do you want to start? 

XIE: So I started working on AI for materials back in 2015, when I started my PhD. I came in as a chemist and materials scientist, but I was, kind of, figuring out what I wanted to do during my PhD. There was actually one moment that really drove me into the field. That was AlphaGo. AlphaGo came out in 2016, when it was able to beat the world champion in Go. I was extremely impressed by that because I, kind of, learned how to play Go in my childhood. I know how hard it is and how much effort those professional Go players have spent, right, in learning Go. So I, kind of, had the feeling that if AI can surpass the world-leading Go players, one day it will also surpass materials scientists, right, in their ability to design novel materials. So that’s why I ended up deciding to focus my entire PhD on AI for materials. And I have been working on that since then. It was actually very interesting because it was a very small field back then. And it’s great to see how much progress has been made, right, in the past 10 years and how much bigger the field is now compared with 10 years ago. 

LU: That’s very interesting, Tian. So, actually, I think I started, like, two years before you as a PhD student. I was trained solely as a computational materials scientist, not really an AI expert. But at that time, computational materials science did not really work that well. It worked, but not that well. So after, like, two or three years, I went back to experiments for, like, another two or three years because, I mean, the experiment is always the gold standard, right. And I worked on experiments for a few years, and then about three years ago, I went back to the field of computation, especially because of AI. At that time, I think GPT and the large AI models we’re currently using were not there yet, but we already had their prior forms, like BERT, so we saw the very large potential of AI. We knew that these large AIs might work. So one idea was really to use AI to learn the entire space of materials and really grasp the physics there, and that really drove me to this field, and that’s why I’m here working on it, yeah. 

KALTER: We’re going to get into what MatterGen and MatterSim mean for materials science: the potential, the challenges, and open questions. But first, give us an overview of what each of these tools is, how they do what they do, and, as this show is about big ideas, the idea driving the work. Ziheng, let’s have you go first. 

LU: So MatterSim is a tool to do in silico characterization of materials. If you think about working on materials, you have several steps. You first need to synthesize the material, and then you need to characterize it. Basically, you need to know what properties, what structures, whatever stuff about these materials. So for MatterSim, what we want to do is to really move the characterization process, a lot of these processes, into computation. The idea behind MatterSim is to really learn the fundamentals of physics. So we learn the energies and forces and stresses from these atomic structures and the charge densities, all of these things, and then with these, we can really simulate any sort of material using our computational machines. And then we can characterize a lot of these materials’ properties on a computer, which is very fast, much faster than doing experiments, so that we can accelerate materials design. So in a word, basically, you input your material, a structure, into your computer, and MatterSim will try to simulate the material like what you would do in a furnace or with an XRD (x-ray diffraction), and then you get your properties out of that, and a lot of times it’s much faster than doing experiments. 
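As a rough illustration of the "structure in, energies and forces out" idea described above, here is a minimal Python sketch. It does not use MatterSim: as a stand-in for a learned potential it assumes a classical Lennard-Jones pair potential with illustrative parameters `epsilon` and `sigma`, and the function name `lj_energy_forces` is hypothetical. Stresses are omitted for brevity.

```python
import math

def lj_energy_forces(positions, epsilon=1.0, sigma=1.0):
    """Total energy and per-atom forces for a Lennard-Jones pair potential.

    Stand-in for a learned potential: atomic positions in, energy and forces out.
    """
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(x * x for x in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # dE/dr of this pair; the force on atom i is -dE/dr along d / r
            dE_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -dE_dr * d[k] / r
                forces[i][k] += f
                forces[j][k] -= f  # equal and opposite force on atom j
    return energy, forces

# Dimer at the potential minimum r = 2**(1/6) * sigma:
# energy -epsilon and zero force, up to floating-point rounding.
energy, forces = lj_energy_forces([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
print(energy, forces)
```

Forces are the negative gradient of the energy, so a finite-difference check on the energy should reproduce them; a learned potential aims to predict the same quantities for real materials.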

KALTER: All right, thank you very much. Tian, why don’t you tell us about MatterGen? 

XIE: Yeah, thank you. So, actually, Ziheng, now that you’ve explained MatterSim, it’s much easier for me to explain MatterGen. MatterGen represents a new way to design materials with generative AI. Material discovery is like finding needles in a haystack. You’re looking for a material with a very specific property for a material application. For example, finding a room-temperature superconductor or finding a solid that can conduct lithium ions very well inside a battery. So it’s like finding one very specific material among a million, kind of, candidates. The conventional way of doing material discovery is via screening, where you, kind of, go over millions of candidates to find the one that you’re looking for, and MatterSim is able to significantly accelerate that process by making the simulation much faster. But it’s still very inefficient because you need to go through these millions of candidates, right. So with MatterGen, you can, kind of, directly generate materials given prompts describing the design requirements for the application. This means that you can discover materials, discover useful materials, much more efficiently. And it also allows us to explore a much larger space beyond the set of known materials. 
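To make the screening bottleneck concrete, the following sketch scores every candidate in a pool with a fast surrogate and keeps those within a tolerance of a target property. The names are hypothetical and the data are a toy: each "candidate" is just a random number standing in for its predicted property. The point is that the number of evaluations equals the pool size, however rare the hits are.

```python
import random

def screen(candidates, predict, target, tol):
    """Score every candidate with a fast surrogate `predict` and keep those
    whose predicted property lies within `tol` of `target`."""
    hits = []
    evaluations = 0
    for candidate in candidates:
        evaluations += 1  # every candidate costs one surrogate call
        if abs(predict(candidate) - target) <= tol:
            hits.append(candidate)
    return hits, evaluations

# Toy pool: each "candidate" is simply its (pretend) predicted property in GPa.
rng = random.Random(0)
pool = [rng.uniform(0.0, 400.0) for _ in range(100_000)]
hits, n_evals = screen(pool, predict=lambda x: x, target=200.0, tol=0.5)
print(f"{len(hits)} hits after {n_evals} evaluations")
```

A conditional generator attacks the same problem from the other side: instead of evaluating everything and discarding most of it, it proposes candidates that are already biased toward the target.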

KALTER: Thank you, Tian. Can you tell us a little bit about how MatterGen and MatterSim work together? 

XIE: So you can really think about MatterSim and MatterGen as accelerating different parts of the materials discovery process. MatterSim is trying to accelerate the simulation of material properties, while MatterGen is trying to accelerate the search for novel material candidates. It means that they can really work together as a flywheel, and you can compound the acceleration from both models. They are also both foundation AI models, meaning they can both be used for a broad range of materials design problems. So we’re really looking forward to seeing how they can, kind of, work together iteratively as a tool to design novel materials for a broad range of applications. 

LU: I think that’s a very good, like, general introduction of how they work together. I can provide an example of how they really fit together. Say you want a material with a specific, like, bulk modulus or lithium-ion conductivity or thermal conductivity for your CPU chips. Basically, what you do is start with a pool of material structures, like some structures from a database, and then you compute or characterize the property you want for that set of materials. Then you’ve got these property and structure pairs, and you input these pairs into MatterGen. And MatterGen will be able to give you a lot more structures that are highly likely to be real. But the number will be very large. For example, for the bulk modulus, I don’t remember the number we generated in our work … was it thousands, tens of thousands? 

XIE: Thousands, tens of thousands. 

LU: Yeah, that would be a very large pool even with MatterGen, so then the next step is, how would you like to screen that? You cannot really just send all of those structures to a lab to synthesize. It’s too much, right. That’s when MatterSim comes in again. MatterSim comes in and screens all those structures again and sees which ones are the most likely to be synthesizable and which ones have the property closest to what you wanted. And then after screening, you probably get five or 10 top candidates, and then you send them to a lab. Boom, everything goes down. That’s it. 
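The generate-then-screen workflow Ziheng outlines can be sketched as a small pipeline. Everything here is a hypothetical toy: `toy_generate` stands in for a conditional generator, `toy_simulate` for a fast simulator (a perfect oracle in this toy), and the noise level, the 0.05 eV/atom stability cutoff, and the shortlist size are arbitrary assumptions rather than values from MatterGen or MatterSim.

```python
import random
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical toy stand-in for a crystal structure.
    name: str
    true_property: float       # e.g., bulk modulus in GPa (toy value)
    energy_above_hull: float   # eV/atom (toy value); lower means more stable

def toy_generate(target, n, rng):
    """Hypothetical conditional generator: proposes candidates whose
    property is scattered around the requested target."""
    return [Candidate(f"gen-{i}", rng.gauss(target, 30.0), abs(rng.gauss(0.0, 0.1)))
            for i in range(n)]

def toy_simulate(c):
    """Hypothetical fast simulator; a perfect oracle in this toy."""
    return c.true_property, c.energy_above_hull

def design_loop(target, n_generate=10_000, stability_cut=0.05, top_k=10, seed=0):
    rng = random.Random(seed)
    pool = toy_generate(target, n_generate, rng)   # idea generator
    scored = []
    for c in pool:                                  # goalkeeper
        prop, e_hull = toy_simulate(c)
        if e_hull <= stability_cut:                 # keep likely-stable candidates
            scored.append((abs(prop - target), c))
    scored.sort(key=lambda t: t[0])                 # closest to the target first
    return [c for _, c in scored[:top_k]]           # shortlist for the lab

shortlist = design_loop(target=200.0)
for c in shortlist:
    print(c.name, round(c.true_property, 1), round(c.energy_above_hull, 3))
```

In a real loop the simulator's predictions carry their own error, which is why the shortlist still goes to experiment rather than being taken as final.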

KALTER: I’m wondering if there’s any prior research or advancements that you drew from in creating MatterGen and MatterSim. Were there any specific breakthroughs that influenced your approaches at all? 

LU: Thanks, Lindsay. I think I’ll take that question first. So interestingly, for MatterSim, a very fundamental idea was drawn from Chi Chen, who was a previous lab mate of mine and now also works for Microsoft at Microsoft Quantum. He made this fantastic model named M3GNet, which is a precursor of a lot of these large-scale models for atomistic simulations. That model, M3GNet, actually resolved the near-ground-state prediction problem. I mean, the near-ground-state problem sounds like a fancy but unrealistic term, but what it actually means is that it can simulate materials near zero Kelvin. So basically at very low temperatures. So at that time, we were thinking, the models are now able to simulate materials near their ground states, which is not a very large space. But if you also look at other larger models, like GPT whatever, those models are large enough to model entire human language. So it’s possible to really extend the capability of such prior models to a very large space. Because we believe in the capability of AI, it really drove us to use MatterSim to learn the entire space of materials. I mean, the entire space really means the entire periodic table, all the temperatures and the pressures people can actually grasp. 

XIE: Yeah, I still remember a lot of the amazing work from Chi Chen back when we were, kind of, working on property-prediction models. So, yeah, the problem of generating materials from properties is actually a pretty old one. I still remember back in 2018, when I was, kind of, working on CGCNN (crystal graph convolutional neural networks) and giving a talk about property-prediction models, right, one of the first questions people asked was, OK, can you invert this process? Instead of going from material structure to properties, can you, kind of, inversely generate the materials directly from their property conditions? So in a way, this is, kind of, like a dream for materials scientists, some people even call it, like, the holy grail, because, like, the end goal is really about finding materials [whose] properties, right, will satisfy your application. So I’d been, kind of, thinking about this problem for a while, and there has also been a lot of work, right, over the past few years in the community to build generative models for materials. A lot of people tried before 2020, using ideas like VAEs or GANs. But it’s hard to represent materials in these types of generative model architectures, and many of those models generated relatively poor candidates. So I thought it was a hard problem. I, kind of, knew about it for a while. But there were no good solutions back then. So I started to focus more on this problem during my postdoc, which I started in 2020, and I kept working on it in 2021. At the beginning, I wasn’t really sure exactly what approach to take because it was, kind of, an open question, and I really tried a lot of random ideas. So one day, actually, in my group back then with Tommi Jaakkola and Regina Barzilay at MIT’s CSAIL (Computer Science & Artificial Intelligence Laboratory), we, kind of, got to know this method called the diffusion model. 
It was a very early stage for diffusion models back then, but they already began to show very promising signs, kind of, achieving state of the art in many problems like 3D point cloud generation and 3D molecular conformer generation. So the works that really inspired me a lot were two works on molecular conformer generation. One is ConfGF, and one is GeoDiff. They, kind of, inspired me to focus more on diffusion models. That actually led to CDVAE (crystal diffusion variational autoencoder). It’s interesting that we, kind of, spent a couple of weeks trying all these diffusion ideas, and without that much work, it actually worked quite well out of the box. And at that time, CDVAE achieved much better performance than any previous model in materials generation, and we were, kind of, super happy with that. So after CDVAE, I, kind of, joined Microsoft, now working with more people on this problem of generative models for materials. We, kind of, knew what the limitations of CDVAE were: it can do unconditional material generation well, meaning it can generate novel material structures, but it is very hard to use CDVAE for property-guided generation. Basically, it uses an architecture called a variational autoencoder, where you have a latent space. So the way you do property-guided generation there is, kind of, a gradient update inside the latent space. But because the latent space wasn’t learned very well, you cannot do, kind of, good property-guided generation. We only managed to do energy-guided generation, but it wasn’t successful in going beyond energy. So that got us really thinking, right, how can we make property-guided generation much better? I remember one day, actually, my colleague Daniel Zügner showed me this blog post, which basically explains the idea of classifier-free guidance, which is the powerhouse behind text-to-image generative models. 
And so, yeah, then we began to think about whether we could actually make the diffusion model work with classifier-free guidance. That led us to remove the, kind of, variational autoencoder component from CDVAE and begin to work on a pure diffusion architecture. There was, kind of, a lot of development around that. But it turns out that classifier-free guidance is really the key to making property-guided generation work, and then, combined with a lot more effort in, kind of, improving the architecture, generating more data, and trying out all these different downstream tasks, that ended up leading to MatterGen as we see it today. 
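Classifier-free guidance is a general diffusion-model technique, and its core is a one-line combination of two noise predictions. The sketch below shows that combination plus the condition dropout used during training. It is not MatterGen's code: the names are hypothetical, and the "noise predictions" are plain lists standing in for a real denoising network's outputs. Papers often write the same rule with a shifted guidance scale, as (1 + w) times the conditional prediction minus w times the unconditional one.

```python
import random

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance on a noise prediction.

    w = 0 gives the unconditional prediction, w = 1 the conditional one, and
    w > 1 extrapolates further toward the condition (stronger guidance).
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

NULL_CONDITION = None  # hypothetical "no property given" token

def maybe_drop_condition(condition, p_uncond, rng):
    """Training-time trick: drop the condition with probability p_uncond so a
    single network learns both conditional and unconditional predictions."""
    return NULL_CONDITION if rng.random() < p_uncond else condition

# Toy usage with made-up noise predictions for a 2-D "structure".
eps_u = [1.0, 2.0]   # model output with the condition dropped
eps_c = [3.0, 2.0]   # model output conditioned on, e.g., a target bulk modulus
print(cfg_combine(eps_u, eps_c, w=2.0))
```

Unlike the latent-space gradient updates described for CDVAE, this needs no separately trained property classifier: the guidance signal comes from the difference between the same network's conditional and unconditional outputs.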

KALTER: Yeah, I think you’ve both done a really great job of explaining how MatterGen and MatterSim work together and how MatterGen can offer a lot in terms of reducing the amount of time and work that goes into finding new materials. Tian, how does the process of using MatterGen to generate materials translate into real-world applications? 

XIE: Yeah, that’s a fantastic question. So one way that I think about MatterGen, right, is that you can think about it as a copilot for materials scientists, right. It can help you come up with, kind of, potentially good hypotheses for the materials design problems that you’re looking at. So say you’re trying to design a battery, right. You may have some ideas about, OK, what candidates you want to make, but this is, kind of, based on your own experience, right, your depth of experience as a researcher. But MatterGen is able to, kind of, learn from a very broad set of data, so it may be able to come up with some good suggestions, even surprising suggestions, for you so that you can, kind of, try them out, right, either with computation or even, one day, in a wet lab and experimentally synthesize them. But I also want to note that, in a way, this is still an early stage for generative AI for materials, meaning that I don’t expect all the candidates MatterGen generates will, kind of, suit your needs, right. So you still need to, kind of, look into them with expertise or with some kind of computational screening. But I think in the future, as these models keep improving, they will become a key component, right, in the design process of many of the materials we’re seeing today, like new batteries, new solar cells, or even computer chips, right, like Ziheng mentioned earlier. 

KALTER: I want to pivot a little bit to the MatterSim side of things. I know identifying new combinations of compounds is key to meeting changing needs for things like sustainable materials. But testing them is equally important to developing materials that can be put to use. Ziheng, how does MatterSim handle the uncertainty of how materials behave under various conditions, and how do you ensure that the predictions remain robust despite the inherent complexity of molecular systems? 

LU: Thanks. That’s a very, very good question. So uncertainty quantification is key to making sure all these predictions and simulations are trustworthy. And that’s actually one of the questions we get almost every time after a presentation. People, especially experimentalists, would ask, well, I’ve been using your model; how do I know those predictions are true under the very complex conditions I’m using in my experiments? To understand how we deal with uncertainty, we need to know how MatterSim really functions in predicting an arbitrary property, especially under the conditions you want, like the temperature and pressure. That would be quite complex, right? So in the ideal case, we would hope that by using MatterSim, you can directly simulate the properties you want using molecular dynamics combined with statistical mechanics. If so, it would be easy to really quantify the uncertainty because there are just two parts: the error from the model and the error from the simulations, the statistical mechanics. The error from the model can be measured by what we call an ensemble. So basically you start with different random seeds when you train the model, and then when you predict your property, you use several models from the ensemble and you get different numbers. If the variance of the numbers is very large, you’ll say the prediction is not that trustworthy. But a lot of times, we see the variance is very small. So basically, if an ensemble of several different models gives you almost exactly the same number, you’re quite sure that the number is, like, useful. So that’s one way we get our property. But sometimes, it’s very hard to directly simulate the property you want. For example, for catalytic processes, it’s very hard to imagine how you really get those coefficients. It’s very hard. The process is just too complicated. 
So for that process, what we do is really use, what we call, the embeddings learned from the entire material space. So basically the vector we learned for any arbitrary material. And then, starting from that, we build a very shallow neural network layer to predict the property, but that also means you need to bring in some of your experimental or simulation data from your side. And for that way of predicting a property, to measure the uncertainty, it’s still like the two levels, right. We don’t really have the statistical error anymore; what we have is, like, only the model error. So you can still stick to the ensemble, and it will work, right. So, in short, MatterSim can provide you an uncertainty that tells you whether to trust the prediction or not.
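The ensemble recipe for model uncertainty can be sketched in a few lines of Python. This is an illustrative toy, not MatterSim's implementation: the "models" are hypothetical closed-form functions standing in for networks trained from different random seeds, and the threshold `max_std` is an arbitrary assumption that in practice would be chosen per property.

```python
import statistics

def ensemble_predict(models, x):
    """Mean and spread of the predictions of independently trained models."""
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.pstdev(preds)

def is_trustworthy(models, x, max_std):
    """Treat a prediction as reliable only if the ensemble members agree."""
    _, std = ensemble_predict(models, x)
    return std <= max_std

# Toy ensemble: three "models" (think: different random seeds) that agree near
# x = 1 but drift apart as x moves away from that region.
models = [lambda x, a=a: 2.0 * x + a * x * x for a in (-0.01, 0.0, 0.01)]

print(ensemble_predict(models, 1.0))   # small spread
print(ensemble_predict(models, 10.0))  # much larger spread
```

The same check applies unchanged when each ensemble member is a shallow head trained on top of learned embeddings: only the models being compared change.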

KALTER: So in many ways, MatterSim is the realist in the equation, and it’s there to sort of be a gatekeeper for MatterGen, which is the idea generator. 

XIE: I really like the analogy. 

LU: Yeah. 

KALTER: As is the case with many AI models, the development of MatterGen and MatterSim relies on massive amounts of data. And here you use a simulation to create the needed training data. Can you talk about that process and why you’ve chosen that approach, Tian?

XIE: So one advantage here is that we can really use large-scale simulation to generate data. We have a lot of compute here at Microsoft on our Azure platform, right. The way we generate the data is that we use a method called density functional theory, DFT, which is a quantum mechanical method. And we use a simulation workflow built on top of DFT to simulate the stability of materials. So what we do is curate a huge number of material structures from multiple different sources of open data, mostly the Materials Project and the Alexandria database, and in total, there are around 3 million material candidates coming from these two databases. But not all of these structures are stable. So we use DFT to compute their stability and filter down the candidates so that our training data only contains the most stable ones. This leads to around 600,000 training data points, which were used to train the base model of MatterGen. I want to note that we actually also use MatterSim as part of the workflow, because MatterSim can be used to prescreen unstable candidates so that we don’t need to use DFT to compute all of them. I think in the end, we ran around 1 million DFT calculations, with two-thirds of the candidates already filtered out by MatterSim, which saved us a lot of compute in generating our training data.
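The two-stage filter Tian describes (a fast model prescreens, the expensive calculation confirms) can be sketched as follows. The function names and energies are hypothetical: `surrogate_energy` plays the role of the fast model and `reference_energy` the role of the costly reference calculation. The toy data are chosen so that two-thirds of candidates are filtered before the expensive stage, echoing the proportion mentioned above; nothing else about them is realistic.

```python
def prescreen_then_verify(candidates, cheap_energy, expensive_energy,
                          cheap_cut, final_cut):
    """Two-stage stability filter: a fast surrogate discards clearly unstable
    candidates, so the expensive check runs only on the survivors."""
    survivors = [c for c in candidates if cheap_energy(c) <= cheap_cut]
    stable = []
    expensive_calls = 0
    for c in survivors:
        expensive_calls += 1  # stands in for one costly reference calculation
        if expensive_energy(c) <= final_cut:
            stable.append(c)
    return stable, expensive_calls

# Toy data: candidate IDs with a made-up "energy above hull" in eV/atom.
def reference_energy(c):
    return (c % 30) * 0.01

def surrogate_energy(c):
    return reference_energy(c) + 0.005  # slightly biased fast estimate

candidates = list(range(3000))
stable, calls = prescreen_then_verify(candidates, surrogate_energy, reference_energy,
                                      cheap_cut=0.1, final_cut=0.055)
print(f"{calls} expensive calls instead of {len(candidates)}; {len(stable)} kept")
```

The prescreen cutoff is deliberately looser than the final one, so a small surrogate error does not throw away candidates the reference calculation would have kept.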

LU: Tian, you have a very good description of how we really get those ground-state structures for the MatterGen model. Actually, we’ve also been using MatterGen for MatterSim to really get the training data. If you think about the simulation space of materials, it’s extremely large. We think of it as having three axes: the elements, the temperature, and the pressure. If you think about existing databases, they have pretty good coverage of the element space. Basically, think about Materials Project and NOMAD; they really have very good coverage of lithium oxide, lithium sulfide, hydrogen sulfide, whatever, those different ground-state structures. But they don’t really tell you how these materials behave under certain temperatures and pressures, especially under extreme conditions like 1,600 Kelvin, which you really use to synthesize your materials. That’s what we really focused on when generating the data for MatterSim. So it’s really easy to think about how we generate the data, right. You put your wanted material into a pressure cooker, basically, molecular dynamics; it can simulate the material’s behavior under temperature and pressure. So that’s it. Sounds easy, right? But that’s not true, because what we want is not one single material. What we want is the entire material space. That would make the effort almost impossible because the space is just so large. So that’s why we really developed this active learning pipeline. Basically, what we do is generate a lot of these structures for different elements, temperatures, and pressures. Really, really a lot. And then we ask the active learning, or the uncertainty measurements, to really say whether the model knows about each structure already. If the model thinks, well, I think I know this structure already, then we don’t really calculate it using density functional theory, as Tian just said. 
So this really saves us like 99% of the effort in generating the data. So in the end, by combining molecular dynamics, basically the pressure cooker, with active learning, we gathered around 17 million data points for MatterSim. That was used to train the model. And now it can cover the entire periodic table and a lot of temperatures and pressures. 
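The uncertainty-gated step of an active learning loop like the one Ziheng describes can be sketched as below. Structures are reduced to hypothetical (temperature in K, pressure in GPa) pairs, and the ensemble is a toy that agrees at or below 1,000 K and disagrees above it, mimicking a model trained mostly on low-temperature data. The threshold and all function names are illustrative assumptions, not MatterSim's pipeline.

```python
import random
import statistics

def select_for_labeling(structures, ensemble, threshold):
    """Uncertainty-gated active learning step: return only the structures on
    which the ensemble disagrees; the rest are treated as already known."""
    to_label = []
    for s in structures:
        preds = [model(s) for model in ensemble]
        if statistics.pstdev(preds) > threshold:
            to_label.append(s)
    return to_label

# Toy setup: a structure is a (temperature K, pressure GPa) pair, and the ensemble
# only agrees at or below 1,000 K.
ensemble = [
    lambda s, a=a: -5.0 + a * max(0.0, s[0] - 1000.0) / 1000.0
    for a in (-0.1, 0.0, 0.1)
]

rng = random.Random(1)
pool = [(rng.uniform(0.0, 5000.0), rng.uniform(0.0, 1000.0)) for _ in range(1000)]
picked = select_for_labeling(pool, ensemble, threshold=0.01)
print(f"send {len(picked)} of {len(pool)} structures to the reference calculation")
```

In a full loop the newly labeled structures would be added to the training set, the ensemble retrained, and the selection repeated, so the fraction sent for labeling shrinks as the model learns more of the space.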

KALTER: Thank you, Ziheng. Now, I’m sure this is not news to either one of you, given that you’re both at the forefront of these efforts, but there are a growing number of tools aimed at advancing materials science. So what is it about MatterGen and MatterSim in their approach or capabilities that distinguish them? 

XIE: Yeah, I think I can start. So I think, in the past year, there has been huge interest in building generative AI tools for materials. We have seen lots and lots of innovations from the community published in top conferences like NeurIPS, ICLR, ICML, etc. So I think what distinguishes MatterGen, from my point of view, is two things. First, it is trained on a very big dataset that we curated very, very carefully, and we also spent quite a lot of time refining our diffusion architecture, which means that our model is capable of generating very, kind of, high-quality, highly stable, and novel materials. We have some kind of bar plot in our paper showcasing our performance advantage. I think that’s one key aspect. And the second aspect, which from my point of view is even more important, is that it has the ability to do property-guided generation. Many of the works that we saw in the community are more focused on the problem of crystal structure prediction, which MatterGen can also do, but we focus more on property-guided generation because we think this is one of the key problems that materials scientists really care about. So the ability to do a very broad range of property-guided generation, and we have, kind of, both computational and now experimental results to validate that, I think that’s the second strong point for MatterGen. 

KALTER: Ziheng, do you want to add to that? 

LU: Yeah, thanks, Lindsay. So on the MatterSim side, I think it’s really the diverse conditions it can handle that make a difference. We’ve been talking about how the training data we collected really covers the entire periodic table and also, more importantly, temperatures from 0 Kelvin to 5,000 Kelvin and pressures from 0 gigapascals to 1,000 gigapascals. That really covers what humans can control nowadays. I mean, it’s very hard to go beyond that. If you know anyone [who] can go beyond that, let me know. So that really makes MatterSim different. Like, it can handle realistic conditions. Beyond that, I would say the combo of MatterSim and MatterGen really makes this set of tools different. Previously, a lot of people were using atomistic simulators and generative models alone. But if you think about it, now that we have these two foundation models together, they really can make things different, right. So we have the predictor; we have the generator; you have a very good idea generator. And you have a very good goalkeeper. And you put them together. They form a loop. And now you can use this loop to design materials really quickly. So to me, now, when I think about it, it’s really the combo that makes this set of tools different. 

KALTER: I know that I’ve spoken with both of you recently about how there’s so much excitement around this, and it’s clear that we’re on the cusp of what both of you have called a paradigm shift. And Microsoft places a very strong emphasis on ensuring that its innovations are grounded in reality and capable of addressing real-world problems. So with that in mind, how do you balance the excitement of scientific exploration with the practical challenges of implementation? Tian, do you want to take this?

XIE: Yeah, I think this is a very, very important point, because there’s so much hype around AI happening right now, right. We must be very, very careful about the claims that we are making so that people will not have unrealistic expectations, right, about what these models can do. So for MatterGen, we’re pretty careful about that. We’re trying to, basically, we’re trying to say that this is an early stage of generative AI in materials, where this model will be improved over time quite significantly, but you should not say, oh, all the materials generated by MatterGen are going to be amazing. That’s not what is happening today. So we try to be very careful to understand how far MatterGen is already capable of designing materials with real-world impact. Therefore, we went all the way to synthesizing one material that was generated by MatterGen. This material is called tantalum chromium oxide[1]. It is a new material; it has not been discovered before. And it was generated by MatterGen by conditioning on a bulk modulus equal to 200 gigapascals. Bulk modulus is, like, how strongly the material resists compression. So we ended up measuring the synthesized material experimentally, and the measured bulk modulus is 169 gigapascals, which is within 20% error. So this is a very good proof of concept, from our point of view, to show that, oh, you can actually give it a prompt, right, and then MatterGen can generate a material, and the material actually has a property that is very close to your target. But it’s still a proof of concept. And we’re still working to see how MatterGen can design materials that are much more useful for a much broader range of applications. And I’m sure that there will be more challenges along the way. But we’re looking forward to further working with our experimental partners to, kind of, push this further. 
And also working with MatterSim, right, to see how these two tools can be used to design really useful materials and bringing this into real-world impact.
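A quick check of the figures quoted above: the gap between the 200 GPa target and the 169 GPa measurement works out to a relative error of about 15.5%, consistent with the "within 20%" description. The sketch below simply redoes that arithmetic; the helper name `relative_error` is our own illustration and is not part of MatterGen.

```python
# Relative error between the bulk modulus MatterGen was conditioned on and the
# value measured for the synthesized tantalum chromium oxide, using only the
# figures quoted in the conversation (both in GPa).
target_gpa = 200.0
measured_gpa = 169.0


def relative_error(measured: float, target: float) -> float:
    """Return |measured - target| / |target| as a fraction (hypothetical helper)."""
    return abs(measured - target) / abs(target)


rel = relative_error(measured_gpa, target_gpa)
print(f"relative error: {rel:.1%}")  # relative error: 15.5%
```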

LU: Yeah, Tian, I think that’s very well said. It’s not only for MatterGen. For MatterSim, we’re also very careful, right. We really want to make sure that people understand how these models behave under their instructions and understand, like, what they can and cannot do. So one thing that we really care about is that in the next one or two years, we want to work with our experimental partners to make these materials real, like, in different areas, so that even we can better understand the limitations and, at the same time, explore the forefront of materials science to make this excitement come true. 

KALTER: Ziheng, could you give us a concrete example of what exactly MatterSim is capable of doing? 

LU: Now MatterSim can really do, like, whatever you have on a potential energy surface. What that means is, like, anything that can be simulated with energies, forces, and stresses alone. To give you an example, the first would be the stability of a material. Basically, you input a structure, and from the energies of the relaxed structures, you can tell whether the material, like, that composition, is likely to be stable, right. Another example would be thermal conductivity. Thermal conductivity is a fundamental property of materials that tells you how fast heat can transfer through the material, right. So MatterSim can simulate how fast heat goes through your diamond, your graphene, your copper, right. Those are two examples, and they are based on energies and forces alone. But there are things MatterSim cannot do, at least for now. For example, you cannot really do anything related to electronic structure. So you cannot compute the light absorption of a semitransparent material. That would be a no-no for now. 

KALTER: It’s clear from speaking with researchers, both from MatterSim and MatterGen, that despite these very rapid advancements in technology, you take very seriously the responsibility to consider the broader implications of the challenges that are still ahead. How do you think about the ethical considerations of creating entirely new materials and simulating their properties, particularly in terms of things like safety, sustainability, and societal impact? 

XIE: Yeah, that’s a fantastic question. It’s extremely important that we make sure these AI tools are not misused. A potential misuse, right, as you just mentioned, is that people begin to use these AI tools, MatterGen and MatterSim, to, kind of, design harmful materials. There has actually been extensive discussion of how generative AI tools originally intended for drug design could be misused to create bioweapons. So at Microsoft, we take this very seriously, because we believe that when we create new technologies, we must also ensure that the technology is used responsibly. We have an extensive process to ensure that all of our models respect those ethical considerations. In the meantime, as you mentioned, on sustainability and societal impact, right, there’s a huge amount these AI tools, MatterGen and MatterSim, can do for sustainability, because a lot of sustainability challenges are really, at the end, materials design challenges, right. So I think MatterGen and MatterSim can really help us alleviate climate change and have a positive impact on the broader society. 

KALTER: And, Ziheng, how about from a simulation standpoint? 

LU: Yeah, I think Tian gave a very good description. At Microsoft, we are really careful about these ethical considerations. So I would add a little bit on the bright side of things. MatterSim carries out these simulations at atomic scales, so one thing you can think about is the educational purpose. Back during my bachelor’s and PhD, I would sit at the table and grab a pen to deal with those very complex equations and work through the statistics by hand. It’s really painful. But now with MatterSim, with these simulation tools at the atomic level, you can simulate the reactions, the movement of atoms, at atomic scale in real time. You can really see the chemical reactions and see the statistics. So you get a very direct feeling for how the system works instead of just working on those toy systems with your pen. I think MatterSim is going to be a very good educational tool, yeah. Also MatterGen. MatterGen, as a generative tool generating those i.i.d. (independent and identically distributed) samples, will be a perfect example to show students how the Boltzmann distribution works. I think, Tian, you will agree with that, right?

XIE: 100%. Yeah, I really, really like the example Ziheng mentioned about the educational purposes. I still remember when I was, kind of, taking a materials simulation class, right. Everything was DFT (density functional theory). You, kind of, needed to wait for an hour, right, to get a simulation result. Maybe then you’d make some animation. Now you can do this in real time. This is, like, a huge step forward, right, for our young researchers to gain a sense of how atoms interact at the atomic level. 

LU: Yeah, and the results are real; they’re not just toy models. I think it’s going to be very exciting stuff. 
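To make the classroom point about the Boltzmann distribution concrete, here is a toy sketch, assuming a hypothetical three-level system with made-up energies (not MatterGen or MatterSim output). It computes Boltzmann probabilities p_i = exp(-E_i / k_B T) / Z and then draws i.i.d. samples whose empirical frequencies approach those probabilities. The helper names are illustrative only.

```python
import math
import random

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_probabilities(energies_ev, temperature_k):
    """Return normalized Boltzmann weights exp(-E/kT) / Z for each energy level."""
    beta = 1.0 / (K_B_EV_PER_K * temperature_k)
    e_min = min(energies_ev)  # shift by the minimum energy for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies_ev]
    z = sum(weights)  # partition function (up to the constant shift)
    return [w / z for w in weights]


def sample_states(energies_ev, temperature_k, n, seed=0):
    """Draw n i.i.d. state indices from the Boltzmann distribution (hypothetical helper)."""
    rng = random.Random(seed)
    probs = boltzmann_probabilities(energies_ev, temperature_k)
    return rng.choices(range(len(energies_ev)), weights=probs, k=n)


# Hypothetical three-level system (energies in eV) at room temperature.
levels = [0.0, 0.025, 0.05]
probs = boltzmann_probabilities(levels, 300.0)
samples = sample_states(levels, 300.0, 10_000)
freqs = [samples.count(i) / len(samples) for i in range(len(levels))]
print("exact:    ", [round(p, 3) for p in probs])
print("sampled:  ", [round(f, 3) for f in freqs])
```

Raising `temperature_k` flattens the distribution toward equal occupancy of the three levels, which is the kind of behavior a student can see directly by rerunning the sampler.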

KALTER: And, Tian, I’m directing this question to you, even though, Ziheng, I’m sure you can chime in, as well. But, Tian, I know that you and I have previously discussed this specifically. I know that you said back in, you know, 2017, 2018, that you knew an AI-based approach to materials science was possible but that even you were surprised by how far the technology has come so fast in aiding this area. What is the status of these tools right now? Are they in use? And if so, who are they available to? And, you know, what’s next for them? 

XIE: Yes, this is a fantastic question, right. So for generative AI tools like MatterGen, as I said many times earlier, it’s still in its early stages. MatterGen is the first tool with which we managed to show that generative AI can enable very broad property-guided generation, and we have experimental validation to show it’s possible. But it will take more work to show, OK, it can actually design batteries, it can design solar cells, right. It can design really useful materials in these broader domains. This is, kind of, exactly why we are now taking a pretty open approach with MatterGen. We make our code, our training data, and our model weights available to the general public. We’re really hoping the community can apply our tools to the problems they care about and even build on top of them. In terms of what’s next, I always like to use what happened with generative AI for drugs, right, to predict how generative AI will impact materials. Three years ago, there was a lot of research around generative models for drugs, first coming from the machine learning community, right. Then all the big drug companies began to take notice, and researchers in these drug companies began to use these tools in actual drug design processes. My colleague Marwin Segler, who works with Novartis as part of the Microsoft and Novartis collaboration, has told me that at the beginning, the chemists in the drug companies were all very suspicious, right. The molecules generated by these generative models all looked a bit weird, so they didn’t believe this would work. But once these chemists saw one or two examples that actually turned out to perform pretty well in experiments, they began to build more trust, right, in these generative AI models. 
And today, these generative AI tools are part of the standard drug discovery pipeline that is widely used in all the drug companies. So I think generative AI for materials is going through a very similar period. People will have doubts; people will have suspicions at the beginning. But I think in three years, right, it will become a standard tool for how people design new solar cells, new batteries, and many other applications.

KALTER: Great. Ziheng, do you have anything to add to that? 

LU: So actually for MatterSim, we released the model, I think, back in December last year, both the model and the weights, right. We’re really grateful for how much the community has contributed to the repo. And now we really welcome the community to contribute more to both MatterSim and MatterGen via our open-source code bases. The community effort is really important, yeah. 

KALTER: Well, it has been fascinating to pick your brains, and as we close, you know, I know that you’re both capable of quite a bit, which you have demonstrated. I know that asking you to predict the future is a big ask, so I won’t explicitly ask that. But just as a fun thought exercise, let’s fast-forward 20 years and look back. How have MatterGen and MatterSim and the big ideas behind them impacted the world, and how are people better off because of how you and your teams have worked to make them a reality? Tian, you want to start? 

XIE: Yeah, I think one of the biggest challenges our human society is going to face, right, in the next 20 years is climate change, right, and there are so many materials design problems people need to solve in order to properly handle climate change, like finding new materials that can absorb CO2 from the atmosphere to create a carbon capture industry, or battery materials that can do large-scale energy grid storage so that we can fully utilize all the wind power and solar power, etc., right. So if you want me to make one prediction, I really believe that these AI tools, like MatterGen and MatterSim, are going to play a central role in our ability to design these new materials for climate problems. So in 20 years, I would like to see that we have already solved climate change, right. We have large-scale energy storage systems designed by AI, so that we have removed all the fossil fuels, right, from our energy production, and for the rest of the carbon emissions that are very hard to remove, we will have a carbon capture industry with materials designed by AI that absorb CO2 from the atmosphere. It’s hard to predict exactly what will happen, but I think AI will play a key role, right, in defining what our society will look like in 20 years. 

LU: Tian, very well said. So I think instead of really describing the future, I would really quote a science fiction scene in Iron Man. So basically in 20 years, I will say when we want to really get a new material, we will just sit in an office and say, “Well, J.A.R.V.I.S., can you design us a new material that really fits my newest MK 7 suit?” That will be the end. And it will run automatically, and we get this auto lab running, and all those MatterGen and MatterSim, these AI models, running, and then probably in a few hours, in a few days, we get the material. 

KALTER: Well, I think I speak for many people from several industries when I say that I cannot wait to see what is on the horizon for these projects. Tian and Ziheng, thank you so much for joining us on Ideas. It’s been a pleasure. 

[MUSIC] 

XIE: Thank you so much. 

LU: Thank you. 

[MUSIC FADES]


[1] Learn more about MatterGen and the new material tantalum chromium oxide in the Nature paper “A generative model for inorganic materials design.”

The post Ideas: AI for materials discovery with Tian Xie and Ziheng Lu appeared first on Microsoft Research.

]]>
Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness http://approjects.co.za/?big=en-us/research/podcast/ideas-ai-and-democracy-with-madeleine-daepp-and-robert-osazuwa-ness/ Thu, 19 Dec 2024 20:00:00 +0000 http://approjects.co.za/?big=en-us/research/?p=1112883 As the “biggest election year in history” comes to an end, researchers Madeleine Daepp and Robert Osazuwa Ness and Democracy Forward GM Ginny Badanes discuss AI’s impact on democracy, including Daepp and Ness’s research into the tech’s use in Taiwan and India.

The post Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness appeared first on Microsoft Research.

]]>

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In 2024, with advancements in generative AI continuing to reach new levels and the world experiencing its “biggest election year in history,” could there possibly be a better time to examine the technology’s emerging role in global democracies? Inspired by the moment, senior researchers Madeleine Daepp and Robert Osazuwa Ness conducted research in Taiwan, studying the technology’s influence on disinformation, and in India, documenting its impact on digital communications more broadly. In this episode, Daepp and Ness join guest host Ginny Badanes, general manager of the Democracy Forward program at Microsoft. They discuss how leveraging commonly understood language such as fraud can help people understand potential risks associated with generative AI; the varied ways in which Daepp and Ness saw the tech being deployed to promote or discredit candidates; and the opportunities for the technology to be a force for fortifying democracy.

Learn more:  

Video will kill the truth if monitoring doesn’t improve, argue two researchers
The Economist, March 2024

Microsoft Research Special Projects
Group homepage

Democracy Forward
Program homepage, Microsoft Corporate Social Responsibility

As the US election nears, Russia, Iran and China step up influence efforts
Microsoft On the Issues blog, October 2024

Combatting AI Deepfakes: Our Participation in the 2024 Political Conventions
Microsoft On the Issues blog, July 2024

China tests US voter fault lines and ramps AI content to boost its geopolitical interests
Microsoft On the Issues blog, April 2024

Project Providence
Project homepage

Transcript

[TEASER]

[MUSIC PLAYS UNDER DIALOGUE]

MADELEINE DAEPP: Last summer, I was working on all of these, like, pro-democracy applications, trying to build out, like, a social data collection tool with AI, all this kind of stuff. And I went to the elections workshop that the Democracy Forward team at Microsoft had put on, and Dave Leichtman, who, you know, was the MC of that work, was really talking about how big of a global elections year 2024 was going to be. Over 70 countries around the world. And, you know, we’re coming from Microsoft Research, where we were so excited about this technology. And then, all of a sudden, I was at the elections workshop, and I thought, oh no, [LAUGHS] like, this is not good timing.

ROBERT OSAZUWA NESS: What are we really talking about in the context of deepfakes in the political context, elections context? It’s deception, right. I’m trying to use this technology to, say, create some kind of false record of events in order to convince people that something happened that actually did not happen. And so that goal of deceiving, of creating a false record, that’s kind of how I have been thinking about deepfakes in contrast to the broader category of generative AI.

[TEASER ENDS]

GINNY BADANES: Welcome to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I’m your guest host, Ginny Badanes, and I lead Microsoft’s Democracy Forward program, where we’ve spent the past year deeply engaged in supporting democratic elections around the world, including the recent US elections. We have been working on everything from raising awareness of nation-state propaganda efforts to helping campaigns and election officials prepare for deepfakes to protecting political campaigns from cyberattacks. Today, I’m joined by two researchers who have also been diving deep into the impact of generative AI on democracy.

Microsoft senior researchers Madeleine Daepp and Robert Osazuwa Ness are studying generative AI’s influence in the political sphere with the goal of making AI systems more robust against misuse while supporting the development of AI tools that can strengthen democratic processes and systems. They spent time in Taiwan and India earlier this year, where both had big democratic elections. Madeleine and Robert, welcome to the podcast!

MADELEINE DAEPP: Thanks for having us.

ROBERT OSAZUWA NESS: Thanks for having us.

BADANES: So I have so many questions for you all—from how you conducted your research to what you’ve learned—and I’m really interested in what you think comes next. But first, let’s talk about how you got involved in this in the first place. Could you both start by telling me a little bit about your backgrounds and just what got you into AI research in the first place?

DAEPP: Sure. So I’m a senior researcher here at Microsoft Research on the Special Projects team. But I did my PhD at MIT in urban studies and planning. And I think a lot of folks hear that field and think, oh, you know, housing, like upzoning housing and figuring out transportation systems. But it really is a field that’s about little “d” democracy, right. About how people make choices about shared public spaces every single day. You know, I joined Microsoft first to run this, sort of, technology deployment in the city of Chicago, running a low-cost air-quality-sensor network for the city. And when ChatGPT and then GPT-4 came out, we, sort of, had this big recognition of how well this technology could do in summarizing, in representing opinions, and in making sense of big unstructured datasets, right. I actually got very excited. Like, I thought this could be used for town planning processes. [LAUGHS] Like, I thought we could … I had a whole project with a wonderful intern, Eva Maxfield Brown, looking at, can we summarize planning documents using AI? Can we build out policies from conversations that people have in shared public spaces? And so that was very much the impetus for thinking about how to apply and build things with this amazing new technology in these spaces.

BADANES: Robert, I think your background is a little bit different, yet you guys ended up in a similar place. So how did you get there?

NESS: Yeah, so I’m also on Special Projects, Microsoft Research. My work focuses on large language models, LLMs. And, you know, I focus on making these models more reliable and controllable in real-world applications. My PhD is in statistics, so I focus a lot on using basic bread-and-butter statistical methods to try to control and understand LLM behavior. Currently, for example, I’m leading a team of engineers running experiments designed to find ways to enhance a graphical approach to combining information retrieval with large language models. I also work on statistical tests of the significance of adversarial attacks on these models.

BADANES: Wow.

NESS: So, for example, if you find a way to trick one of these models into doing something it’s not supposed to do, I make sure that it’s not, like, a random fluke; that it’s something that’s reproducible. And I also work at this intersection between generative AI and, you know, Bayesian stuff, causal inference stuff. And so I came to this democracy work through an alignment lens. Alignment is the task in AI of making sure these models align with human values and goals. And what I was seeing was that a lot of research in the alignment space viewed it as a purely technical problem. And, you know, as statisticians, we’re trained to consult, right. Like, to go to the actual stakeholders and say, hey, what are your goals? What are your values? And so this democracy work was an opportunity to do that in Microsoft Research, and I connected with Madeleine. She was planning to go to Taiwan, and, kind of from a past life, when I wanted to become a trade economist, I learned Mandarin. So I speak fluent Mandarin, and it seemed like a good matchup of our skill sets …

BADANES: Yeah.

NESS: … and interests. And so that’s, kind of, how we got started.

BADANES: So, Madeleine, you brought the two of you together, but what started it for you? This podcast is all about big ideas. What sparked the big idea to bring this work that you’ve been doing on generative AI into the space of democracy and then to go out and find Robert and match up together?

DAEPP: Yeah, well, Ginny, it was you. [LAUGHS] It was actually your team.

BADANES: I didn’t plant that! [LAUGHS]

DAEPP: So, you know, I think last summer, I was working on all of these, like, pro-democracy applications, trying to build out, like, a social data collection tool with AI, all this kind of stuff. And I went to the elections workshop that the Democracy Forward team at Microsoft had put on, and Dave Leichtman, who, you know, was the MC of that work, was really talking about how big of a global elections year 2024 was going to be. He was calling it “Votorama.” You know, that term didn’t take off. [LAUGHTER] The term that has taken off is biggest election year in history, right. Over 70 countries around the world. And, you know, we’re coming from Microsoft Research, where we were so excited about this technology. Like, when it started to pass theory of mind tests, right, which is, like, the ability to think about how other people are thinking, we were all like, oh, this is amazing; this opens up so many cool application spaces, right. When it was, like, passing benchmarks for multilingual communication, again, we were so excited about the prospect of building out multilingual systems. And then, all of a sudden, I was at the elections workshop, and I thought, oh no, [LAUGHS] this is not good timing.

BADANES: Yeah …

DAEPP: And because so much of my work focuses on, you know, building out computer science systems, like data science systems or AI systems, but with communities in the loop, I really wanted to go to the folks most affected by this problem. And so I proposed a project to go to Taiwan and study one of the … it was the second election of 2024. And Taiwan is known to be subject to more external disinformation than any other place in the world. So if you were going to see something anywhere, you would see it there. Also, it has an amazing civil society response, so there were really interesting people to talk to. But I do not speak Chinese, right. Like, I don’t have the context; I don’t speak the language. And so part of my process is to hire a half-local team. We had an amazing interpreter, Vickie Wang, and then a wonderful graduate student, Ti-Chung Cheng, who supported this work. But then also my team, Special Projects, happened to have this person who not only is a leading AI researcher publishing in NeurIPS, like, building out these systems, but who also spoke Chinese, had worked in technology security, and had a real understanding of international studies and economics as well as AI. And so for me, finding Robert as a collaborator was kind of a unicorn moment.

BADANES: So it sounds like it was a match made in heaven of skill sets and abilities. Before we get into what you all found there, which I do want to get into, I first think it’s helpful—I don’t know, when we’re dealing with these, like, complicated issues, particularly things that are moving and changing really quickly, sometimes I found it’s helpful to agree on definitions and sort of say, this is what we mean when we say this word. And that helps lead to understanding. So while I know that this research is about more than deepfakes—and we’ll talk about some of the things that are more than deepfakes—I am curious how you all define that term and how you think of it. Because this is something that I think is constantly moving and changing. So how have you all been thinking about the definition of that term?

NESS: So I’ve been thinking about it in terms of the intention behind it, right. We say deepfake, and I think colloquially that means kind of all of generative AI. That’s a bit unfortunate because there are things that are … you know, you can use generative AI to generate cartoons …

BADANES: Right.

NESS: … or illustrations for a children’s book. And so in thinking about what are we really talking about in the context of deepfakes in the political context, elections context, it’s deception, right. I’m trying to use this technology to, say, create some kind of false record of events, say, for example, something that a politician says, in order to convince people that something happened that actually did not happen.

BADANES: Right.

NESS: And so that goal of deceiving, of creating a false record, that’s kind of how I have been thinking about deepfakes, in contrast to the broader category of generative AI, and about deepfakes as a malicious use case. There are other malicious use cases that aren’t necessarily deceptive, as well as positive use cases.

BADANES: Well, that really, I mean, that resonates with me because what we found was when you use the term deception—or another term we hear a lot that I think works is fraud—that resonates with other people, too. Like, that helps them distinguish between neutral uses or even positive uses of AI in this space and the malicious use cases, though to your point, I suppose there’s probably even deeper definitions of what malicious use could look like. Are you finding that distinction showing up in your work between fraud and deception in these use cases? Is that something that has been coming through?

DAEPP: You know, we didn’t really think about the term fraud until we started prepping for this interview with you. As Robert said, so much of what we were thinking about in our definition was this representation of people or events, you know, done in order to deceive and with malicious intent. But in fact, in all of our conversations, no matter who we were talking to, no matter what political bent, no matter, you know, national security, fact-checking, et cetera, you know, they all agreed that using AI for the purposes of scamming somebody financially was not OK, right. That’s fraud. Using AI for the purposes of nudifying, like removing somebody’s clothes and then sextorting them, right, extorting them for money out of fear that this would be shared, like, that was not OK. And those are such clear lines. And it was clear that there’s a set of uses of generative AI also in the political space, you know, of saying this person said something that they didn’t, …

BADANES: Mm-hmm.

DAEPP: … of voter suppression, that in general, there’s a very clear line that when it gets into that fraudulent place, when it gets into that simultaneously deceptive and malicious space, that’s very clearly a no-go zone.

NESS: Oftentimes during this research, I found myself thinking about this dichotomy in cybersecurity of state actors, or broadly speaking, kind of, political actors, versus criminals.

BADANES: Right.

NESS: And it’s important to understand the distinction, because criminals typically go after targets of opportunity and want to make money, while state-sponsored agents are willing to spend a lot more money, have very specific targets, and have a very specific definition of success. And so this fraud-versus-deception distinction feels a little bit like that, in the sense that fraud is typically associated with criminal behavior, while, say, I might put out deceptive political messaging that still falls within the bounds of free speech in my country.

BADANES: Right, yeah.

NESS: This is not to say I disagree with that; it’s just that it could be a useful contrast for thinking about criminal versus political uses, both legitimate and illegitimate.

BADANES: Well, I also think those of us who work in the AI space are dealing with very complicated issues that most of the world is still trying to understand. So any time you can find a word that people understand immediately, it helps with the, sort of, storytelling: the reason we are worried about deepfakes in elections is that we do not want voters to be defrauded. We find that really breaks through, because people already understand that term. They already know they don’t want to be defrauded, in their personal lives or in how they vote. But I know there’s a lot of interest in talking about deepfakes when we talk about this subject, and I know your research goes beyond that. So what other forms of generative AI did you include in your research or encounter in your work in both Taiwan and India?

DAEPP: Yeah. So let me tell you just, kind of, a big overview of, like, our taxonomy. Because as you said, like, so much of this is just about finding a word, right. Like, so much of it is about building a shared vocabulary so that we can start to have these conversations. And so when we looked at the political space, right, elections, so much of what it means to win an election is kind of two things. It’s building an image of a candidate, right, or changing the image of your opposition and telling a story, right.

BADANES: Mm-hmm.

DAEPP: And so if you think about image creation, of course, there are deepfakes. Like, of course, there are malicious representations of a person. But we also saw a lot of what we’re calling auth fakes, like authorized fakes, right. Candidates would actually go to a consultancy and, like, get their bodies scanned so that videos could be made of them. They’d get a bunch of snippets of their voices recorded so that there could be personalized phone calls, right. So these are authorized uses of their image and likeness. Then there’s a term I’ve heard in, sort of, the ether: soft fakes. So again, likenesses of a candidate, this time not necessarily authorized but promotional. People on Twitter (I guess, X) and on Instagram were sharing images of the candidate they supported that were really flattering or silly or, you know, just really sort of in support of that person. So not with malicious intent, right, with promotional intent. And then the last one in this image creation category, and this, I think, was Robert’s term: one thing we talked about was the way people were also making fun of candidates. In this case, it’s a bit malicious, right. Like, they’re making fun of people; they’re satirizing them. But it’s not deceptive because, …

BADANES: Right …

DAEPP: … you know, often it has that hyper-saturated meme aesthetic. It’s very clearly AI, or, you know, per, sort of, US standards for satire, a reasonable person would know that it was silly. And so Robert said, you know, oh, these influencers, they’re not trying to deceive people; like, they’re not trying to lie about candidates. They’re trying to roast them. [LAUGHTER] And so we called it a deep roast. So that’s, kind of, the images of candidates. We also looked at narrative building, and there, one really important set of things that we saw was what we call text to b-roll. A lot of folks think that you can’t really make AI videos because, like, Sora isn’t out yet[1]. But in fact, there is a lot of tooling to, sort of, use AI to pull from stock imagery and b-roll footage and put together a 90-second video. You know, it doesn’t look like AI; it’s a real video. So that’s text to b-roll. Then there’s AI pasta. If you know the threat intelligence space, there’s this thing called copy pasta, where people just …

BADANES: Sure.

DAEPP: … it’s just a fun word for copy-paste. People just copy-paste terms in order to get a hashtag trending. And we talked to an ex-influencer who said, you know, we’re using AI to do this. And I asked him why. And he said, well, you know, if you just do copy-paste, the fact-checkers catch it. But if you use AI, they don’t. And so AI pasta. And there’s also some research showing that this is potentially more persuasive than copy-paste …

BADANES: Interesting.

DAEPP: … because people think there’s a social consensus. And then the last one, this is my last of the big taxonomy, and, Robert, of course, jump in on anything you want to go deeper on, but Fake News 2.0. You know, I’m sure you’ve seen this, as well. Just this, like, creation of news websites, like entire new newspapers that nobody’s ever heard of. AI avatars that are newscasters. And this is something that was happening before. Like, there’s a long tradition of pretending to be a real news pamphlet or pretending to be a real outlet. But there’s some interesting work out of … Patrick Warren at Clemson has looked at some of these and shown the quality and quantity of articles on these things has gotten a lot better and, you know, improves as a step function of, sort of, when new models come out.

NESS: And then on the flip side, you have people using the same technologies but stating clearly that it’s AI generated, right. So we mentioned the AI avatars. In India, there’s this … there’s Bhoomi, which is an AI news anchor for agricultural news, and it states there in clear terms that she’s not real. But of course, somebody who wanted to be deceptive could use the same technology to portray something that looks like a real news broadcast that isn’t. You know, and, kind of, going back, Madeleine mentioned deep roasts, right, so, kind of, using this technology to create satirical depictions of, say, a political opponent. Somebody, a colleague, sent something across my desk. It was a Douyin account—so Douyin is the version of TikTok that’s used inside China; …

BADANES: OK.

NESS: … same company, but it’s the internal version of TikTok—that was posting AI-generated videos of politicians in Taiwan. And these were excellent, real good-quality AI-generated deepfakes of these politicians. But some of them were, first off, on the bottom of all of them, it said, this is AI-generated content.

BADANES: Oh.

NESS: And some of them were, kind of, obviously meant to be funny and were clearly fake, like still images that were animated to make somebody singing a funny song, for example. A very serious politician singing a very silly song. And it’s a still image. It’s not even, it’s not even …

BADANES: A video.

NESS: … like video.

BADANES: Right, right.

NESS: And so I messaged Puma Shen, who is one of the legislators in Taiwan who was targeted by these attacks, and I said, what do you think about this? And, you know, he said, yeah, they got me. [LAUGHTER] And I said, you know, do you think people believe this? I mean, there are people who are trying to debunk it. And he said, no, our supporters don’t believe it, but, you know, people who support the other side or people who are apolitical, they might believe it, or even if it says it’s fake—they know it’s fake—but they might still say that, yeah, but this is something they would do, right. This is …

BADANES: Yeah, it fits the narrative. Yeah.

NESS: … it fits the narrative, right. And that, kind of, that really, you know, I had thought of this myself, but just hearing somebody, you know, who’s, you know, a politician who’s targeted by these attacks just saying that it’s, like, even if they believe it’s … even if they know it’s fake, they still believe it because it’s something that they would do.

BADANES: Sure.

NESS: That’s, you know, as a form of propaganda, even relative to the canonical idea of deepfake that we have, this could be more effective, right. Like, just say it’s AI and then use it to, kind of, paint the picture of the opponent in any way you like.

BADANES: Sure, and this gets into that, sort of, challenging space I think we find ourselves in right now, which is people don’t know necessarily how to tell what’s real or not. And the case you’re describing, it has labeling, so that should tell you. But a lot of the content we come across online does not have labeling. And you cannot tell just based on your eyes whether images were generated by AI or whether they’re real. One of the things that I get asked a lot is, why can’t we just build good AI to detect bad AI, right? Why don’t we have a solution where I just take a picture and I throw it into a machine and it tells me thumbs-up or thumbs-down if this is AI generated or not? And the question around detection is a really tricky one. I’m curious what you all think about, sort of, the question of, can detection solve this problem or not?

NESS: So I’ll mention one thing. So Madeleine mentioned an application of this technology called text to b-roll. And so what this is, technically speaking, what this is doing is you’re taking real footage, you stick it in a database, it’s quote, unquote “vectorized” into these representations that the AI can understand, and then you say, hey, generate a video that illustrates this narrative for me. And you provide it the text narrative, and then it goes and pulls out a whole bunch of real video from a database and curates them into a short video that you could put on TikTok, for example. So this was a fully AI-generated product, but none of the actual content is synthetic.

BADANES: Ah, right.

NESS: So in that case, your quote, unquote “AI detection tool” is not going to work.
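The text-to-b-roll pipeline described above (index real footage, then retrieve the clips that best match a written narrative) can be sketched in a few lines. This is a toy illustration under stated assumptions, not any vendor’s actual system: the clip library, the `assemble_broll` helper, and the word-count "vectors" are all hypothetical stand-ins for the learned video and text embeddings a real tool would presumably use. It does show why a detector for synthetic pixels has nothing to flag: every clip in the output is real footage, and only the selection is automated.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for a learned embedding: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assemble_broll(narrative: list[str], library: dict[str, str]) -> list[str]:
    """Hypothetical helper: for each narrative line, pick the most similar real clip."""
    # Index every clip description once ("vectorize" the footage database).
    index = {clip_id: vectorize(desc) for clip_id, desc in library.items()}
    timeline = []
    for line in narrative:
        query = vectorize(line)
        # Nearest-neighbor lookup: no new footage is generated, only selected.
        best = max(index, key=lambda cid: cosine(query, index[cid]))
        timeline.append(best)
    return timeline


# Illustrative, made-up stock library and narrative.
library = {
    "clip_farm": "farmers harvesting wheat in a dry field",
    "clip_city": "busy city traffic at night",
    "clip_rally": "crowd cheering at a political rally",
}
narrative = [
    "Farmers say the harvest failed this year",
    "Crowds gathered at the rally to protest",
]
print(assemble_broll(narrative, library))  # ['clip_farm', 'clip_rally']
```

With a real embedding model the matching would be semantic rather than word overlap, but the structure (index once, then a nearest-neighbor lookup per narrative line) would likely be the same.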

DAEPP: Yeah, I mean, something that I find really fascinating any time that you’re dealing with a sociotechnical system, right—a technical system embedded in social context—is folks, you know, think that things are easy that are hard and things are hard that are easy, right. And so with a lot of the detections work, right, like if you put a deepfake detector out, you make that available to anyone, then what they can do is they can run a bunch of stuff by it, …

BADANES: Yeah.

DAEPP: … add a little bit of random noise, and then the deepfake detector doesn’t work anymore. And so that detection, actually, technically becomes an arms race, you know. And we’re seeing now some detectors that, like, you know, work when you’re not looking at a specific image or a specific piece of text but you’re looking at a lot all at once. That seems more promising. But, just, this is a very, very technically difficult problem, and that puts us as researchers in a really tricky place because, you know, you’re talking to folks who say, why can’t you just solve this? If you put this out, then you have to put the detector out. And we’re like, that’s actually not, that’s not a technically feasible long-term solution in this space. And the solutions are going to be social and regulatory and, you know, changes in norms as well as technical solutions that maybe are about everything outside of AI, right.

BADANES: Yeah.

DAEPP: Not about fixing the AI system but fixing the context within which it’s used.

BADANES: It’s not just a technological solution. There’s more to it. Robert?

NESS: So if somebody were to push back there, they could say, well, great; in the long term, maybe it’s an arms race, but in the short term, right, we can have solutions out there that, you know, at least in the next election cycle, we could maybe prevent some of these things from happening. And, again, kind of harkening back to cybersecurity, maybe if you make it hard enough, only the really dedicated, really well-funded people are going to be doing it rather than, you know, everybody who wants to throw a bunch of deepfakes on the internet. But the problem there is still that it focuses really on video and images, right.

BADANES: Yeah. What about audio?

NESS: What about audio? And what about text? So …

BADANES: Yeah. Those are hard. I feel like we’ve talked a lot about definitions and theoretical, but I want to make sure we talk more about what you guys saw and researched and understood on the ground, in particular, your trips to India and Taiwan and even if you want to reflect on how those compare to the US environment. What did you actually uncover? What surprised you? What was different between those countries?

DAEPP: Yeah, I mean, right, so Taiwan … both of these places are young democracies. And that’s really interesting, right. So like in Taiwan, for example, when people vote, they vote on paper. And anybody can go watch. That’s part of their, like, security strategies. Like, anyone around the world can just come and watch. People come from far. They fly in from Canada and Japan and elsewhere just to watch Taiwanese people vote. And then similarly in India, there’s this rule where you have to be walking distance from your polling place, and so the election takes two months. And, like, polling places move from place to place, and sometimes, it arrives on an elephant. And so these were really interesting places to, like, I as an American, just, like, found it very, very fascinating and important to be outside of the American context. You know, we just take for granted that how we do democracy is how other people do it. But Taiwan was very much a joint, like, civil society–government everyday response to this challenge of having a lot of efforts to manipulate public opinion happening with, you know, real-world speeches, with AI, with anything that you can imagine. You know, and I think the Microsoft Threat Analysis Center released a report documenting some of the, sort of, video stuff[2]. There’s a use of AI to create videos the night before the election, things like this. But then India is really thinking of … so India, right, it’s the world’s biggest democracy, right. Like, nearly a billion people were eligible to vote.

BADANES: Yeah.

NESS: And arguably the most diverse, right?

DAEPP: Yeah, arguably the most diverse in terms of languages, contexts. And it’s also positioning itself as the AI laboratory for the Global South. And so folks, including folks at the MSR (Microsoft Research) Bangalore lab, are leaders in thinking about representing low-resource languages, right, thinking about cultural representation in AI models. And so there you have all of these technologists who are really trying to innovate and really trying to think about what’s the next clever application, what’s the next clever use. And so that, sort of, that taxonomy that we talked about, like, I think just every week, every interview, we, sort of, had new things to add because folks there were just constantly trying all different kinds of ways of engaging with the public.

NESS: Yeah, I think for me, in India in particular, you know, India is an engineering culture, right. In terms of, like, the professional culture there, they’re very, kind of, engineering skewed. And so I think one of the bigger surprises for me was seeing people who were very experienced and effective campaign operatives, right, people who would go and, you know, hit the pavement; do door knocking; kind of, segment neighborhoods by demographics and voter bloc, these people were also, you know, graduated in engineering from an IIT (Indian Institute of Technology), …

BADANES: Sure.

NESS: … right, and so … [LAUGHS] so they were happy to pick up these tools and leverage them to support their expertise in this work, and so some of the, you know, I think a lot of the narrative that we tell ourselves in AI is how it’s going to be, kind of, replacing people in doing their work. But what I saw in India was that people who were very effective had a lot of domain expertise that you couldn’t really automate away and they were the ones who were the early adopters of these tools and were applying them in ways that I think we’re behind on in terms of, you know, ideas in the US.

BADANES: Yeah, I mean, there’s, sort of, this sentiment that AI only augments existing problems and can enhance existing solutions, right. So we’re not great at translation tools, but AI will make us much better at that. But that also can then be weaponized and used as a tool to deceive people, and propaganda is not new, right? We’re only scaling or making existing problems harder, or adversaries are trying to weaponize AI to build on things they’ve already been doing, whether that’s cyberattacks or influence operations. And while the three of us are in different roles, we do work for the same company. And it’s a large technology company that is helping bring AI to the world. At the same time, I think there are some responsibilities when we look at, you know, bad actors who are looking to manipulate our products to create and spread this kind of deceptive media, whether it’s in elections or in other cases like financial fraud or other ways that we see this being leveraged. I’m curious what you all heard from others when you’ve been doing your research and also what you think our responsibilities are as a big tech company when it comes to keeping actors from using our products in those ways.

DAEPP: You know, when I started using GPT-4, one of the things I did was I called my parents, and I said, if you hear me on a phone call, …

BADANES: Yeah.

DAEPP: … like, please double check. Ask me things that only I would know. And when I walk around Building 99, which is, kind of, a storied building in which a lot of Microsoft researchers work, everybody did that call. We all called our parents.

BADANES: Interesting.

DAEPP: Or, you know, we all checked in. So just as, like, we have a responsibility to the folks that we care about, I think as a company, that same, sort of, like, raising literacy around the types of fraud to expect and how to protect yourself from them—I think that gets back to that fraud space that we talked about—and, you know, supporting law enforcement, sharing what needs to be shared, I think that without question is a space that we need to work in. I will say a lot of the folks we talked with, they were using Llama on a local GPU, right.

BADANES: OK.

DAEPP: They were using open-source models. They were sometimes … they were testing out Phi. They would use Phi, Grok, Llama, like anything like that. And so that raises an interesting question about our guardrails and our safety practices. And I think there, we have an, like, our obligation and our opportunity actually is to set the standard, right. To say, OK, like, you know, if you use local Llama and it spouts a bunch of stuff about voter suppression, like, you can get in trouble for that. And so what does it mean to have a safe AI that wins in the marketplace, right? That’s an AI that people can feel confident and comfortable about using and one that’s societally safe but also personally safe. And I think that’s both a challenge and a real opportunity for us.

BADANES: Yeah … oh, go ahead, Robert, yeah …

NESS: Going back to the point about fraud. It was this year, in January, when that British engineering firm Arup, when somebody used a deepfake to defraud that company of about $25 million, …

BADANES: Yeah.

NESS: … their Hong Kong office. And after that happened, some business managers in Microsoft reached out to me regarding a major client who wanted to start red teaming. And by red teaming, I mean intentionally targeting your executives and employees with these types of attacks in order to figure out where your vulnerabilities as an organization are. And I think, yeah, it got me thinking like, wow, I would, you know, can we do this for my dad? [LAUGHS] Because I think that was actually a theme that came out from a lot of this work, which was, like, how can we empower the people who are really on the frontlines of defending democracy in some of these places in terms of the tooling there? So we talked about, say, AI detection tools, but the people who are actually doing fact-checking, they’re looking at more than just the video or the images; they’re actually looking at a, kind of, holistic … taking a holistic view of the news story and doing some proper investigative journalism to see if something is fake or not.

BADANES: Yeah.

NESS: And so I think as a company who creates products, can we take more of a product mindset to building tools that support that entire workflow in terms of fact-checking or investigative journalism in the context of democratic outcomes …

BADANES: Yeah.

NESS: … where maybe looking at individual deepfake content is just a piece of that.

BADANES: Yeah, you know, I think there are a lot of parallels here to cybersecurity. That’s also what we’ve found, is this idea that, first of all, the “no silver bullet,” as we were talking about earlier with the detection piece. Like, you can’t expect your system to be secure just because you have a firewall, right. You have to have this, like, defense-in-depth approach where you have lots of different layers. And one of those layers has been on the literacy side, right. Training and teaching people not to click on a phishing link, understanding that they should scroll over the URL. Like, these are efforts that have been taken up, sort of, in a broad societal sense. Employers do it. Big tech companies do it. Governments do it through PSAs and other things. So there’s been a concerted effort to get a population who might not have been aware of the fact that they were about to be scammed to now know not to click on that link. I think, you know, you raised the point about literacy. And I think there’s something to be said about media literacy in this space. It’s both AI literacy—understanding what it is—but also understanding that people may try to defraud you. And whether that is in the political sense or in the financial sense, once you have that, sort of, skill set in place, you’re going to be protected. One thing that I’ve heard, though, as I have conversations about this challenge … I’ve heard a couple things back from people specifically in civil society. One is not to put the impetus too much on the end consumer, which I think I’m hearing that we also recognize there’s things that we as technology companies should be focusing on. But the other thing is the concern that in, sort of, the long run, we’re going to all lose trust in everything we see anyway. And I’ve heard some people refer to that as the trust deficit. 
Have you all seen anything promising in the space to give you a sense around, can we ever trust what we’re looking at again, or are we actually just training everyone to not believe anything they see? Which I hope is not the case. I am an optimist. But I’d love to hear what you all came across. Are there signs of hope here where we might actually have a place where we can trust what we see again? 

DAEPP: Yeah. So two things. There is this phenomenon called the liar’s dividend, right, … 

BADANES: Sure, yeah.

DAEPP: … which is that if you educate folks about how AI can be used to create fake clips, fake audio clips, fake videos, then if somebody has a real audio clip, a real video, they can claim that it’s AI. And I think we talk, you know, again, this is, like, in a US-centric space, we talk about this with politicians, but the space in which this is really concerning, I think, is war crimes, right …

BADANES: Oh, yeah.

DAEPP: … I think, these real human rights infractions, where you can prevent evidence from getting out or being taken seriously. And we do see that right after invasions, for example, these days. But this is actually a space … like, I just told you, like, oh, like, detection is so hard and not technically, like, that’ll be an arms race! But actually, there is this wonderful project, Project Providence, that is a Microsoft collaboration with a company called Truepic that … it’s, like, an app, right. And what happens is when you take a photo using this app, it encrypts the, you know, hashes the GPS coordinates where the photo was taken, the time, the day, and uploads that with the pixels, with the image, to Azure. And then later, when a journalist goes to use that image, they can see that the pixels are exactly the same, and then they can check the location and they can confirm the GPS. And this actually meets evidentiary standards for the UN human rights tribunal, right.

BADANES: Right.

DAEPP: So this is being used in Ukraine to document war crimes. And so, you know, what if everybody had that app on their phone? That means you don’t … you know, most photos you take, you can use an AI tool and immediately play with. But in that particular situation where you need to confirm provenance and you need to confirm that this was a real event that happened, that is a technology that exists, and I think folks like the C2PA coalition (Coalition for Content Provenance and Authenticity) can make that happen across hardware providers.
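The capture-and-verify idea behind a provenance app like the one described above can be sketched as a hash over the image bytes plus its capture metadata, recorded when the photo is taken and recomputed later. This is a simplified sketch under stated assumptions, not the Project Providence or Truepic implementation: the `capture` and `verify` helpers, the metadata fields, and the in-memory `trusted_records` store (standing in for the upload to a cloud service) are hypothetical. Production provenance systems, such as those following the C2PA specification, use digitally signed manifests rather than a bare hash lookup.

```python
import hashlib
import json


def fingerprint(pixels: bytes, metadata: dict) -> str:
    """SHA-256 over the image bytes plus canonical (sorted-key) JSON metadata."""
    h = hashlib.sha256()
    h.update(pixels)
    h.update(json.dumps(metadata, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


# Hypothetical stand-in for the record uploaded to a trusted server at capture time.
trusted_records: dict[str, str] = {}


def capture(photo_id: str, pixels: bytes, metadata: dict) -> None:
    """Record the fingerprint at the moment the photo is taken (hypothetical helper)."""
    trusted_records[photo_id] = fingerprint(pixels, metadata)


def verify(photo_id: str, pixels: bytes, metadata: dict) -> bool:
    """True only if both the pixels and the claimed time/place match the capture record."""
    expected = trusted_records.get(photo_id)
    return expected is not None and expected == fingerprint(pixels, metadata)


# Illustrative values only.
meta = {"lat": 10.0, "lon": 20.0, "taken_at": "2024-03-01T09:15:00Z"}
image = b"raw image bytes (placeholder)"
capture("photo-001", image, meta)

print(verify("photo-001", image, meta))                    # True
print(verify("photo-001", image + b"\x00", meta))          # False: pixels altered
print(verify("photo-001", image, {**meta, "lat": 11.0}))   # False: location altered
```

Any change to a single byte of the image, or to the claimed time and place, yields a different digest, which is what lets a later viewer confirm that the image and its metadata are unchanged since capture. A signature scheme adds the further guarantee that the record itself came from a trusted device or service.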

NESS: And I think the challenge for me is, we can’t separate this problem from some of the other, kind of, fundamental problems that we have in our media environment now, right. So, for example, if I go on to my favorite social media app and I see videos from some conflicts around the world, and these videos could be not AI generated and I still could be, you know, the target of some PR campaign to promote certain content and suppress other ones. The videos could be authentic videos, but not actually be accurate depictions of what they claim to be. And so I think that this is a … the AI presents a complicating factor in an already difficult problem space. And I think, you know, trying to isolate these different variables and targeting them individually is pretty tricky. I do think that, despite the liar’s dividend, media literacy is a very positive area to, kind of, focus energy …

BADANES: Yeah.

NESS: … in the sense that, you know, you mentioned earlier, like, using this term fraud, again, going back to this analogy with cybersecurity and cybercrime, that it tends to resonate with people. We saw that, as well, especially in Taiwan, didn’t we, Madeleine? Well, in India, too, with the sextortion fears. But in Taiwan, a lot of just cybercrime in terms of defrauding people of money. And one of the things that we had observed there was that talking about generative AI in the context of elections was difficult to talk to people about it because people, kind of, immediately went into their political camps, right.

BADANES: Yeah.

NESS: And so you had to, kind of, penetrate … you know, people were trying to, kind of, suss out which side you were on when you’re trying to educate them about this topic.

BADANES: Sure.

NESS: But if you talk to—but everybody’s, like, fraud itself is a lot less partisan.

BADANES: Yeah, it’s a neutral term.

NESS: Exactly. And so it becomes a very useful way to, kind of, get these ideas out there.

BADANES: That’s really interesting. And I love the provenance example because it really gets to the question about authenticity. Like, where did something come from? What is the origin of that media? Where has it traveled over time? And if AI is a component of it, then that’s a noted fact. But it doesn’t put us into the space of AI or not AI, which I think is where a lot of the, sort of, labeling has gone so far. And I understand the instinct to do that. But I like the idea of moving more towards knowing more about an image, where whether there was AI involved or not is one component but does not carry judgment. That does not make the picture good or bad. It doesn’t make it true or false. It’s just more information for you to consume. And then, of course, the media literacy piece, people need to know to look for those indicators and want them and ask for them from technology companies. So I think that’s a good, that’s a good silver lining. You gave me the light at the end of the tunnel I think I was looking for on the post-truth world. So, look, here’s the big question. You guys have been spending this time focusing on AI and democracy in this big, massive global election year. There was a lot of hype. [LAUGHS] There was a lot of hype. Lots of articles written about how this was going to be the AI election apocalypse. What say you? Was it? Was it not?

NESS: I think it was, well, we definitely have documented cases where this happened. And I’m wary of this question, particularly again from the cybersecurity standpoint, which is if you were not the victim of a terrible hack that brought down your entire company, would you say, like, well, it didn’t happen, so it’s not going to happen, right. You would never …

BADANES: Yeah.

NESS: That would be a silly attitude to have, right. And also, you don’t know what you don’t know, right. So, like, a lot of the, you know, we mentioned sextortion; we mentioned these cybercrimes. A lot of these are small-dollar crimes, which means they don’t get reported or they don’t get reported for reasons of shame. And so we don’t even have numbers on a lot of that. And we know that the political techniques are going to mirror the criminal techniques.

BADANES: Yeah.

NESS: And also, I worry about, say, down-ballot elections. Like, so much of, kind of, our election this year, a lot of the focus was on the national candidates, but, you know, if local poll workers are being targeted, if disinformation campaigns are being put out about local candidates, it’s not going to get the kind of play in the national media such that you and I might hear about it. And so I’m, you know, so I’ll hand it off to Madeleine, but yeah.

DAEPP: So absolutely agree with Robert’s point, right. If your child was affected by sextortion, if you are a country that had an audio clip go viral, this was the deepfake deluge for you, right. That said, something that happened, you know, in India as in the United States, there were major prosecutions very early on, right.

BADANES: Yeah.

DAEPP: So in India, there was a video. It turned out not to be a deepfake. It turned out to be a “cheap fake,” to your point about, you know, the question isn’t whether there’s AI involved; the question is whether this is an attempt to defraud. And five people were charged for this video.

BADANES: Yeah.

DAEPP: And in the United States, right, those Biden robocalls using Biden’s voice to tell folks not to vote, like, that led to a million-dollar fine, I think, for the telecoms and $6 million for the consultant who created that. And when we talk to people in India, you know, people who work in this space, they said, well, I’m not going to do that; like, I’m going to focus on other things. So internal actors pay attention to these things. That really changes what people do and how they do it. And so that, I do think the work that your team did, right, to educate candidates about looking out for the stuff, the work that the MTAC (Microsoft Threat Analysis Center) did to track usage and report it, all of that, I think, was, actually, those interventions, I think, worked. I think they were really important, and I do think that what we are … this absence of a deluge is actually a huge number of people making a very concerted effort to prevent it from happening.

BADANES: That’s encouraging.

NESS: Madeleine, you made a really important point that this deterrence from prosecution, it’s effective for internal actors, …

BADANES: Yeah.

DAEPP: Yeah, that’s right.

NESS: … right. So for foreign states who are trying to interfere with other people’s elections, the fear of prosecution is not going to be as much of a deterrent.

BADANES: That is true. I will say what we saw in this election cycle, in particular in the US, was a concerted effort by the intelligence community to call out and name nation-state actors who were either doing cyberattacks or influence operations, specific videos that they identified, whether there was AI involved or not. I think that level of communication with the public, while it maybe doesn’t lead to those actors going to jail—maybe someday—does in fact lead to a more aware public and therefore hopefully a less effective campaign. If people on the other end … and it’s a little bit into the literacy space, and it’s something that we’ve seen government again in this last cycle do very effectively, to name and shame essentially when they see these things in part, though, to make sure voters are aware of what’s happening. We’re not quite through this big global election year; we have a couple more elections before we really hit the end of the year, but it’s winding down. What is next for you all? Are you all going to continue this work? Are you going to build on it? What comes next?

DAEPP: So our research in India actually wasn’t focused specifically on elections. It was about AI and digital communications.

BADANES: Ahh.

DAEPP: Because, you know, again, like India is this laboratory.

BADANES: Sure.

DAEPP: And I think what we learned from that work is that, you know, this is going to be a part of our digital communications and our information system going forward without question. And the question is just, like, what are the viable business models, right? What are the applications that work? And again, that comes back to making sure that whatever AI … you know, people when they build AI into their entire, you know, newsletter-writing system, when they build it into their content production, that they can feel confident that it’s safe and that it meets their needs and that they’re protected when they use it. And similarly, like, what are those applications that really work, and how do you empower those lead users while mitigating those harms and supporting civil society? I think that’s an incredible, like, that’s—as a researcher—that’s, you know, that’s a career, right.

BADANES: Yeah.

DAEPP: That’s a wonderful research space. And so I think understanding how to support AI that is safe, that enables people globally to have self-determination in how models represent them, and that is usable and powerful, I think that’s broadly …

BADANES: Where this goes.

DAEPP: … what I want to drive.

BADANES: Robert, how about you?

NESS: You know, so I mentioned earlier on these AI alignment issues.

BADANES: Yeah.

NESS: And I was really fascinated by how local and contextual those issues really are. So to give an example from Taiwan, we train these models on training data that we find from the internet. Well, when it comes to, say, Mandarin Chinese, you can imagine the proportion of content, of just the quantity of content, on the internet that comes from China is a lot more than the quantity that comes from Taiwan. And of course, what’s politically correct in China is different from what’s politically correct in Taiwan. And so when we were talking to Taiwanese, a lot of people had these concerns about, you know, having these large language models that reflected Taiwanese values. We heard the same thing in India about just people on different sides of the political spectrum and, kind of, looking at … a YouTuber in India had walked us through this … how, for example, a founding father of India, there was a disparate literature in favor of this person and some more critical of this person, and he had spent time trying to suss out whether GPT-4 was on one side or the other.

BADANES: Oh. Whose side are you on? [LAUGHS]

NESS: Right, and so I think for our alignment research at Microsoft Research, this becomes the beginning of, kind of, a very fruitful way of engaging with local stakeholders and making sure that we can reflect these concerns in the models that we develop and deploy.

BADANES: Yeah. Well, first, I just want to thank you guys for all the work you’ve done. This is amazing. We’ve really enjoyed partnering with you. I’ve loved learning about the research and the efforts, and I’m excited to see what you do next. I always want to end these kinds of conversations on a more positive note, because we’ve talked a lot about the weaponization of AI and, you know, how … ethical areas that are confusing and … but I am sure at some point in your work, you came across really positive use cases of AI when it comes to democracy, or at least I hope you have. [LAUGHS] Do you have any examples or can you leave us with something about where you see either it going or actively being used in a way to really strengthen democratic processes or systems?

DAEPP: Yeah, I mean, there was just a big paper in Science, right, which, as researchers, when something comes out in Science, you know your field is about to change, right, …

BADANES: Yeah.

DAEPP: … showing that an AI model in, like, political deliberations, small groups of UK residents talking about difficult topics like Brexit, you know, climate crisis, difficult topics, that in these conversations, an AI moderator created, like, consensus statements that represented the majority opinion, still showed the minority opinion, but that participants preferred to a human-written statement and in fact preferred to their original opinion.

BADANES: Wow.

DAEPP: And that this, you know, not only works in these randomized controlled trials but actually works in a real citizens’ deliberation. And so that potential of, like, carefully fine-tuned, like, carefully aligned AI to actually help people find points of agreement, that’s a really exciting space.

BADANES: So next time my kids are in a fight, I’m going to point them to Copilot and say, work with Copilot to mediate. [LAUGHS] No, that’s really, really interesting. Robert, how about you?

NESS: She, kind of, stole my example. [LAUGHTER] But I’ll take it from a different perspective. So, yes, like how these technologies can enable people to collaborate and ideally, I think, from a democratic standpoint, at a local level, right. So, I mean, I think so much of our politics were, kind of, focused at the national-level campaign, but our opportunity to collaborate is much more … we’re much more easily … we can collaborate much more easily with people who are in our local constituencies. And I think to myself about, kind of, like, the decline particularly of local newspapers, local media.

BADANES: Right.

NESS: And so I wonder, you know, can these technologies help address that problem in terms of just, kind of, information about, say, your local community, as well as local politicians. And, yeah, and to Madeleine’s point, so Madeleine started the conversation talking about her background in urban planning and some of the work she did, you know, working on a local level with local officials to bring technology to the level of cities. And I think, like, well, you know, politics are local, right. So, you know, I think that that’s where there’s a lot of opportunity for improvement.

BADANES: Well, Robert, you just queued up a topic for a whole other podcast because our team also does a lot of work around journalism, and I will say we have seen that AI at the local level with local news is really a powerful tool that we’re starting to see a lot of appetite and interest for in order to overcome some of the hurdles they face right now in that industry when it comes to capacity, financing, you know, not able to be in all of the places they want to be at once to make sure that they’re reporting equally across the community. This is, like, a perfect use case for AI, and we’re starting to see folks who are really using it. So maybe we’ll come back and do this again another time on that topic. But I just want to thank you both, Madeleine and Robert, for joining us today and sharing your insights. This was really a fascinating conversation. I know I learned a lot. I hope that our listeners learned a lot, as well.

[MUSIC]

And, listeners, I hope that you tune in for more episodes of Ideas, where we continue to explore the technologies shaping our future and the big ideas behind them. Thank you, guys, so much.

DAEPP: Thank you.

NESS: Thank you.

[MUSIC FADES]

[1] The video generation model Sora was released publicly earlier this month.

[2] For a summary of and link to the report, see the Microsoft On the Issues blog post China tests US voter fault lines and ramps AI content to boost its geopolitical interests.

The post Ideas: AI and democracy with Madeleine Daepp and Robert Osazuwa Ness appeared first on Microsoft Research.

]]>
Ideas: Economics and computation with Nicole Immorlica http://approjects.co.za/?big=en-us/research/podcast/ideas-economics-and-computation-with-nicole-immorlica/ Thu, 05 Dec 2024 15:26:25 +0000 http://approjects.co.za/?big=en-us/research/?p=1107621 When research manager Nicole Immorlica discovered she could use math to make the world a better place for people, she was all in. She discusses working in computer science theory and economics, including studying the impact of algorithms and AI on markets.

The post Ideas: Economics and computation with Nicole Immorlica appeared first on Microsoft Research.

]]>

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Senior Principal Research Manager Nicole Immorlica. As Immorlica describes it, when she and others decided to take a computational approach to pushing the boundaries of economic theory, there weren’t many computer scientists doing research in economics. Since then, contributions such as applying approximation algorithms to the classic economic challenge of pricing and work on the stable marriage problem have earned Immorlica numerous honors, including the 2023 Test of Time Award from the ACM Special Interest Group on Economics and Computation and selection as a 2023 Association for Computing Machinery (ACM) Fellow. Immorlica traces the journey back to a graduate market design course and a realization that captivated her: she could use her love of math to help improve the world through systems that empower individuals to make the best decisions possible for themselves.

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOGUE]

NICOLE IMMORLICA: So honestly, when generative AI came out, I had a bit of a moment, a, like, crisis of confidence, so to speak, in the value of theory in my own work. And I decided to dive into a data-driven project, which was not my background at all. As a complete newbie, I was quite shocked by what I found, which is probably common knowledge among experts: data is very messy and very noisy, and it’s very hard to get any signal out of it. Theory is an essential counterpart to any data-driven research. It provides a guiding light. But even more importantly, theory allows us to illuminate things that have not even happened. So with models, we can hypothesize about possible futures and use that to shape what direction we take.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest on this episode is Nicole Immorlica, a senior principal research manager at Microsoft Research New England, where she leads the Economics and Computation Group. Considered by many to be an expert on social networks, matching markets, and mechanism design, Nicole has a long list of accomplishments and honors to her name and some pretty cool new research besides. Nicole Immorlica, I’m excited to get into all the things with you today. Welcome to Ideas.

NICOLE IMMORLICA: Thank you. 

HUIZINGA: So before we get into specifics on the big ideas behind your work, let’s find out a little bit about how and why you started doing it. Tell us your research origin story and, if there was one, what big idea or animating “what if” inspired young Nicole and launched a career in theoretical economics and computation research? 

IMMORLICA: So I took a rather circuitous route to my current research path. In high school, I thought I actually wanted to study physics, specifically cosmology, because I was super curious about the origins and evolution of the universe. In college, I realized on a day-to-day basis, what I really enjoyed was the math underlying physics, in particular proving theorems. So I changed my major to computer science, which was the closest thing to math that seemed to have a promising career path. [LAUGHTER] But when graduation came, I just wasn’t ready to be a grownup and enter the workforce! So I defaulted to graduate school thinking I’d continue my studies in theoretical computer science. It was in graduate school where I found my passion for the intersection of CS theory and microeconomics. I was just really enthralled with this idea that I could use the math that I so love to understand society and to help shape it in ways that improve the world for everyone in it. 

HUIZINGA: I’ve yet to meet an accomplished researcher who didn’t have at least one inspirational “who” behind the “what.” So tell us about the influential people in your life. Who are your heroes, economic or otherwise, and how did their ideas inspire yours and even inform your career? 

IMMORLICA: Yeah, of course. So when I was a graduate student at MIT, you know, I was happily enjoying my math, and just on a whim, I decided to take a course, along with a bunch of fellow MIT graduate students, at Harvard from Professor Al Roth. And this was a market design course. We didn’t even really know what market design was, but in the context of that course, Al himself and the course content just demonstrated to me the transformative power of algorithms and economics. So, I mean, you might have heard of Al. He eventually won a Nobel Prize in economics for his work using a famous matching algorithm to optimize markets for doctors and separately for kidney exchange programs. And I thought to myself, wow, this is such meaningful work. This is something that I want to do, something I can contribute to the world, you know, something that my skill set is well adapted to. And so I just decided to move on with that, and I’ve never really looked back. It’s so satisfying to do something that’s both … I like both the means and I care very deeply about the ends. 

HUIZINGA: So, Nicole, you mentioned you took a course from Al Roth. Did he become anything more to you than that one sort of inspirational teacher? Did you have any interaction with him? And were there any other professors, authors, or people that inspired you in the coursework and graduate studies side of things? 

IMMORLICA: Yeah, I mean, Al has been transformative for my whole career. Like, I first met him in the context of that course, but I, and many of the graduate students in my area, have continued to work with him, speak to him at conferences, be influenced by him, so he’s been there throughout my career for me. 

HUIZINGA: Right, right, right … 

IMMORLICA: In terms of other inspirations, I’ve really admired throughout my career … this is maybe more structurally how different individuals operate their careers. So, for example, Jennifer Chayes, who was the leader of the Microsoft Research lab that I joined … 

HUIZINGA: Yeah! 

IMMORLICA: … and nowadays Sue Dumais. Various other classic figures like Éva Tardos. Like, all of these are incredibly strong, driven women that have a vision of research, which has been transformative in their individual fields but also care very deeply about the community and the larger context than just themselves and creating spaces for people to really flourish. And I really admire that, as well. 

HUIZINGA: Yeah, I’ve had both Sue and Jennifer on the show before, and they are amazing. Absolutely. Well, listen, Nicole, as an English major, I was thrilled—and a little surprised—to hear that literature has influenced your work in economics. I did not have that on my bingo card. Tell us about your interactions with literature and how they broadened your vision of optimization and economic models.

IMMORLICA: Oh, I read a lot, especially fiction. And I care very deeply about being a broad human being, like, with a lot of different facets. And so I seek inspiration not just from my fellow economists and computer scientists but also from artists and writers. One specific example would be Walt Whitman. So as an MIT alumna, I took up this poetry class on Walt Whitman, and in the context of that course, we, of course, read his famous poem “Song of Myself.” And I remember one specific verse just really struck me, where he writes, “Do I contradict myself? Very well then I contradict myself, (I am large, I contain multitudes.)” And this just was so powerful because, you know, in traditional economic models, we assume that individuals seek to optimize a single objective function, which we call their utility, but what Whitman is pointing out is that we actually have many different objective functions, which can even conflict with one another, and some at times are more salient than others, and they arise from my many identities as a member of my family, as an American, as, you know, a computer scientist, as an economist, and maybe we should actually try to think a little bit more seriously about these multiple identities in the context of our modeling. 

HUIZINGA: That just warms my English major heart … [LAUGHS] 

IMMORLICA: I’m glad! [LAUGHS] 

HUIZINGA: Oh my gosh. And it’s so interesting because, yeah, we always think of, sort of, singular optimization. And so it’s like, how do we expand our horizon on that sort of optimization vision? So I love that. Well, you received what I can only call a flurry of honors and awards last year. Most recently, you were named an ACM Fellow—ACM being the Association for Computing Machinery, for those who don’t know—which acknowledges people who bring, and I quote, “transformative contributions to computing science and technology.” Now your citation is for, and I quote again, “contributions to economics and computation, including market design, auctions, and social networks.” That’s a mouthful, but if we’re talking about transformative contributions, how were things different before you brought your ideas to this field, and how were your contributions transformative or groundbreaking? 

IMMORLICA: Yeah, so it’s actually a relatively new thing for computer scientists to study economics, and I was among the first cohort to do so seriously. So before our time, economists mostly focused on finding optimal solutions to the problems they posed without regard for the computational or informational requirements therein. But computer scientists have an extensive toolkit to manage such complexities. So, for example, in a paper on pricing, which is a classic economic problem—how do we set up prices for goods in a store?—my coauthors and I used the computer science notion of approximation to show that a very simple menu of prices generates almost optimal revenue for the seller. And prior to this work, economists only knew how to characterize optimal but infinitely large and thereby impractical menus of prices. So this is an example of the kind of work that I and other computer scientists do that can really transform economics. 

HUIZINGA: Right. Well, in addition to the ACM fellowship, another honor you received from ACM in 2023 was the Test of Time Award, where the Special Interest Group on Economics and Computation, or SIGecom, recognizes influential papers published between 10 and 25 years ago that significantly impacted research or applications in economics and computation. Now you got this award for a paper you cowrote in 2005 called “Marriage, Honesty, and Stability.” Clearly, I’m not an economist because I thought this was about how to avoid getting a divorce, but actually, it’s about a well-known and very difficult problem called the stable marriage problem. Tell us about this problem and the paper and why, as the award states, it’s stood the test of time. 

IMMORLICA: Sure. You’re not the only one to have misinterpreted the title. [LAUGHTER] I remember I gave a talk once and someone came and when they left the talk, they said, I did not think that this was about math! But, you know, math, as I learned, is about life, and the stable marriage problem has, you know, an interpretation about marriage and divorce. In particular, the problem asks, how can we match market participants to one another such that no pair prefers each other to their assigned match? So to relate this to the somewhat outdated application of marriage markets, the market participants could be men and women, and the stable marriage problem asks if there is a set of marriages such that no pair of couples seeks a divorce in order to marry each other. And so, you know, that’s not really a problem we solve in real life, but there’s a lot of modern applications of this problem. For example, assigning medical students to hospitals for their residencies, or if you have children, many cities in the United States and around the world use this stable marriage problem to think about the assignment of K-12 students to public schools. And so in these applications, the stability property has been shown to contribute to the longevity of the market. And in the 1960s, David Gale and Lloyd Shapley proved, via an algorithm, interestingly, that stable matches exist! Well, in fact, there can be exponentially many stable matches. And so this leads to a very important question for people that want to apply this theory to practice, which is, which stable match should they select among the many ones that exist, and what algorithm should they use to select it? So our work shows that under very natural conditions, namely that preference lists are short and sufficiently random, it doesn’t matter. Most participants have a unique stable match. 
And so, you know, you can just design your market without worrying too much about what algorithm you use or which match you select because for most people it doesn’t matter. And since our paper, many researchers have followed up on our work studying conditions under which matchings are essentially unique and thereby influencing policy recommendations. 
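To make the stable marriage discussion concrete, here is a minimal Python sketch of the Gale-Shapley deferred-acceptance algorithm, plus a rough check of the "unique stable match" idea. It is an illustration under stated assumptions, not the code or the exact model from the 2005 paper: the market generator (each proposer ranks k random receivers, each receiver ranks everyone randomly), the function names, and the labels `m0`, `w0`, … are all hypothetical. It relies on a standard fact: deferred acceptance with one side proposing gives that side its best stable partners, and running it with the other side proposing gives the first side its worst stable partners, so a participant with the same partner in both runs has that partner in every stable matching.

```python
import random


def deferred_acceptance(proposer_prefs, receiver_prefs):
    """Gale-Shapley deferred acceptance; returns the proposer-optimal stable matching.

    proposer_prefs[p]: p's ranked list of acceptable receivers (best first).
    receiver_prefs[r]: r's ranked list of acceptable proposers (best first).
    Lists may be short (incomplete); a pair can match only if each lists the other.
    Returns {proposer: receiver} for every matched proposer.
    """
    # rank[r][p] is p's position in r's list (lower is better).
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_idx = {p: 0 for p in proposer_prefs}  # next list position p will propose to
    held = {}                                   # receiver -> proposer tentatively held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        if next_idx[p] >= len(proposer_prefs[p]):
            continue  # p exhausted its list and stays unmatched
        r = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        current = held.get(r)
        if p not in rank[r]:
            free.append(p)        # r finds p unacceptable
        elif current is None:
            held[r] = p           # r tentatively accepts p
        elif rank[r][p] < rank[r][current]:
            held[r] = p           # r trades up; its previous proposer is free again
            free.append(current)
        else:
            free.append(p)        # r rejects p
    return {p: r for r, p in held.items()}


def is_stable(matching, proposer_prefs, receiver_prefs):
    """True if no acceptable pair (p, r) both prefer each other to their assignment."""
    partner_of = {r: p for p, r in matching.items()}
    for p, prefs in proposer_prefs.items():
        for r in prefs:
            if matching.get(p) == r:
                break  # everyone later in p's list is worse than p's partner
            if p in receiver_prefs[r]:
                cur = partner_of.get(r)
                if cur is None or receiver_prefs[r].index(p) < receiver_prefs[r].index(cur):
                    return False  # (p, r) is a blocking pair
    return True


def random_market(n, k, seed=0):
    """Hypothetical market: each proposer ranks k random receivers (short lists);
    each receiver ranks all proposers in random order."""
    rng = random.Random(seed)
    proposers = [f"m{i}" for i in range(n)]
    receivers = [f"w{j}" for j in range(n)]
    proposer_prefs = {p: rng.sample(receivers, k) for p in proposers}
    receiver_prefs = {r: rng.sample(proposers, n) for r in receivers}
    return proposer_prefs, receiver_prefs


def share_with_unique_partner(proposer_prefs, receiver_prefs):
    """Fraction of matched proposers whose partner is identical in the proposer-optimal
    and receiver-optimal stable matchings (hence the same in every stable matching)."""
    best = deferred_acceptance(proposer_prefs, receiver_prefs)
    worst = {p: r for r, p in deferred_acceptance(receiver_prefs, proposer_prefs).items()}
    if not best:
        return 1.0
    return sum(worst.get(p) == r for p, r in best.items()) / len(best)


if __name__ == "__main__":
    P, R = random_market(n=500, k=4, seed=1)
    print(share_with_unique_partner(P, R))
```

If the effect described above holds at this market size, the printed share should be high; the exact value depends on `n`, `k`, and the seed, so no particular number is claimed here.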

HUIZINGA: Hmm. So this work was clearly focused on the economics side of things like markets. So this seems to have wide application outside of economics. Is that accurate? 

IMMORLICA: Well, it depends how you define economics, so I would … 

HUIZINGA: I suppose! [LAUGHTER] 

IMMORLICA: I define economics as the problem … I mean, Al Roth, for example, wrote a book whose title was Who Gets What—and Why. 

HUIZINGA: Ooh.

IMMORLICA: So economics is all about, how do we allocate stuff? How do we allocate scarce resources? And many economic problems are not about spending money. It’s about how do we create outcomes in the world. 

HUIZINGA: Yeah. 

IMMORLICA: And so I would say all of these problem domains are economics. 

HUIZINGA: Well, finally, as regards the “flurry” of honors, besides being named an ACM Fellow and also this Test of Time Award, you were also named an Economic Theory Fellow by the Society for [the] Advancement of Economic Theory, or SAET. And the primary qualification here was to have “substantially or creatively advanced theoretical economics.” So what were the big challenges you tackled, and what big ideas did you contribute to advance economic theory? 

IMMORLICA: So as we’ve discussed, I and others with my background have done a lot to advance economic theory through the lens of computational thinking. 

HUIZINGA: Mmm … 

IMMORLICA: We’ve introduced ideas such as approximation, which we discussed earlier, or machine learning into economic models and proposed them as solution concepts. We’ve also used computer science tools to solve problems within these models. So two examples from my own work include randomized algorithm analysis and stochastic gradient descent. And importantly, we’ve introduced very relevant new settings to the field of economics. So, you know, I’ve worked hard on large-scale auction design and associated auto-bidding algorithms, for instance, which are a primary source of revenue for tech companies these days. I’ve thought a lot about how data enters into markets and how we should think about data in the context of market design. And lately, I’ve spent a lot of time thinking about generative AI and its impact on the economy at both the micro and macro levels. 

HUIZINGA: Yeah. Let’s take a detour for a minute and get into the philosophical weeds on this idea of theory. And I want to cite an article that was written way back in 2008 by the editor of Wired magazine at the time, Chris Anderson. He wrote an article titled “The End of Theory,” which was provocative in itself. And he began by quoting the British statistician George Box, who famously said, “All models are wrong, but some are useful.” And then he argued that in an era of massively abundant data, companies didn’t have to settle for wrong models. And then he went even further and attacked the very idea of theory and, citing Google, he said, “Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity.” So, Nicole, from your perch, 15 years later, in the age of generative AI, what did Chris Anderson get right, and what did he get wrong? 

IMMORLICA: So, honestly, when generative AI came out, I had a bit of a moment, a, like, crisis of confidence, so to speak, in the value of theory in my own work. 

HUIZINGA: Really! 

IMMORLICA: And I decided to dive into a data-driven project, which was not my background at all. As a complete newbie, I was quite shocked by what I found, which is probably common knowledge among experts: data is very messy and very noisy, and it’s very hard to get any signal out of it. Theory is an essential counterpart to any data-driven research. It provides a guiding light. But even more importantly, theory allows us to illuminate things that have not even happened. So with models, we can hypothesize about possible futures and use that to shape what direction we take. Relatedly, what I think that article got most wrong was the statement that correlation supersedes causation, which is actually how the article closes, this idea that causation is dead or dying. I think causation will never become irrelevant. Causation is what allows us to reason about counterfactuals. It’s fundamentally irreplaceable. It’s like, you know, data, you can only see data about things that happened. You can’t see data about things that could happen but haven’t or, you know, about alternative futures. 

HUIZINGA: Interesting. 

IMMORLICA: And that’s what theory gives you. 

HUIZINGA: Yeah. Well, let’s continue on that a little bit because this show is yet another part of our short “series within a series” featuring some of the work going on in the AI, Cognition, and the Economy initiative at Microsoft Research. And I just did an episode with Brendan Lucier and Mert Demirer on the micro- and macro-economic impact of generative AI. And you were part of that project, but another fascinating project you’re involved in right now looks at the impact of generative AI on what you call the “content ecosystem.” So what’s the problem behind this research, and what unique incentive challenges are content creators facing in light of large language and multimodal AI models? 

IMMORLICA: Yeah, so this is a project with Brendan, as well, whom you interviewed previously, and also Nageeb Ali, an economist and AICE Fellow at Penn State, and Meena Jagadeesan, who was my Microsoft Research intern from UC Berkeley. So when you think about content or really any consumption good, there’s often a whole supply chain that produces it. For music, for example, there’s the composition of the song, the recording, the mixing, and finally the delivery to the consumer. And all of these steps involve multiple humans producing things, generating things, getting paid along the way. One way to think about generative AI is that it allows the consumer to bypass this supply chain and just generate the content directly. 

HUIZINGA: Right … 

IMMORLICA: So, for example, like, I could ask a model, an AI model, to compose and play a song about my cat named Whiskey. [LAUGHTER] And it would do a decent job of it, and it would tailor the song to my specific situation. But there are drawbacks, as well. One thing many researchers fear is that AI needs human-generated content to train. And so if people start bypassing the supply chain and just using AI-generated content, there won’t be any content for AI to train on and AI will cease to improve.

HUIZINGA: Right. 

IMMORLICA: Another thing that could be troubling is that there are economies of scale. So there is a nontrivial cost to producing music, even for AI, and if we share that cost among many listeners, it becomes more affordable. But if we each access the content ourselves, it’s going to impose a large per-song cost. And then finally, and this is, I think, most salient to most people, there’s some kind of social benefit to having songs that everyone listens to. It provides a common ground for understanding. It’s a pillar of our culture, right. And so if we bypass that, aren’t we losing something? So for all of these reasons, it becomes very important to understand the market conditions under which people will choose to bypass supply chains and the associated costs and benefits of this. What we show in this work, which is very much work in progress, is that when AI is very costly, neither producers nor consumers will use it, but as it gets cheaper, at first, it actually helps content producers that can leverage it to augment their own ability, creating higher-quality content, more personalized content more cheaply. But then, as the AI gets super cheap, this bypassing behavior starts to emerge, and the content creators are driven out of the market. 

HUIZINGA: Right. So what do we do about that? 

IMMORLICA: Well, you know, you have to take a stance on whether that’s even a good thing or a bad thing, … 

HUIZINGA: Right! 

IMMORLICA: … so it could be that we do nothing about it. We could also impose a sort of minimum wage on AI, if you like, to artificially inflate its costs. We could try to amplify the parts of the system that lead towards more human-generated content, like this sociability, the fact that we all are listening to the same stuff. We could try to make that more salient for people. But, you know, generally speaking, I’m not really in a place to take a stance on whether this is a good thing or a bad thing. I think this is for policymakers. 

HUIZINGA: It feels like we’re at an inflection point. I’m really interested to see what your research in this arena, the content ecosystem, brings. You know, I’ll mention, too, recently I read a blog written by Yoshua Bengio and Vincent Conitzer, and they acknowledged that the image that they used at the top had been created by an AI bot. And then they said they made a donation to an art museum to say, we’re giving something back to the artistic community that we may have used. Where do you see this, you know, #NoLLM situation coming in this content ecosystem market? 

IMMORLICA: Yeah, that’s a very interesting move on their part. I know Vince quite well, actually. I’m not sure that artists of the sort of “art museum nature” suffer, so … 

HUIZINGA: Right? [LAUGHS] 

IMMORLICA: One of my favorite artists is Laurie Anderson. I don’t know if you’ve seen her work at all … 

HUIZINGA: Yeah, I have, yeah. 

IMMORLICA: … but she has a piece at MASS MoCA right now, which is just brilliant, where she actually uses generative AI to create a sequence of images that creates an alternate story about her family history. And it’s just really, really cool. I’m more worried about people who are doing art vocationally, and I think, and maybe you heard some of this from Mert and Brendan, like what’s going to happen is that careers are going to shift and different vocations will become more salient, and we’ve seen this through every technological revolution. People shift their work towards the things that are uniquely human that we can provide, and if generating an image at the top of a blog is not one of them, you know, so be it. People will do something else. 

HUIZINGA: Right, right, right. Yeah, I just … we’re on the cusp, and there’s a lot of things that are going to happen in the next couple of years, maybe a couple of months, who knows? [LAUGHTER] Well, we hear a lot of dystopian fears—some of them we’ve just referred to—around AI and its impact on humanity, but those fears are often dismissed by tech optimists as what I might call “unwishful thinking.” So your research interests involve the design and use of sociotechnical systems to, quote, “explain, predict, and shape behavioral patterns in various online and offline systems, markets, and games.” Now I’m with you on the “explain and predict,” but when we get to shaping behavioral patterns, I wonder how we tease out the bad from the good. So, in light of the power of these sociotechnical systems, what could possibly go wrong, Nicole, if in fact you got everything right? 

IMMORLICA: Yeah, first I should clarify something. When I say I’m interested in shaping behavioral patterns, I don’t mean that I want to impose particular behaviors on people but rather that I want to design systems that expose to people relevant information and possible actions so that they have the power to shape their own behavior to achieve their own goals. And if we’re able to do that, and do it really well, then things can only really go wrong if you believe people aren’t good at making themselves happy. I mean, there’s certainly evidence of this, like the field of behavioral economics, to which I have contributed some, tries to understand how and when people make mistakes in their behavioral choices. And it proposes ways to help people mitigate these mistakes. But I caution us from going too far in this direction because at the end of the day, I believe people know things about themselves that no external authority can know. And you don’t want to impose constraints that prevent people from acting on that information. 

HUIZINGA: Yeah. 

IMMORLICA: Another issue here is, of course, externalities. It could be that my behavior makes me happy but makes you unhappy. [LAUGHTER] So another thing that can go wrong is that we, as designers of technology, fail to capture these underlying externalities. I mean, ideally, like an economist would say, well, you should pay with your own happiness for any negative externality you impose on others. And the fields of market and mechanism design have identified very beautiful ways of making this happen automatically in simple settings, such as the famous Vickrey auction. But getting this right in the complex sociotechnical systems of our day is quite a challenge. 

HUIZINGA: OK, go back to that auction. What did you call it? The Vickrey auction? 

IMMORLICA: Yeah, so Vickrey was an economist, and he proposed an auction format that … so an auction is trying to find a way to allocate goods, let’s say, to bidders such that the bidders that value the goods the most are the ones that win them. 

HUIZINGA: Hm. 

IMMORLICA: But of course, these bidders are imposing a negative externality on the people who lose, right? [LAUGHTER] And so what Vickrey showed is that a well-designed system of prices can compensate the losers exactly for the externality that is imposed on them. A very simple example of a Vickrey auction is if you’re selling just one good, like a painting, then what you should do, according to Vickrey, is solicit bids, give it to the highest bidder, and charge them the second-highest bid. 

HUIZINGA: Interesting … 

IMMORLICA: And so … that’s going to have good outcomes for society. 
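The single-painting rule Immorlica describes fits in a few lines. The following is a minimal Python sketch of a sealed-bid second-price (Vickrey) auction for one item; the bidder names, the values, and the alphabetical tie-breaking rule are illustrative assumptions, not part of the conversation. It also checks numerically, for one bidder with the other bids held fixed, that no alternative bid on a grid does better than bidding one's true value.

```python
def vickrey_auction(bids):
    """Sealed-bid second-price (Vickrey) auction for a single item.

    bids: {bidder: bid}. The highest bidder wins and pays the second-highest bid
    (0 if there is no other bidder). Ties go to the alphabetically first name,
    an arbitrary rule chosen here so results are deterministic.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price


def utility(bidder, value, bids):
    """Quasi-linear payoff: value minus price if the bidder wins, else 0."""
    winner, price = vickrey_auction(bids)
    return value - price if winner == bidder else 0


if __name__ == "__main__":
    # The painting goes to the highest bidder at the second-highest bid.
    print(vickrey_auction({"alice": 80, "bob": 70, "carol": 55}))  # ('alice', 70)

    # With the other bids fixed, compare truthful bidding with every bid from 0 to 150.
    others = {"bob": 70, "carol": 55}
    for value in (60, 80):
        truthful = utility("alice", value, {**others, "alice": value})
        best_alternative = max(utility("alice", value, {**others, "alice": b}) for b in range(151))
        print(value, truthful, best_alternative)  # 60 0 0, then 80 10 10
```

Because the winner's price is set by the other bids, shading one's own bid can never lower the price; it can only risk losing an item worth more than that price, which is why truthful bidding is a weakly dominant strategy here. When the others bid truthfully, the price also equals the highest value among them, which is one way to read the point that the payment matches the externality the winner imposes on the losers.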

HUIZINGA: Yeah, yeah. I want to expand on a couple of thoughts here. One is as you started out to answer this question, you said, well, I’m not interested in shaping behaviors in terms of making you do what I want you to do. But maybe someone else is. What happens if it falls into the wrong hands? 

IMMORLICA: Yeah, I mean, there’s definitely competing interests. Everybody has their own objectives, and … 

HUIZINGA: Sure, sure. 

IMMORLICA: … I might be very fundamentally opposed to some of them, but everybody’s trying to optimize something, and there are competing optimization objectives. And so what’s going to happen if people are leveraging this technology to optimize for themselves and thereby harming me a lot? 

HUIZINGA: Right? 

IMMORLICA: Ideally, we’ll have regulation to kind of cover that. I think what I’m more worried about is the idea that the technology itself might not be aligned with me, right. Like at the end of the day, there are companies that are producing this technology that I’m then using to achieve my objectives, but the company’s objectives, the creators of the technology, might not be completely aligned with the person’s objectives. And so I’ve looked a little bit in my research about how this potential misalignment might result in outcomes that are not all that great for either party. 

HUIZINGA: Wow. Is that stuff that’s in the works? 

IMMORLICA: We have a few published papers on the area. I don’t know if you want me to get into them. 

HUIZINGA: No, actually, what we’ll probably do is put some in the show notes. We’ll link people to those papers because I think that’s an interesting topic. Listen, most research is incremental in nature, where the ideas are basically iterative steps on existing work. But sometimes there are out-of-the-box ideas that feel like bigger swings or even outrageous, and Microsoft is well known for making room for these. Have you had an idea that felt outrageous, any idea that felt outrageous, or is there anything that you might even consider outrageous now that you’re currently working on or even thinking about? 

IMMORLICA: Yeah, well, I mean, this whole moment in history feels outrageous, honestly! [LAUGHTER] It’s like I’m kind of living in the sci-fi novels of my youth. 

HUIZINGA: Right? 

IMMORLICA: So together with my economics and social science colleagues at Microsoft Research, one thing that we’re really trying to think through is this outrageous idea of agentic AI. 

HUIZINGA: Mmm … 

IMMORLICA: That is, every single individual and business can have their own AI that acts like their own personal butler that knows them intimately and can take actions on their behalf. In such a world, what will become of the internet, social media, platforms like Amazon, Spotify, Uber? On the one hand, you know, maybe this is good because these individual agentic AIs can just bypass all of these kinds of intermediaries. For example, if I have a busy day of back-to-back meetings at work, my personal AI can notice that I have no time for lunch, contact the AI of some restaurant to order a sandwich for me, make sure that sandwich is tailored to my dietary needs and preferences, and then contact the AI of a delivery service to make sure that sandwich is sitting on my desk when I walk into my noon meeting, right. 

HUIZINGA: Right … 

IMMORLICA: And this is a huge disruption to how things currently work. It’s shifting the power away from centralized platforms, back to individuals and giving them the agency over their data and the power to leverage it to fulfill their needs. So the, sort of, big questions that we’re thinking about right now are, how will such decentralized markets work? How will they be monetized? Will it be a better world than the one we live in now, or are we losing something? And if it is a better world, how can we get from here to there? And if it’s a worse world, how can we steer the ship in the other direction, you know? 

HUIZINGA: Right. 

IMMORLICA: These are all very important questions in this time. 

HUIZINGA: Does this feel like it’s imminent? 

IMMORLICA: I do think it’s imminent. And I think, you know, in life, you can, kind of, decide whether to embrace the good or embrace the bad, see the glass as half-full or half-empty, and … 

HUIZINGA: Yeah. 

IMMORLICA: … I am hoping that society will see the half-full side of these amazing technologies and leverage them to do really great things in the world. 

HUIZINGA: Man, I would love to talk to you for another hour, but we have to close things up. To close this show, I want to do something new with you, a sort of lightning round of short questions with short answers that give us a little window into your life. So are you ready? 

IMMORLICA: Yup! 

HUIZINGA: OK. First one, what are you reading right now for work? 

IMMORLICA: Lots of papers of my students that are on the job market to help prepare recommendation letters. It’s actually very inspiring to see the creativity of the younger generation. In terms of books, I’m reading The Idea Factory, which is about the creation of Bell Labs. 

HUIZINGA: Ooh! Interesting! 

IMMORLICA: You might be interested in it actually. It talks about the value of theory and understanding the fundamentals of a problem space and the sort of business value of that, so it’s very intriguing. 

HUIZINGA: OK, second question. What are you reading for pleasure? 

IMMORLICA: The book on my nightstand right now is the Epic of Gilgamesh, the graphic novel version. I’m actually quite enthralled by graphic novels ever since I first encountered Maus by Art Spiegelman in the ’90s. But my favorite reading leans towards magic realism, so like Gabriel García Márquez, Italo Calvino, Isabel Allende, and the like. I try to read nonfiction for pleasure, too, but I generally find life is a bit too short for that genre! [LAUGHTER] 

HUIZINGA: Well, and I made an assumption that what you were reading for work wasn’t pleasurable, but um, moving on, question number three, what app doesn’t exist but should? 

IMMORLICA: Teleportation. 

HUIZINGA: Ooh, fascinating. What app exists but shouldn’t? 

IMMORLICA: That’s much harder for me. I think all apps within legal bounds should be allowed to exist and the free market should decide which ones survive. Should there be more regulation of apps? Perhaps. But more at the level of giving people tools to manage their consumption at their own discretion and not outlawing specific apps; that just feels too paternalistic to me. 

HUIZINGA: Interesting. OK, next question. What’s one thing that used to be very important to you but isn’t so much anymore? 

IMMORLICA: Freedom. So by that I mean the freedom to do whatever I want, whenever I want, with whomever I want. This feeling that I could go anywhere at any time without any preparation, that I could be the Paul Erdős of the 21st century, traveling from city to city, living out of a suitcase, doing beautiful math just for the art of it. This feeling that I have no responsibilities. Like, I really bought into that in my 20s. 

HUIZINGA: And not so much now? 

IMMORLICA: No. 

HUIZINGA: OK, so what’s one thing that wasn’t very important to you but is now? 

IMMORLICA: Now, as Janis Joplin sang, “Freedom is just another word for nothing left to lose.” [LAUGHTER] And so now it’s important to me to have things to lose—roots, family, friends, pets. I think this is really what gives my life meaning. 

HUIZINGA: Yeah, having Janis Joplin cited in this podcast wasn’t on my bingo card either, but that’s great. Well, finally, Nicole, I want to ask you this question based on something we talked about before. Our audience doesn’t know it, but I think it’s funny. What do Norah Jones and oatmeal have in common for you? 

IMMORLICA: Yeah, so I use these in conversation as examples of comfort and nostalgia in the categories of music and food because I think they’re well-known examples. But for me personally, comfort is the Brahms Cello Sonata in E minor, which was in fact my high school cello performance piece, and nostalgia is spaghetti with homemade marinara sauce, either my boyfriend’s version or, in my childhood, my Italian grandma’s version. 

HUIZINGA: Man! Poetry, art, cooking, music … who would have expected all of these to come into an economist/computer scientist podcast on the Microsoft Research Podcast. Nicole Immorlica, how fun to have you on the show! Thanks for joining us today on Ideas. 

IMMORLICA: Thank you for having me. 

[MUSIC] 

The post Ideas: Economics and computation with Nicole Immorlica appeared first on Microsoft Research.

]]>