Abstracts Archives - Microsoft Research

Abstracts: November 14, 2024

Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) — Thu, 14 Nov 2024 15:00:00 +0000

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Microsoft Senior Researcher Tong Wang joins guest host Bonnie Kruft, partner and deputy director of Microsoft Research AI for Science, to discuss “Ab initio characterization of protein molecular dynamics with AI²BMD.” In the paper, which was published by the scientific journal Nature, Wang and his coauthors detail a system that leverages AI to advance the state of the art in simulating the behavior of large biomolecules. AI²BMD, which is generalizable across a wide range of proteins, has the potential to advance solutions to scientific problems and enhance biomedical research in drug discovery, protein design, and enzyme engineering.

Read the paper

Get the code

Transcript

[MUSIC]

BONNIE KRUFT: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

I’m Bonnie Kruft, partner and deputy director of Microsoft Research AI for Science and your host for today. Joining me is Tong Wang, a senior researcher at Microsoft. Tong is the lead author of a paper called “Ab initio characterization of protein molecular dynamics with AI²BMD,” which has just been published by the top scientific journal Nature. Tong, thanks so much for joining us today on Abstracts!

TONG WANG: Thank you, Bonnie.

KRUFT: Microsoft Research is one of the earliest institutions to apply AI in biomolecular simulation research. Why did the AI for Science team choose this direction, and—with this work specifically, AI²BMD—what problem are you and your coauthors addressing, and why should people know about it?

WANG: So as Richard Feynman famously said, “Everything that living things do can be understood in terms of the jigglings and the wigglings of atoms.” To study the mechanisms behind the biological processes and to develop biomaterials and drugs requires a computational approach that can accurately characterize the dynamic motions of biomolecules. When we review the computational research for biomolecular structure, we can get two key messages. First, in recent years, predicting the crystal, or static, protein structures with methods powered by AI has achieved great success and just won the Nobel Prize in Chemistry in the last month. However, characterizing the dynamic structures of proteins is more meaningful for biology, drug, and medicine fields but is much more challenging. Second, molecular dynamics simulation, or MD, is one of the most widely used approaches to study protein dynamics, which can be roughly divided into classical molecular dynamics simulation and quantum molecular dynamics simulation. Both approaches have been developed for more than a half century and won Nobel Prize. Classical MD is fast but less accurate, while quantum MD is very accurate but computationally prohibitive for the protein study. However, we need both the accuracy and the efficiency to detect the biomechanisms. Thus, applying AI in biomolecular simulation can become the third way to achieve both ab initio—or first principles—accuracy and high efficiency. In the winter of 2020, we have foreseen the trend that AI can make a difference in biomolecular simulations. Thus, we chose this direction.

KRUFT: It took four years from the idea to the launch of AI²BMD, and there were many important milestones along the way. First, talk about how your work builds on and/or differs from what’s been done previously in this field, and then give our audience a sense of the key moments and challenges along the AI²BMD research journey.

WANG: First, I’d like to say applying AI in biomolecular simulation is a novel research field. For AI-powered MD simulation for large biomolecules, there is no existing dataset, no well-designed machine learning model for the interactions between the atoms and the molecules, no clear technical roadmap, no mature AI-based simulation system. So we face various new challenges every day. Second, there are some other works exploring this area at the same time. I think a significant difference between AI²BMD and other works is that other works require to generate new data and train the deep learning models for any new proteins. So it takes a protein-specific solution. As a contrast, AI²BMD proposes a generalizable solution for a wide range of proteins. To achieve it, as you mentioned, there are some key milestones during the four-year journey. The first one is we proposed the generalizable protein fragmentation approach that divides proteins into the commonly used 20 kinds of dipeptides. Thus, we don’t need to generate data for various proteins. Instead, we only need to sample the conformational space of such dipeptides. So we built the protein unit dataset that contains about 20 million samples with ab initio accuracy. Then we proposed ViSNet, the graph neural network for molecular geometry modeling as the machine learning potential for AI²BMD. Furthermore, we designed AI²BMD simulation system by efficiently leveraging CPUs and GPUs at the same time, achieving hundreds of times simulation speed acceleration than one year before and accelerating the AI-driven simulation with only ten to a hundred millisecond per simulation step. Finally, we examined AI²BMD on energy, force, free energy, J coupling, and many kinds of property calculations for tens of proteins and also applied AI²BMD in the drug development competition. All things are done by the great team with science and engineering expertise and the great leadership and support from AI for Science lab.

KRUFT: Tell us about how you conducted this research. What was your methodology?

WANG: As exploring an interdisciplinary research topic, our team consists of experts and students with biology, chemistry, physics, math, computer science, and engineering backgrounds. The teamwork with different expertise is key to AI²BMD research. Furthermore, we collaborated and consulted with many senior experts in the molecular dynamics simulation field, and they provided very insightful and constructive suggestions to our research. Another aspect of the methodology I’d like to emphasize is learning from negative results. Negative results happened most of the time during the study. What we do is to constantly analyze the negative results and adjust our algorithm and model accordingly. There’s no perfect solution for a research topic, and we are always on the way.

KRUFT: AI²BMD got some upgrades this year, and as we mentioned at the top of the episode, the work around the latest system was published in the scientific journal Nature. So tell us, Tong—what is new about the latest AI²BMD system?

WANG: Good question. We posted a preliminary version of AI²BMD manuscript on bioRxiv last summer. I’d like to share three important upgrades through the past one and a half year. The first is hundreds of times of simulation speed acceleration for AI²BMD, which becomes one of the fastest AI-driven MD simulation system and leads to perform much longer simulations than before. The second aspect is AI²BMD was applied for many protein property calculations, such as enthalpy, heat capacity, folding free energy, pK_a, and so on. Furthermore, we have been closely collaborating with the Global Health Drug Discovery Institute, GHDDI, a nonprofit research institute founded and supported by the Gates Foundation, to leverage AI²BMD and other AI capabilities to accelerate the drug discovery processes.

KRUFT: What significance does AI²BMD hold for research in both biology and AI? And also, what impact does it have outside of the lab, in terms of societal and individual benefits?

WANG: Good question. For biology, AI²BMD provides a much more accurate approach than those used in the past several decades to simulate the protein dynamic motions and study the bioactivity. For AI, AI²BMD proves AI can make a big difference to the dynamic protein structure study beyond AI for the protein static structure prediction. Raised by AI²BMD and other works, I can foresee there is a coming age of AI-driven biomolecular simulation, providing binding free-energy calculation with quantum simulation accuracy for the complex of drug and the target protein for drug discovery, detecting more flexible biomolecular conformational changes that molecular mechanics cannot do, and opening more opportunities for enzyme engineering and vaccine and antibody design.

KRUFT: AI is having a profound influence on the speed and breadth of scientific discovery, and we’re excited to see more and more talented people joining us in this space. What do you want our audience to take away from this work, particularly those already working in the AI for Science space or looking to enter it?

WANG: Good question. I’d like to share three points from my research experience. First is aim high. Exploring a disruptive research topic is better than doing 10 incremental works. In the years of research, our organization always encourages us to do the big things. Second is persistence. I remembered a computer scientist previously said about 90% of the time during research is failure and frustration. The rate is even higher when exploring a new research direction. In AI²BMD study, when we suffered from research bottlenecks that cannot be tackled for several months, when we received critical comments from reviewers, when some team members wanted to give up and leave, I always encourage everyone to persist, and we will make it. More importantly, the foundation of persistence is to ensure your research direction is meaningful and constantly adjust your methodology from failures and critical feedback. The third one is real-world applications. Our aim is to leverage AI for advancing science. Proposing scientific problems is a first step, then developing AI tools and evaluating on benchmarks and, more importantly, examining its usefulness in the real-world applications and further developing your AI algorithms. In this way, you can close the loop of AI for Science research.

KRUFT: And, finally, Tong, what unanswered questions or unsolved problems remain in this area, and what’s next on the agenda for the AI²BMD team?

WANG: Well, I think AI²BMD is a starting point for the coming age of AI-driven MD for biomolecules. There are lots of new scientific questions and challenges coming out in this new field. For example, how to expand the simulated molecules from proteins to other kinds of biomolecules; how to describe the biochemical reactions during the simulations; how to further improve the simulation efficiency and robustness; and how to apply it for more real-world scenarios. We warmly welcome any people from both academic and industrial fields to work together with us to make the joint efforts to push the frontier of this new field moving forward.

[MUSIC]

KRUFT: Well, Tong, thank you for joining us today, and to our listeners, thanks for tuning in. If you want to read the full paper on AI²BMD, you can find a link at aka.ms/abstracts, or you can read it on the Nature website. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: November 14, 2024 appeared first on Microsoft Research.

Abstracts: November 5, 2024

Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) — Tue, 05 Nov 2024 19:30:00 +0000

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements. 

In this episode, Microsoft senior principal researchers Chris Hawblitzel and Jay Lorch join host Amber Tingle to discuss “Verus: A Practical Foundation for Systems Verification,” which received the Distinguished Artifact Award at this year’s Symposium on Operating Systems Principles, or SOSP. In their research, Hawblitzel, Lorch, and their coauthors leverage advances in programming languages and formal verification with two aims. The first aim is to help make software verification more accessible for systems developers so they can demonstrate their code will behave as intended. The second aim is to provide the research community with sound groundwork to tackle the application of formal verification to large, complex systems.

Read the paper

Transcript

[MUSIC]

AMBER TINGLE: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Amber Tingle. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Our guests today are Chris Hawblitzel and Jay Lorch. They are both senior principal researchers at Microsoft and two of the coauthors on a paper called “Verus: A Practical Foundation for Systems Verification.” This work received the Distinguished Artifact Award at the 30th Symposium on Operating Systems Principles, also known as SOSP, which is happening right now in Austin, Texas. Chris and Jay, thank you for joining us today for Abstracts and congratulations!

JAY LORCH: Thank you for having us.

CHRIS HAWBLITZEL: Glad to be here.

TINGLE: Chris, let’s start with an overview. What problem does this research address, and why is Verus something that the broader research community should know about?

HAWBLITZEL: So what we’re trying to address is a very simple problem where we’re trying to help developers write software that doesn’t have bugs in it. And we’re trying to provide a tool with Verus that will help developers show that their code actually behaves the way it’s supposed to; it obeys some sort of specification for what the program is supposed to do.

TINGLE: How does this publication build on or differ from other research in this field, including your previous Verus-related work?

HAWBLITZEL: So formal verification is a process where you write down what it is that you want your program to do in mathematical terms. So if you’re writing an algorithm to sort a list, for example, you might say that the output of this algorithm should be a new list that is a rearrangement of the elements of the old list, but now this rearrangement should be in sorted order. So you can write that down using standard mathematics. And now given that mathematical specification, the challenge is to prove that your piece of software written in a particular language, like Java or C# or Rust, actually generates an output that meets that mathematical specification. So this idea of using verification to prove that your software obeys some sort of specification, this has been around for a long time, so, you know, even Alan Turing talked about ways of doing this many, many decades ago. The challenge has always been that it’s really hard to develop these proofs for any large piece of software. It simply takes a long time for a human being to write down a proof of correctness of their software. And so what we’re trying to do is to build on earlier work in verification and recent developments in programming languages to try to make this as easy as possible and to try to make it as accessible to ordinary software developers as possible. So we’ve been using existing tools. There are automated theorem provers—one of them from Microsoft Research called Z3—where you give it a mathematical formula and ask it to prove that the formula is valid. We’re building on that. And we’re also taking a lot of inspiration from tools developed at Microsoft Research and elsewhere, like Dafny and F* and so on, that we’ve used in the past for our previous verification projects. And we’re trying to take ideas from those and make them accessible to developers who are using common programming languages. In this case, the Rust programming language is what we’re focusing on.

TINGLE: Jay, could you describe your methodology for us and maybe share a bit about how you and your coauthors tested the robustness of Verus.

LORCH: So the question we really want to answer is, is Verus suitable for systems programming? So that means a variety of things. Is it amenable to a variety of kinds of software that you want to build as part of a system? Is it usable by developers? Can they produce compact proofs? And can they get timely feedback about those proofs? Can the verifier tell you quickly that your proof is correct or, if it’s wrong, that it’s wrong and guide you to fix it? So the main two methodological techniques we used were millibenchmarks and full systems. So the millibenchmarks are small pieces of programs that have been verified by other tools in the past, and we built them in Verus and compared to what other tools would do to find whether we could improve usability. And we found generally that we could verify the same things but with more compact proofs and proofs that would give much snappier feedback. The difference between one second and 10 seconds might not seem a lot, but when you’re writing code and working with the verifier, it’s much nicer to get immediate feedback about what is wrong with your proof so you can say, oh, what about this? And it can say, oh, well, I still see a problem there. And you could say, OK, let me fix that. As opposed to waiting 10, 20 seconds between each such query to the verifier. So the millibenchmarks helped us evaluate that. And the macrobenchmarks, the building entire systems, we built a couple of distributed systems that had been verified before—a key value store and a node replication system—to show that you could do them more effectively and with less verification time. We also built some new systems, a verified OS page table, a memory allocator, and a persistent memory append-only log.

TINGLE: Chris, the paper mentions that successfully verifying system software has required—you actually use the word heroic to describe the developer effort. Thinking of those heroes in the developer community and perhaps others, what real-world impact do you expect Verus to have? What kind of gains are we talking about here?

HAWBLITZEL: Yeah, so I think, you know, traditionally verification or this formal software verification that we’re doing has been considered a little bit of a pie-in-the-sky research agenda. Something that people have applied to small research problems but has not necessarily had a real-world impact before. And so I think it’s just, you know, recently, in the last 10 or 15 years, that we started to see a change in this and started to see verified software actually deployed in practice. So on one of our previous projects, we worked on verifying the cryptographic primitives that people use when, say, they browse the web or something and their data is encrypted. So in these cryptographic primitives, there’s a very clear specification for exactly what bytes you’re supposed to produce when you encrypt some data. And the challenge is just writing software that actually performs those operations and does so efficiently. So in one of our previous projects that we worked on called HACL* and EverCrypt, we verified some of the most commonly used and efficient cryptographic primitives for things like encryption and hashing and so on. And these are things that are actually used on a day-to-day basis. So we, kind of, took from that experience that the tools that we’re building are getting ready for prime time here. We can actually verify software that is security critical, reliability critical, and is in use. So some of the things that Jay just mentioned, like verifying, you know, persistent memory storage systems and so on, those are the things that we’re looking at next for software that would really benefit from reliability and where we can formally prove that your data that’s written to disk is read correctly back from disk and not lost during a crash, for example. So that’s the kind of software that we’re looking to verify to try to have a real-world impact.

LORCH: The way I see the real-world impact, is it going to enable Microsoft to deal with a couple of challenges that are severe and increasing in scale? So the first challenge is attackers, and the second challenge is the vast scale at which we operate. There’s a lot of hackers out there with a lot of resources that are trying to get through our defenses, and every bug that we have offers them purchase, and techniques like this, that can get rid of bugs, allow us to deal with that increasing attacker capability. The other challenge we have is scale. We have billions of customers. We have vast amounts of data and compute power. And when you have a bug that you’ve thoroughly tested but then you run it on millions of computers over decades, those rare bugs eventually crop up. So they become a problem, and traditional testing has a lot of difficulty finding those. And this technology, which enables us to reason about the infinite possibilities in a finite amount of time and observe all possible ways that the system can go wrong and make sure that it can deal with them, that enables us to deal with the vast scale that Microsoft operates on today.

HAWBLITZEL: Yeah, and I think this is an important point that differentiates us from testing. Traditionally, you find a bug when you see that bug happen in running software. With formal verification, we’re catching the bugs before you run the software at all. We’re trying to prove that on all possible inputs, on all possible executions of the software, these bugs will not happen, and it’s much cheaper to fix bugs before you’ve deployed the software that has bugs, before attackers have tried to exploit those bugs.

TINGLE: So, Jay, ideally, what would you like our listeners and your fellow SOSP conference attendees to tell their colleagues about Verus? What’s the key takeaway here?

LORCH: I think the key takeaway is that it is possible now to build software without bugs, to build systems code that is going to obey its specification on all possible inputs always. We have that technology. And this is possible now because a lot of technology has advanced to the point where we can use it. So for one thing, there’s advances in programming languages. People are moving from C to Rust. They’ve discovered that you can get the high performance that you want for systems code without having to sacrifice the ability to reason about ownership and lifetimes, concurrency. The other thing that we build on is advances in computer-aided theorem proving. So we can really make compact and quick-to-verify mathematical descriptions of all possible behaviors of a program and get fast answers that allow us to rapidly turn around proof challenges from developers.

TINGLE: Well, finally, Chris, what are some of the open questions or future opportunities for formal software verification research, and what might you and your collaborators tackle next? I heard a few of the things earlier.

HAWBLITZEL: Yes, I think despite, you know, the effort that we and many other researchers have put into trying to make these tools more accessible, trying to make them easier to use, there still is a lot of work to prove a piece of software correct, even with advanced state-of-the-art tools. And so we’re still going to keep trying to push to make that easier. Trying to figure out how to automate the process better. There’s a lot of interest right now in artificial intelligence for trying to help with this, especially if you think about artificial intelligence actually writing software. You ask it to write a piece of software to do a particular task, and it generates some C code or some Rust code or some Java code, and then you hope that that’s correct because it could have generated any sort of code that performs the right thing or does total nonsense. So it would be really great going forward if when we ask AI to develop software, we also expect it to create a proof that the software is correct and does what the user asked for. We’ve started working on some projects, and we found that the AI is not quite there yet for realistic code. It can do small examples this way. But I think this is still a very large challenge going forward that could have a large payoff in the future if we can get AI to develop software and prove that the software is correct.

LORCH: Yeah, I see there’s a lot of synergy between—potential synergy—between AI and verification. Artificial intelligence can solve one of the key challenges of verification, namely making it easy for developers to write that code. And verification can solve one of the key challenges of AI, which is hallucinations, synthesizing code that is not correct, and Verus can verify that that code actually is correct.

TINGLE: Well, Chris Hawblitzel and Jay Lorch, thank you so much for joining us today on the Microsoft Research Podcast to discuss your work on Verus.

[MUSIC]

HAWBLITZEL: Thanks for having us.

LORCH: Thank you.

TINGLE: And to our listeners, we appreciate you, too. If you’d like to learn more about Verus, you’ll find a link to the paper at aka.ms/abstracts or you can read it on the SOSP website. Thanks for tuning in. I’m Amber Tingle, and we hope you’ll join us again for Abstracts.

[MUSIC FADES]

The post Abstracts: November 5, 2024 appeared first on Microsoft Research.

Abstracts: November 4, 2024

Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) — Mon, 04 Nov 2024 17:30:00 +0000

In this episode, Senior Principal Research Manager Shan Lu and Bogdan Stoica, a PhD candidate at the University of Chicago, join host Gretchen Huizinga to discuss “If At First You Don’t Succeed, Try, Try, Again … ? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems.” In the paper, which was accepted at this year’s Symposium on Operating Systems Principles, or SOSP, Lu, Stoica, and their coauthors examine typical retry issues and present techniques that leverage traditional program analysis and large language models to help detect them.

Read the paper

Transcript

[MUSIC]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today I’m talking to Dr. Shan Lu, a senior principal research manager at Microsoft Research, and Bogdan Stoica, also known as Bo, a doctoral candidate in computer science at the University of Chicago. Shan and Bogdan are coauthors of a paper called “If at First You Don’t Succeed, Try, Try, Again …? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems.” And this paper was presented at this year’s Symposium on Operating Systems Principles, or SOSP. Shan and Bo, thanks for joining us on Abstracts today!

SHAN LU: Thank you.

BOGDAN STOICA: Thanks for having us.

HUIZINGA: Shan, let’s kick things off with you. Give us a brief overview of your paper. What problem or issue does it address, and why should we care about it?

LU: Yeah, so basically from the title, we are looking at retry bugs in software systems. So what retry means is that people may not realize for big software like the ones that run in Microsoft, all kinds of unexpected failures—software failure, hardware failure—may happen. So just to make our software system robust, there’s often a retry mechanism built in. So if something unexpected happens, a task, a request, a job will be re-executed. And what this paper talks about is, it’s actually very difficult to implement this retry mechanism correctly. So in this paper, we do a study to understand what are typical retry problems and we offer a solution to detecting these problems.

HUIZINGA: Bo, this clearly isn’t a new problem. What research does your paper build on, and how does your research challenge or add to it?

STOICA: Right, so retry is a well-known mechanism and is widely used. And retry bugs, in particular, have been identified in other papers as root causes for all sorts of failures but never have been studied as a standalone class of bugs. And what I mean by that, nobody looked into, why is it so difficult to implement retry? What are the symptoms that occur when you don’t implement retry correctly? What are the causes of why developers struggle to implement retry correctly? We built on a few key bug-finding ideas that have been looked at by other papers but never in this context. We use fault injection. We repurpose existing unit tests to trigger this type of bugs as opposed to asking developers to write specialized tests to trigger retry bugs. So we’re, kind of, making the developer’s job easier in a sense. And in this pipeline, we also rely on large language models to augment the program and the code analysis that goes behind the fault injection and the reutilization of existing tests.

HUIZINGA: Have large language models not been utilized much in this arena?

LU: I want to say that, you know, actually this work was started about two years ago. And at that time, large language model was really in its infancy and people just started exploring what large language model can help us in terms of improving software reliability. And our group, and together with, you know, actually same set of authors from Microsoft Research, we actually did some of the first things in a workshop paper just to see what kind of things that we were able to do before like, you know, finding bugs can now be replicated by using large language model.

HUIZINGA: OK …

LU: But at that time, we were not very happy because, you know, just use large language model to do something people were able to do using traditional program analysis, I mean, it seems cool, right, but does not add new functionality. So I would say what is new, at least when we started this project, is we were really thinking, hey, are there anything, right, are there some program analysis, are there some bug finding that we were not able to do using traditional program analysis but actually can be enabled by large language model.

HUIZINGA: Gotcha …

LU: And so that was at, you know, what I feel like was novel at least, you know, when we worked on this. But of course, you know, large language model is a field that is moving so fast. People are, you know, finding new ways to using it every day. So yeah.

HUIZINGA: Right. Well, in your paper, you say that retry functionality is commonly undertested and thus prone to problems slipping into production. Why would it be undertested if it’s such a problem?

STOICA: So testing retry is difficult because what you need is to simulate the systemwide conditions that lead to retry. That often means simulating external transient errors that might happen on the system that runs your application. And to do this during testing and capture this in a small unit test is difficult.

LU: I think, actually, Bogdan said this very well. It’s like, why do we need a retry? It’s, like, when unexpected failure happen, right. And this is, like, something like Bogdan mentioned, like external transient error such as my network card suddenly does not work, right. And this may occur, you know, only for, say, one second and then it goes back on. But this one second may cause some job to fail and need retry. So during normal testing, these kind of unexpected things rarely, rarely happen, if at all, and it’s also difficult to simulate. That’s why it’s just not well tested.

HUIZINGA: Well, Shan, let’s talk about methodology. Talk a bit about how you tackled this work and why you chose the approach you did for this particular problem.

LU: Yeah, so I think this work includes two parts. One is a systematic study. We study several big open-source systems to see whether there are retry-related problems in this real system. Of course there are. And then we did a very systematic categorization to understand the common characteristics. And the second part is about, you know, detecting. And in terms of method, we have used, particularly in the detecting part, we actually used a hybrid of techniques of traditional static program analysis. We used this large language model-enabled program analysis. In this case, imagine we just asked a large language model saying, hey, tell us, are there any retry implemented in this code? If there is, where it is, right. And then we also use, as Bogdan mentioned, we repurposed unit test to help us to execute, you know, the part of code that large language model tell us there may be a retry. And addition to that, we also used fault injection, which means we simulate those transient, external, environmental failures such as network failures that very rarely would occur by itself.

HUIZINGA: Well, Bo, I love the part in every paper where the researchers say, “And what we found was …” So tell us, what did you find?

STOICA: Well, we found that implementing retry is difficult and complex! Not only find new bugs because, yes, that was kind of the end goal of the paper but also try to understand why these bugs are happening. As Shan mentioned, we started this project with a bug study. We looked at retry bugs across eight to 10 applications that are widely popular, widely used, and that the community is actively contributing to them. And the experiences of both users and developers, if we can condense that—what do you think about retries?—is that, yeah, they’re frustrated because it’s a simple mechanism, but there’s so many pitfalls that you have to be aware of. So I think that’s the biggest takeaway. Another takeaway is that when I was thinking about bug-finding tools, I was having this somewhat myopic view of, you know, you instrument at the program statement level, you figure out relationships between different lines of code and anti-patterns, and then you build your tools to find those anti-patterns. Well, with retry, this kind of gets thrown out the window because retry is a mechanism. It’s not just one line of code. It is multiple lines of code that span multiple functions, multiple methods, and multiple files. And you need to think about retry holistically to find these issues. And that’s one of the reasons we used large language models, because traditional static analysis or traditional program analysis cannot capture this. And, you know, large language models turns out to be actually great at this task, and we try to harness the, I would say, fuzzy code comprehension capabilities of large language models to help us find retry bugs.

HUIZINGA: Well, Shan, research findings are important, but real-world impact is the ultimate goal here. So who will this research help most and why?

LU: Yeah, that’s a great question. I would consider several groups of people. One is hopefully, you know, people who actually build, design real systems will find our study interesting. I hope it will resonate with them about those difficulties in implementing retry because we studied a set of systems and there was a little bit of comparison about how different retry mechanisms are actually used in different systems. And you can actually see that, you know, this different mechanism, you know, they have pros and cons, and we have a little bit of, you know, suggestion about what might be good practice. That’s the first group. The second group is, our tool actually did find, I would say, a relatively large number of retry problems in the latest version of every system we tried, and we find these problems, right, by repurposing existing unit tests. So I hope our tool will be used, you know, in the field by, you know, being maybe integrated with future unit testing so that our future system will become more robust. And I guess the third type of, you know, audience I feel like may benefit by reading our work, knowing our work: the people who are thinking about how to use large language model. And as I mentioned, I think a takeaway is large language model can repeat, can replace some of things we were able to do using traditional program analysis and it can do more, right, for those fuzzy code comprehension–related things. Because for traditional program analysis, we need to precisely describe what I want. Like, oh, I need a loop. I need a WRITE statement, right. For large language model, it’s imprecise by nature, and that imprecision sometimes actually match with the type of things we’re looking for.

HUIZINGA: Interesting. Well, both of you have just, sort of, addressed nuggets of this research. And so the question that I normally ask now is, if there’s one thing you want our listeners to take away from the work, what would it be? So let’s give it a try and say, OK, in a sentence or less, if I’m reading this paper and it matters to me, what’s my big takeaway? What is my big “aha” that this research helps me with?

STOICA: So the biggest takeaway of this paper is not to be afraid to integrate large language models in your bug-finding or testing pipelines. And I’m saying this knowing full well how imprecise large language models can be. But as long as you can trust but verify, as long as you have a way of checking what these models are outputting, you can effectively insert them into your testing framework. And I think this paper is showing one use case and bring us closer to, you know, having it integrated more ubiquitously.

HUIZINGA: Well, Shan, let’s finish up with ongoing research challenges and open questions in this field. I think you’ve both alluded to the difficulties that you face. Tell us what’s up next on your research agenda in this field.

LU: Yeah, so for me, personally, I mean, I learned a lot from this project and particularly this idea of leveraging large language model but also as a way to validate its result. I’m actually working on how to leverage large language model to verify the correctness of code, code that may be generated by large language model itself. So it’s not exactly, you know, a follow-up of this work, but I would say at idea, you know, philosophical level, it is something that is along this line of, you know, leverage large language model, leverage its creativity, leverage its … sometimes, you know … leverage its imprecision but has a way, you know, to control it, to verify it. That’s what I’m working on now.

HUIZINGA: Yeah … Bo, you’re finishing up your doctorate. What’s next on your agenda?

STOICA: So we’re thinking of, as Shan mentioned, exploring what large language models can do in this bug-finding/testing arena further and harvesting their imprecision. I think there are a lot of great problems that traditional code analysis has tried to tackle, but it was difficult. So in that regard, we’re looking at performance issues and how large language models can help identify and diagnose those issues because my PhD was mostly focused, up until this point, on correctness. And I think performance inefficiencies are such a wider field and with a lot of exciting problems. And they do have this inherent imprecision and fuzziness to them that also large language models have, so I hope that combining the two imprecisions maybe gives us something a little bit more precise.

HUIZINGA: Well, this is important research and very, very interesting.

[MUSIC]

Shan Lu, Bogdan Stoica, thanks for joining us today. And to our listeners, thanks for tuning in. If you’re interested in learning more about this paper, you can find a link at aka.ms/abstracts. And you can also find it on the SOSP website. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: November 4, 2024 appeared first on Microsoft Research.

Abstracts: September 30, 2024

Alyssa Hughes (2ADAPTIVE LLC dba 2A Consulting) — Mon, 30 Sep 2024 13:02:05 +0000

Members of the research community at Microsoft work continuously to advance their respective fields.  Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Researcher Daniela Massiceti (opens in new tab) and Principal Research Software Development Engineer Martin Grayson (opens in new tab) join host Amber Tingle to discuss the research project and AI-powered tool Find My Things. Find My Things is a personalizable object recognizer that people who are blind or have low vision can train to find personal items from just a few videos of those objects. It was recently recognized as a 2024 Innovation by Design Awards (opens in new tab) finalist in the accessibility design (opens in new tab) and AI categories (opens in new tab) by the US-based business media brand Fast Company and, earlier this year, became available as a feature in the Seeing AI app (opens in new tab).

The Find My Things story is an example of research at Microsoft enhancing Microsoft products and services. To try the Find My Things tool, download the free, publicly available Seeing AI app (opens in new tab).

Try Find My Things

Learn more:

Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision
Publication, May 2024
Understanding Personalized Accessibility through Teachable AI: Designing and Evaluating Find My Things for People who are Blind or Low Vision
Publication, October 2023
Teachable AI Experiences (Tai X)
Project page
Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė
Microsoft Research Podcast, December 2023
ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition
Publication, October 2021
PeopleLens
Publication, June 2021

Transcript

[MUSIC]

AMBER TINGLE: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Amber Tingle. In this series, members of the research community at Microsoft offer a quick snapshot—or a podcast abstract—of their new and noteworthy papers and achievements.

[MUSIC FADES]

Our guests today are Daniela Massiceti and Martin Grayson. Daniela is a senior researcher at Microsoft, and Martin is a software development engineer with the company. They are members of a team creating technology that can be personalized to meet individual needs. Their research project called Find My Things enables people who are blind or have low vision to train an AI system to find their personal items based on a few examples of the objects. Find My Things has now shipped as a new feature within Seeing AI, which is a free app that narrates a person’s surroundings, including nearby people, text, and objects. The team was also recently recognized by the US-based business media brand Fast Company as an Innovation by Design Awards finalist in both the accessible design and artificial intelligence categories. Daniela and Martin, congratulations and thank you so much for joining us today for Abstracts.

MARTIN GRAYSON: Pleasure, thank you.

DANIELA MASSICETI: Thanks very much, Amber. Nice to be here.

TINGLE: So, Daniela, let’s start with a Find My Things overview. What is it, how does it work, and who’s it for?

MASSICETI: I think the best way I can describe Find My Things is a personalizable object recognizer. So when we think about object recognizers in the past, they’ve, kind of, been what I would call generic object recognizers. So they can only really recognize generic things like maybe chairs, desks, tables. But for the blind and low-vision community, who are really key users of object recognition apps and technologies, for them, that’s not quite enough. They need to be able to recognize all of their personal objects and items. So things like their sunglasses, their partner’s sunglasses, um, perhaps their house keys. So a range of these really specific personal objects that generic object recognizers cannot recognize and help them find. And so Find My Things aims to tackle that by being a personalizable object recognizer. A user can essentially teach this object recognizer what their personal items look like, and then the personalized feature can then help them locate those objects at any point in the future. The experience is divided into two phases: a teaching phase and a finding phase. So in a teaching phase, a user would capture four short videos of each of their personal objects, and those videos are then ingested into the app[1], and the machine learning model that sits underneath that app learns what those objects actually look like. And then in the second, finding phase a user at any point in the future can, kind of, say, hey, I want to find my partner’s sunglasses or my sunglasses. And that will initiate this 3D localization experience, which will help guide them with sound and touch cues to that specific object, wherever it is in the room around them.

TINGLE: I’ve heard Find My Things described as a teachable AI system. Daniela alluded to this, but, Martin, break it down a bit more for us. What do you and your collaborators mean when you use the term teachable AI?

GRAYSON: Something you can say about every person is that we’re all unique. Unique in the things that we like, whether that’s music, movies, food; the things we do, whether it’s at home, at work, or in your hobbies; and of course, the things that we have and own and keep with us. The same applies to accessibility. Everyone has their own unique sets of skills and tools that help them get things done, and we have them set up in just the way that matches us. The other day, I, like, came into the office and I sat in my chair, and I realized immediately that it wasn’t right. And of course, somebody had borrowed my desk the previous day and changed the height of my chair, but it was no problem because I could just re-personalize the chair back to my liking. When it comes to tools for accessibility, we think that people should have the same ability to personalize those tools to work the very best way for them. Typically, these have been settings like text size, speech, and color display, but AI has become a more and more important component in those tools. And one way we’re really excited about how to enable that is through teachable AI. So I think for us, teachable AI means that we can take some already really smart AI technology that might have some great general skills, but with a tiny amount of time from a person, that AI can be taught what matters to them and what works for them and become an even better AI to help them get things done.

TINGLE: Describe the origins of this work for our listeners, Daniela. What influenced or inspired the Find My Things pursuit? And how does your work build on or differ from previous work in the accessible technology space?

MASSICETI: Yeah, great question. And this is going to require me to cast my mind back to around four years ago. Our team at Microsoft Research was developing a system called the PeopleLens. So this was a head-mounted camera device that could be worn by people who are blind or low vision, specifically children who are blind or low vision. And it would help them identify or it would describe to them all the people that are around them in their social scenario—where those people were, were those people looking at them. And I think the team realized very quickly that, as Martin was saying there, each person has a really unique need or a unique view of what they actually want described to them about the social environment around them. And so that got us thinking, well, actually, being able to personalize this system is really important. But in complex social environments, personalization is a really hard problem. And so that prompted the team to think, OK, well, we want to study this idea of personalization; let’s try and find almost the simplest possible example of an AI technology with which we could actually deeply explore this space of personalization. And that led us to object recognizers. Object recognizers are, as I mentioned, a very commonly used technology across the blind and low-vision community, and we know that there is a need for personalization there. And so that really prompted or started this journey along personalizable, or teachable, object recognizers, which we then have been working on for the last three or four years to eventually get us to a point now where we’re seeing this feature available in Seeing AI.

TINGLE: Your team identified few-shot learning and the availability of new datasets as keys to this work. Martin, how have those particular advances helped to make Find My Things possible? And are there other approaches you’ve incorporated to make sure that it’s both practical and valuable for people who are blind or have low vision?

GRAYSON: So AI loves data. In fact, data is essential to make AI work. But for AI to work for everyone, it needs to learn from data that somehow represents everyone. The earliest challenge for Find My Things was that people who are blind or low vision don’t often use their cameras to take lots of photos and videos. And this actually gives us two big data gaps. The first is that we don’t have lots of image data that is representative of their own lives, their environments, and their things. And the second is that if you’re someone who’s blind, you may hold your phone differently, or you may use your camera in different ways. And that’s, too, missing from the data, certainly in the established datasets that exist. So new datasets, like ORBIT, have collected thousands of images and videos by members of the blind and low-vision community, specifically of objects and environments that are really important to them. And this means that we’ve now addressed those two big data gaps. And the few-shot part is really important, too. Find My Things is not a general object recognizer. It’s a find my things. We want Find My Things to be able to recognize anything you throw at it—whether it’s your fluffy keyring, your colorful tote bag, or your favorite gadget or toy. However, traditional object detectors, they often need hundreds or thousands of images to learn how to recognize something accurately. Few-shot learning is a super-smart approach that means you only need to trouble our users for a couple of short five-second videos, and then our app will take it from there. Find My Things can use that tiny amount of data and still be able to spot your object from across the room.

Maybe one more thing we did, and this also became so important, was to build and try prototype experiences as soon as we possibly could. And we would try so many models and designs out and then iterate. The team has definitely seen so many videos of me trying to find things around my house. But it’s actually one of the things we’re most proud of in the project, is this, kind of, graveyard of interactive prototypes that have all led us to the final experience.

TINGLE: Daniela, what have you learned from the Find My Things journey that may help the broader research community create more inclusive and more human-centric AI experiences?

MASSICETI: The first one I would say is the importance of doing participatory research. And what that means is really working with the communities that you are developing technologies for. And the second is really learning how to balance this tension between developing something in a research environment and actually deploying that technology in a real-world environment. To jump to the first learning around participatory research, Martin mentioned the ORBIT dataset. The ORBIT dataset was collected in partnership with users who are blind or low vision across both the UK and Canada over the years 2020 to 2021. And it was really important for us to actually engage directly with users who are blind as we were collecting that dataset from them to really understand what they wanted from a personalizable object recognition technology, how they would use their cameras, how they would hold their phones, what kinds of objects they would use this technology to find. And all of that was really, really critical in helping us shape what that dataset ended up being. That dataset became such a pivotal part of the ultimate Find My Things experience. To the second point around this tension between building something in research and deploying something in the real world, I think often as a researcher, we don’t really have to engage with real-world constraints. But of course, when you build a machine learning model or a machine learning system and you want to deploy it in the real world, suddenly those constraints really hit you in the face. And that was exactly the case with Find My Things. I remember quite distinctly in the model development process, we had a number of different models. They were, sort of, ranging up in size in terms of how much memory they would take on a phone to run. And of course, the larger the model was, the more accurate it was. But when we deployed these models of varying sizes onto a phone, we saw that they each had vastly different reactions to being on this phone. And I think if I recall from memory, some of our largest models ended up basically draining the phone’s battery in a couple of minutes, which would mean that the experience would be totally unusable to the user. And so one of the key things we had to do there is really find this sweet spot, or this balance, between what is good enough performance that does not end up, kind of, degrading the actual experience of running this model on a phone.

TINGLE: You mentioned participatory research, and your team’s version feels a little different from what we typically encounter. Talk a little bit more about the citizens who helped you build out this app.

MASSICETI: So these were a group of perhaps eight to 10 users who are blind or low vision who we hosted at Microsoft Research a number of times over the course of the development of the Find My Things experience. And they were … perhaps the best way I can describe them is they were co-designers; they were really helping us design—co-design—what the Find My Things experience ultimately turned out to be. We weren’t coming to them as simply testers of our system. We, kind of, went to them with a blank slate and asked them, well, we have these ideas of what we want to build; what do you think? And from there, we, kind of, iterated upwards and ultimately crafted, co-crafted, the ultimate design of the Find My Things experience, both the teaching part and the finding part.

TINGLE: One of the members of that citizen design team, Karolina Pakėnaitė, visited the Microsoft Research Podcast back in December with your colleague Cecily Morrison. Martin, talk a bit more about how influential citizen designers like Karolina are to this effort.

GRAYSON: There were so many key ideas and innovations that came from the workshops with Karolina and the rest of the citizen design team. Maybe I can share a couple that have had the biggest impact on the app and the experience. So the first was the process of teaching an object. Our testing of AI models showed that collecting videos of objects from different sides and on different backgrounds was critical. So we developed this thing called the drawback technique, where we leaned on the phone’s augmented reality capabilities to make it possible. We’d ask the user to start with the phone right next to their object and then slowly draw it away. This meant that we could track all of the different distances the images were, and the user could really comfortably create a video without leaving their seat. And what’s more, you can do this so easily without even needing to look at the camera. It’s really natural. The second big design innovation came later on when you were actually looking for the thing. We called it the last yard. So many of the lost-item scenarios that we learned about from the citizen designers … they shared with us that they had dropped something in a public space. Their wallet fell out of their pocket as they took their phone out, or they knocked their earbud off the table onto the floor of the train on their way to work. And in both of those moments, the last thing anyone wants to be doing is feeling around on the floor, especially on public transport. So we tested these early versions of Find My Things with the design team, and they would get close to their object, overstep it, and then reach down. And they’d still be feeling around the floor before they found their object, which mostly ended up back behind them. So our last yard design completely changed this. As the user got close to their object, within the last yard, we change the sounds, and the app actually tells them to move down. The phone then responds to the distance to the object exactly like a metal detector. And this meant that when they reached down just at the right moment, they found their object on the floor and it was much easier. No more overstepping. We spent lots of time exploring how the experience and the phone capabilities like AR and AI could work best together, and our citizen design team gave us all of the key insights that led to us coming to these approaches.

TINGLE: So what’s next for Find My Things? I’d like you to share a bit about the opportunities or even the obstacles that exist for more widespread adoption of the teachable AI approach.

GRAYSON: So Find My Things was such a great project to work on. It sat right in the center of the triangle of AI innovation, designing with your community, and of course product impact. We’re taking so much of what we’ve learned during this project and building it into our research going forwards—how we build and evaluate AI, how to engage with the communities that we want to build for, and of course the value of building lots and lots of prototypes. Teachable AI, I think, is going to be a key approach in addressing the challenge for AI working equally well for everyone. The challenge is how do we ensure that we build these new fantastic models on data that gives representation to all that’ll use it. And so often, the people that might benefit the most from innovations in AI might have the smallest representation in data. And our work with people in the blind and low-vision community have really brought that into focus for us. AI can and will be transformational for them, so long as we can make it work just as well for everyone. And then that creates the opportunity: ensuring that these systems and technologies that we design can learn from and build in all of the diverse and wonderful uniqueness of being a human.

MASSICETI: I think one of the things I’m most excited about is unlocking this power of personalization. Hopefully, we’ve convinced you how impactful having personalized AI technologies would be for not only the blind and low-vision community, but for you and I. And so one of the things I’m most excited about is seeing how we can transplant some of these learnings and ideas that we’ve had in building Find My Things into now the generative AI era. And so, yeah, I think I’m really excited to, kind of, bring together these ideas of teachable AI with these new generative models to help really bring to life more useful AI technologies that service not just a small few but all the people across the user distribution.

TINGLE: Daniela and Martin, thank you so much for joining Abstracts today.

MASSICETI: Thank you, Amber.

GRAYSON: Thank you for having us.

[MUSIC]

TINGLE: And thanks to our listeners, too. If you’d like to learn more about Find My Things and teachable AI, visit aka.ms/TeachableAI. Thank you for tuning in. I’m Amber Tingle. Join us next time for more Abstracts.

[MUSIC FADES]

[1] (opens in new tab) The AI-powered tool Find My Things is not a standalone app. It is available as a feature in the Seeing AI app. Download the Seeing AI app (opens in new tab) to try Find My Things.

The post Abstracts: September 30, 2024 appeared first on Microsoft Research.

Abstracts: August 15, 2024

Alyssa Hughes — Fri, 16 Aug 2024 00:29:59 +0000

In this episode, Microsoft Product Manager Shrey Jain and OpenAI Research Scientist Zoë Hitzig join host Amber Tingle to discuss “Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online.” In their paper, Jain, Hitzig, and their coauthors describe how malicious actors can draw on increasingly advanced AI tools to carry out deception, making online deception harder to detect and more harmful. Bringing ideas from cryptography into AI policy conversations, they identify a possible mitigation: a credential that allows its holder to prove they’re a person––not a bot––without sharing any identifying information. This exploratory research reflects a broad range of collaborators from across industry, academia, and the civil sector specializing in areas such as security, digital identity, advocacy, and policy.

Read the paper

Transcript

[MUSIC]

AMBER TINGLE: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research—in brief. I’m Amber Tingle. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Our guests today are Shrey Jain and Zoë Hitzig. Shrey is a product manager at Microsoft, and Zoë is a research scientist at OpenAI. They are two of the corresponding authors on a new paper, “Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online.” This exploratory research comprises multidisciplinary collaborators from across industry, academia, and the civil sector. The paper is available now on arXiv. Shrey and Zoë, thank you so much for joining us, and welcome back to the Microsoft Research Podcast.

SHREY JAIN: Thank you. We’re happy to be back.

ZOË HITZIG: Thanks so much.

TINGLE: Shrey, let’s start with a brief overview of your paper. Why is this research important, and why do you think this is something we should all know about?

JAIN: Malicious actors have been exploiting anonymity as a way to deceive others online. And historically, deception has been viewed as this unfortunate but necessary cost as a way to preserve the internet’s commitment to privacy and unrestricted access to information. And today, AI is changing the way we should think about malicious actors’ ability to be successful in those attacks. It makes it easier to create content that is indistinguishable from human-created content, and it is possible to do so in a way that is only getting cheaper and more accessible. And so this paper aims to address a countermeasure to protect against AI-powered deception at scale while also protecting privacy. And I think the reason why people should care about this problem is for two reasons. One is it can very soon become very logistically annoying to deal with these various different types of scams that can occur. I think we’ve all been susceptible to different types of attacks or scams that, you know, people have had. But now these scams are going to become much more persuasive and effective. And so for various different recovery purposes, it can become very challenging to get access back to your accounts or rebuild your reputation that someone may damage online. But more importantly, there’s also very dangerous things that can happen. Kids might not be safe online anymore. Or our ability to communicate online for democratic processes. A lot of the way in which we shape political views today happens online. And that’s also at risk. And in response to that, we propose in this paper a solution titled personhood credentials. Personhood credentials enable people to prove that they are in fact a real person without revealing anything more about themselves online.

TINGLE: Zoë, walk us through what’s already been done in this field, and what’s your unique contribution to the literature here?

HITZIG: I see us as intervening on two separate bodies of work. And part of what we’re doing in this paper is bringing together those two bodies of work. There’s been absolutely amazing work for decades in cryptography and in security. And what cryptographers have been able to do is to figure out protocols that allow people to prove very specific claims about themselves without revealing their full identity. So when you think about walking into a bar and the bartender asks you to prove that you’re over 21—or over 18, depending on where you are—you typically have to show your full driver’s license. And now that’s revealing a lot of information. It says, you know, where you live, whether you’re an organ donor. It’s revealing a lot of information to that bartender. And online, we don’t know what different service providers are storing about us. So, you know, the bartender might not really care where we live or whether we’re an organ donor. But when we’re signing up for digital services and we have to show a highly revealing credential like a driver’s license just to get access to something, we’re giving over too much information in some sense. And so this one body of literature that we’re really drawing on is a literature in cryptography. The idea that I was talking about there, where you can prove privately just isolated claims about yourself, that’s an idea called an anonymous credential. It allows you to be anonymous with respect to some kind of service provider while still proving a limited claim about yourself, like “I am over 18,” or in the case of personhood credentials, you prove, “I am a person.” So that’s all one body of literature. Then there’s this huge other body of literature and set of conversations happening in policy circles right now around what to do about AI. Huge questions abounding. Shrey and I have written a prior paper called “Contextual Confidence and Generative AI,” which we talked about on this podcast, as well, and in that paper, we offered a framework for thinking about the specific ways that generative AI, sort of, threatens the foundations of our modes of communication online. And we outlined about 16 different solutions that could help us to solve the coming problems that generative AI might bring to our online ecosystems. And what we decided to do in this paper was focus on a set of solutions that we thought are not getting enough attention in those AI and AI policy circles. And so part of what this paper is doing is bringing together these ideas from this long body of work in cryptography into those conversations.

TINGLE: I’d like to know more about your methodology, Shrey. How did your team go about conducting this research?

JAIN: So we had a wide range of collaborators from industry, academia, the civil sector who work on topics of digital identity, privacy, advocacy, security, and AI policy which came together to think about, what is the clearest way in which we can explain what we believe is a countermeasure that can protect against AI-powered deception that, from a technological point of view, there’s already a large body of work that we can reference but from a “how this can be implemented.” Discussing the tradeoffs that various different types of academics and industry leaders are thinking about. Can we communicate that very clearly? And so the methodology here was really about bringing together a wide range of collaborators to really bridge these two bodies of work together and communicate it clearly—not just the technical solutions but also the tradeoffs.

TINGLE: So, Zoë, what are the major findings here, and how are they presented in the paper?

HITZIG: I am an economist by training. Economists love to talk about tradeoffs. You know, when you have some of this, it means you have a little bit less of that. It’s kind of like the whole business of economics. And a key finding of the paper, as I see it, is that we begin with what feels like a tradeoff, which is on the one hand, as Shrey was saying, we want to be able to be anonymous online because that has great benefits. It means we can speak truth to power. It means we can protect civil liberties and invite everyone into online spaces. You know, privacy is a core feature of the internet. And at the same time, the, kind of, other side of the tradeoff that we’re often presented is, well, if you want all that privacy and anonymity, it means that you can’t have accountability. There’s no way of tracking down the bad actors and making sure that they don’t do something bad again. And we’re presented with this tradeoff between anonymity on the one hand and accountability on the other hand. All that is to say, a key finding of this paper, as I see it, is that personhood credentials and more generally this class of anonymous credentials that allow you to prove different pieces of your identity online without revealing your entire identity actually allow you to evade the tradeoff and allow you to, in some sense, have your cake and eat it, too. What it allows us to do is to create some accountability, to put back some way of tracing people’s digital activities to an accountable entity. What we also present in the paper are a number of different, sort of, key challenges that will have to be taken into account in building any kind of system like this. But we present all of that, all of those challenges going forward, as potentially very worth grappling with because of the potential for this, sort of, idea to allow us to preserve the internet’s commitment to privacy, free speech, and anonymity while also creating accountability for harm.

TINGLE: So Zoë mentioned some of these tradeoffs. Let’s talk a little bit more about real-world impact, Shrey. Who benefits most from this work?

JAIN: I think there’s many different people that benefit. One is anyone who’s communicating or doing anything online in that they can have more confidence in their interactions. And it, kind of, builds back on the paper that Zoë and I wrote last year on contextual confidence and generative AI, which is that we want to have confidence in our interactions, and in order to do that, one component is being able to identify who you’re speaking with and also doing it in a privacy-preserving way. And I think another person who benefits is policymakers. I think today, when we think about the language and technologies that are being promoted, this complements a lot of the existing work that’s being done on provenance and watermarking. And I think the ability for those individuals to be successful in their mission, which is creating a safer online space, this work can help guide these individuals to be more effective in their mission in that it highlights a technology that is not currently as discussed comparatively to these other solutions and complements them in order to protect online communication.

HITZIG: You know, social media is flooded with bots, and sometimes the problem with bots is that they’re posting fake content, but other times, the problem with bots is that there are just so many of them and they’re all retweeting each other and it’s very hard to tell what’s real. And so what a personhood credential can do is say, you know, maybe each person is only allowed to have five accounts on a particular social media platform.

TINGLE: So, Shrey, what’s next on your research agenda? Are there lingering questions—I know there are—and key challenges here, and if so, how do you hope to answer them?

JAIN: We believe we’ve aggregated a strong set of industry, academic, and, you know, civil sector collaborators, but we’re only a small subset of the people who are going to be interacting with these systems. And so the first area of next steps is to gather feedback about the proposal of a solution that we’ve had and how can we improve that: are there tradeoffs that we’re missing? Are there technical components that we weren’t thinking as deeply through? And I think there’s a lot of narrow open questions that come out of this. For instance, how do personhood credentials relate to existing laws regarding identity theft or protection laws? In areas where service providers can’t require government IDs, how does that apply to personhood credentials that rely on government IDs? I think that there’s a lot of these open questions that we address in the paper that I think need more experimentation and thinking through but also a lot of empirical work to be done. How do people react to personhood credentials, and does it actually enhance confidence in their interactions online? I think that there’s a lot of open questions on the actual effectiveness of these tools. And so I think there’s a large area of work to be done there, as well.

HITZIG: I’ve been thinking a lot about the early days of the internet. I wasn’t around for that, but I know that every little decision that was made in a very short period of time had incredibly lasting consequences that we’re still dealing with now. There’s an enormous path dependence in every kind of technology. And I feel that right now, we’re in that period of time, the small window where generative AI is this new thing to contend with, and it’s uprooting many of our assumptions about how our systems can work or should work. And I’m trying to think about how to set up those institutions, make these tiny decisions right so that in the future we have a digital architecture that’s really serving the goals that we want it to serve.

[MUSIC]

TINGLE: Very thoughtful. With that, Shrey Jain, Zoë Hitzig, thank you so much for joining us today.

HITZIG: Thank you so much, Amber.

TINGLE: And thanks to our listeners, as well. If you’d like to learn more about Shrey and Zoë’s work on personhood credentials and advanced AI, you’ll find a link to this paper at aka.ms/abstracts, or you can read it on arXiv. Thanks again for tuning in. I’m Amber Tingle, and we hope you’ll join us next time on Abstracts.

[MUSIC FADES]

The post Abstracts: August 15, 2024 appeared first on Microsoft Research.

Abstracts: July 29, 2024

Alyssa Hughes — Mon, 29 Jul 2024 16:18:20 +0000

In this episode, Senior Researcher Li Lyna Zhang (opens in new tab) joins host Gretchen Huizinga to discuss “LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (opens in new tab),” which was accepted at this year’s International Conference on Machine Learning (ICML) (opens in new tab). LongRoPE, a method for increasing the input capabilities of language models, can expand context windows to 2-million-plus tokens while maintaining model performance—no major adjustments to the original model architecture needed. LongRoPE has been integrated into Phi-3 (opens in new tab), a family of small language models developed by Microsoft and available on Microsoft Azure (opens in new tab).

Read the paper

Get the code

Transcript

[MUSIC]

[MUSIC FADES]

My guest today is Dr. Li Lyna Zhang, a senior researcher at Microsoft Research. Dr. Zhang is coauthor of a paper called “LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens.” This paper was featured at this year’s International Conference on Machine Learning, or ICML. Li, thanks so much for joining us today on Abstracts!

LI LYNA ZHANG: Thank you for having me.

HUIZINGA: So let’s start with a brief overview of your paper. Tell us about the issue your research addresses and why it matters.

ZHANG: OK, so this paper is about how to effectively extend the context window of large language models beyond 2 million tokens. Why this is important? Because enabling longer input contexts can improve LLM capabilities. Right now, some LLMs can only handle a limited context window of 4K tokens, which is about 10 pages in a book. With our method, we can push LLM context window to over 2 million tokens. That means you can put all seven Harry Potter books to the LLM and ask any question about this story! Another important thing is that our method is super efficient. It requires minimal changes to the LLM architectures, and most existing optimizations can be reused. Therefore, our method can be easily applied in real production.

HUIZINGA: So it sounds like what you’re working on is improving the memory span of artificial intelligence or large language models. So what’s already been done in this field, and what unique contributions does your work bring?

ZHANG: Well, there has been a lot of work in building long-context LLMs. For example, pretraining with an efficient model architecture, using RAG (retrieval-augmented generation), and extending the context window with RoPE positional interpolation. Our approach uses the last technique. Let me briefly explain it. RoPE stands for rotary positional embedding, which encodes token position information for transformer models. When we pretrain an LLM, we set a context window size, and all token positions have a predefined range of RoPE values. Extending for a longer context window introduces new token positions that can be out of this predefined range, thus leading to out-of-distribution issues and making fine-tuning difficult. RoPE positional interpolation solves this by downscaling positional embeddings to fit within the pretrained range. However, positional embeddings like RoPE exhibit non-uniform information entropy in transformer models. Existing approaches do not effectively handle these non-uniformities during RoPE interpolation, leading to information loss and limiting the context window size. Our method addresses this challenge; therefore, it can achieve the longest context window size.

HUIZINGA: OK, so, Li, how would you describe the methodology you used for this work, and how did you go about conducting the research?

ZHANG: OK. So our method is to interpolate the RoPE positional embedding. It has three main steps. First, we introduce an efficient evolution search algorithm to perform non-uniform RoPE positional interpolation. Second, we propose progressive context window extension strategy. It begins by searching for a 256K length on the pretrained LLM and fine-tuning it at this length. Then, based on the fine-tuned 256K LLM, we did a second search for new RoPE interpolations to achieve 2048K context window size. Finally, since long-context LLMs will drop performance at its original context window, we readjusted the non-uniform positional interpolation at a 4K length to recover the short-context-window performance.

HUIZINGA: Let’s talk about findings. Tell us how things worked out for you and what you found as a result of your experiments.

ZHANG: Yeah. Our study verified two important non-uniformities in LLM context window extension. We identified that lower RoPE dimensions and initial token positions require less interpolation because they contain crucial and high-frequency information. Higher RoPE dimensions require more interpolation because these are sparse and low-frequency information.

HUIZINGA: So work in the lab is always interesting, but deployment in real-world settings is often another story. If everything is successful, Li, who benefits most from your LongRoPE research?

ZHANG: Well, our work significantly improves LLM’s capabilities to handle long context in real-world applications, such as long-context retrieval, code debugging, and even multi-modality LLM applications. Moreover, our method achieves this with minimal modifications to the RoPE positional embedding. Therefore, it can be widely applied to production. We have integrated LongRoPE into Microsoft Phi-3 128K family, which are the first long-context LLMs in its class. Before LongRoPE, Phi models have only 2K context window.

HUIZINGA: So who is your primary user?

ZHANG: I think any users who want to use the long-context LLMs, they can be our audience.

HUIZINGA: So it’s a wide audience.

ZHANG: Yeah, it’s a wide audience.

HUIZINGA: It’s about now that I always ask the “golden nugget” question. If you wanted to leave our listeners with one key takeaway from this research, what would it be?

ZHANG: Well, if there’s one key takeaway from our work, it must be our key findings that non-uniformities in rotary positional embedding are crucial for LLM context window extension. And if you want to build a high-quality long-context LLM, LongRoPE is all you need to know!

HUIZINGA: Talk about what’s left to do in this field in terms of open questions and outstanding challenges. What’s next on your research agenda, Li?

ZHANG: So far, there are still a couple of big questions in this field. First, it’s challenging to achieve both strong long and short capabilities at the same time. Although we have managed to recover some of the short performance for long-context LLM, it has not recovered 100 percent. We are trying different approaches to close these gaps. Second, we want to figure out how we can use these long-context LLMs to solve more challenging tasks, and then we can push this model to work harder and smarter for us.

[MUSIC]

HUIZINGA: Well, Li Lyna Zhang, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts, or you can find it on arXiv. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: July 29, 2024 appeared first on Microsoft Research.

Abstracts: July 18, 2024

Alyssa Hughes — Thu, 18 Jul 2024 13:00:00 +0000

In this episode, Senior Researcher Arindam Mitra joins host Gretchen Huizinga to discuss “AgentInstruct: Toward Generative Teaching with Agentic Flows.” In their paper, Mitra and his coauthors introduce an automated multi-agent framework for creating diverse, high-quality synthetic data at scale for language model post-training. In contrast to methods that create data from a seed set of existing prompts and responses, AgentInstruct uses raw data and specifications provided by model builders. The work—which post-trains a model, Orca-3, on AgentInstruct-generated data—is part of project Orca. Orca aims to develop techniques for creating small language models that can perform as well as large language models. Like Orca-3, the earlier Orca, Orca-2, and Orca-Math models show the effectiveness of leveraging synthetic data in training.

Read the paper

Learn more:

Orca: Progressive Learning from Complex Explanation Traces of GPT-4
Publication | June 2023
Orca-2: Teaching Small Language Models How to Reason
Publication | November 2023
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Publication | February 2024

Transcript

[MUSIC PLAYS]

[MUSIC FADES]

I’m here today with Dr. Arindam Mitra, a senior researcher at Microsoft Research and the lead researcher for Microsoft’s Orca project. Dr. Mitra is coauthor of a paper called “AgentInstruct: Toward Generative Teaching with Agentic Flows.” Arindam, it’s a pleasure to have you on Abstracts today.

ARINDAM MITRA: Thank you, Gretchen.

HUIZINGA: So let’s start with a brief overview of your paper. What problem does your research address, and why does it matter?

MITRA: So the post-training phase is very important for language models. You can really improve the model a lot by creating high-quality synthetic data. The problem is, however, though, high-quality synthetic data creation requires lots of human effort and expertise. The problem that we’re trying to tackle is, how do you reduce human effort? How can you create high-quality data with really low amount of human effort? When you have a language model and, let’s say, you want to apply it somewhere, you might have to train a generic model before. Which could be small or big. Doesn’t matter. After that, you can specialize it on the domain that you are looking for, and when you want to do that—to make it really fast, this particular process—it’s best if you go for synthetic data. If you have a way to, actually, generate very high-quality synthetic data, you can fast-track this part of specialization process. Not only single model. So this year, you’re going to see a lot more multi-agent models. And when you are trying to build these multi-agent models, you’re fearing like, OK, it might increase the cost too much, the latency too much. So it’s also very much important that you have a multi-agent system and you can, sort of, replace some of those agents with specialized small models. And when you’re trying to address these goals, you want this process to be something which you know works fast. So that’s why we are trying to make sure we have a very good way to create synthetic data for your specific need.

HUIZINGA: No research exists in a vacuum, and most of it fills some kind of a gap. So tell us what’s already been done in this field and how this work is building on it.

MITRA: So previously, actually, we have seen that in post-training, the more data you have, the better the performance goes for the model you’re training. So what we wanted to test is how much we can scale and what happens if we scale a lot and lot. But we didn’t have the tools for it. So the other approaches people previously used was you had a small set of data and how do we expand this dataset into much larger and larger amount of data. That’s where people were mostly focusing. But it’s not that easy to create that initial seed set. [LAUGHTER] You need to be very expert. The way that we’re doing is, actually, rather you define what you want to create. Like, OK, you want to create tool-use data. So you say, OK, I have a bunch of tools, and I am looking for data in the scenarios where someone can just come give me a description and then maybe that person interact with the AI to figure out how to get the job done. It’s not a one-step thing. And maybe you also have a setting where it’s more like an app developer. You have a bunch of APIs in your phone. You just want to figure out which one is best for the user request, which came through voice command. So different scenarios could be there. So what we’re saying [is], OK, we are not going through the method where you have to come up with your initial own seed data and then we expand. It is more like you define what you want to do. It’s much more abstract. And then, we are, sort of, automating the effort of data creation. So this setting actually of synthetic data creation, we are referring [to] it as generative teaching, and that’s where we are, sort of, differing. So previously, it was more like expansion, and now we are trying from specification to the data that you need.

HUIZINGA: Gotcha. Well talk a little bit more about your methodology and how you went about conducting this research.

MITRA: So first of all, what we are proposing actually is a multi-agent solution. So you start with first describing what you really need. So you describe in detail, like, I need data for this specific skill or this specific scenario. Then, what we do is like, OK, you have some unstructured data or raw data like text documents or code files that you gather from web with permissible license or use something that you own. We don’t care much about what the content is really. So it’s more like we got some random stuff, some random content. And then we’ll guide you how to convert this random something which is not meaningful for you into something which is meaningful for your data creation. For example, like, if you are creating data to teach how to use APIs, you might think about, you need lots of APIs and how do you get these APIs. So what we are saying is, like, we can take something like code and we’ll have agents which will convert these raw code files into list of APIs which is more like a library. So you create automatically this input that is very meaningful for data creation. And then once we have that, we have basically the seed instruction creation step based on your specification. Like, what do you want to create data for? So you have all these different scenarios, and we have multiple agents creating data for different scenarios. And then the last step is actually what we call refinement step. So it’s more like whatever data you created, we’ll go through them and we’ll make them better and better—improve the quality, improve the complexity, improve the trickiness, we’ll teach when not to answer, etc., etc. So make sure we cover the whole space. So by changing the stochastic seed, we are trying to cover the entire possible data space.

HUIZINGA: Right.

MITRA: So that’s the key thing. The way we, sort of, conducted this research is actually we defined 17 skills. Skills meaning reading comprehension, tool use, text modification, content creation, RAG (retrieval-augmented generation) … we have, like, list of 17 skills … conversation … and then we created one multi-agent flow for each of the skills and we generate data. So one key thing I want to highlight is, like, this work, compared to other work, it was not benchmark driven. We want to teach a skill. We don’t care which benchmarks we’re trying to evaluate it on. So we define the skill, like tool use means this to us, reading comprehension means this to us, text modification means this to us. And then we, sort of, generate the data to teach everything for that skill. And then what we did, we created actually 22 million instructions. And we had previously in Orca series, we had 3 million, around, instructions. So the 25 million is what we, sort of, have at the end. And that’s where we actually trained a Mistral model as of now. And we’re going to measure, like, how much we improve the Mistral model by this post-training.

HUIZINGA: Moving from methods to findings, I always look forward to the part of the research paper that finishes the sentence “and what we found was … ,” so give us a quick overview of your results. What did you find?

MITRA: Yes, so the results were actually very exciting for us. So Mistral 7B was our main, sort of, baseline because that’s where we’re trying to showcase, like, how much improvement we are getting. On the other side, we have, like, frontier models—ChatGPT, GPT-4. We want to also measure how far we are from those frontier models, so that’s, sort of, our evaluation setup. So on average actually, we got like 20 percent performance gain over the Mistral, and we evaluated that across 14 benchmarks that test reasoning, content creation, instruction following, format following, etc. But what was more important to us was to do a skill-specific evaluation because we are trying to teach certain skills, and we had, like, 17 skills as we mentioned earlier. So, for example, like, if you are focusing on reading comprehension as a skill, we took LSAT, SAT, and DROP, and many other benchmarks; we created a collection of reading comprehension-based benchmark. And there, we are observing, like, 20 percent improvement over Mistral, and what it means, like, we’re actually achieving GPT-4–level performance. Similarly, if I’m focusing on math skill, there are many datasets which test, like, elementary math, high school math, college-level math. And we improved actually across all these different levels of math. So we see from 40 percent to 150 percent of improvement on different benchmarks of math. So it was more like what we wanted to see. We’re not optimizing for a particular benchmark. We wanted to optimize the skill, and that’s what you’re observing. So you’re observing improvement in math across all these levels, from elementary to high school to college to middle school, etc., everything. The same goes for RAG, as well. We’re observing on RAG skill 92 percent, around, improvement over Mistral. The format following numbers are pretty interesting to us. So format following is very important for SLMs (small language models). You want to make these models practical. You want to make sure that they follow the format so you can parse the result. And we were able to take Mistral beyond Gemini Pro. So that was a very strong performance from the post-training that we did. For summarization, actually we were able to reduce the hallucination rate by 31 percent while achieving the GPT-4–level quality. So overall, all these results were, sort of, highlighting that the methodology that we have, which we’re calling AgentInstruct, is very promising.

HUIZINGA: I think it’s important to get practical and talk about real-world impact. So tell us who you think this research will benefit most and why.

MITRA: Yeah, so again the model builders will, sort of, find it most beneficial. So the significance of our work actually lies in the way we are trying to revolutionize the language model development through scalable, low-effort synthetic creation. And the scalable and low effort is, sort of, the key thing, right. We have shown that we can create very high-quality data. That’s what the numbers are telling us. We want to mention that this is very scalable and low effort, and that’s what we think might help the most for model builders.

HUIZINGA: So, Arindam, let’s borrow a phrase from the machine learning lexicon and go for a little one-shot learning here: if you had to boil down why your work is important, what’s the one thing you want our listeners to take away from this research?

MITRA: The key takeaway would be, like, the AgentInstruct method enables the generation of vast, diverse, and high-quality synthetic data with very minimal human input. So that’s one thing I would, like, to remember from this paper.

HUIZINGA: So as we close, talk briefly about the limitations that you encountered in this project and directions for future research. What are the outstanding challenges in this field, and what’s on your research agenda to overcome them?

MITRA: Yes, so we’re exploring further automation. But apart from making this data creation more automated and less human involvement needed, we’re trying to focus on two other aspects. One is automated model debugging, and the other is automated model repairing. So now that we have the ability to generate data for a particular skill, let’s say math, for model debugging, what we need is basically an error handler. Like something we can plug in which takes the question and the answer coming from a different model and verifies if the answer is correct or not. So that’s the part we’re working on right now, figuring out this error handler. And the second aspect is repairing. So once we have the error, we figure out, OK, this is where the model is struggling. How can we give feedback or how can we give more knowledge so it can basically correct those errors? So those are some things we’re working on right now.

[MUSIC PLAYS]

HUIZINGA: Well, Arindam Mitra, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts, or you can find a preprint on arXiv. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: July 18, 2024 appeared first on Microsoft Research.

Abstracts: May 20, 2024

Brenda Potts — Mon, 20 May 2024 20:15:09 +0000

In this episode, Principal Research Manager Andrey Kolobov joins host Gretchen Huizinga to discuss “WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small uncrewed aerial vehicle,” or sUAV. sUAVs can fly farther and more safely if they can reason about the terrain-affected wind in their vicinity. Traditional wind predictions ignore small-terrain features and work at the scale of hours and miles, far too coarsely for sUAVs. WindSeer can estimate the terrain-dependent wind field around an sUAV in flight, with limited onboard compute and measurement data, paving the way for safer and more energy-efficient autonomous drone operation.

Read the paper

Get the code

Learn More:

WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small UAV (opens in new tab)

Transcript

[MUSIC]

[MUSIC FADES]

I’m here today with Dr. Andrey Kolobov, a principal research manager at Microsoft Research. Dr. Kolobov is coauthor of a paper called “WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small uncrewed aerial vehicle,” otherwise known as an sUAV. Andrey Kolobov, great to have you on Abstracts!

ANDREY KOLOBOV: Thank you for having me!

HUIZINGA: So let’s start with a sort of abstract of your abstract. In just a few sentences, tell us about the problem your research addresses and more importantly, why we should care about it.

KOLOBOV: Right, so the overarching goal of this work—and I have to thank my collaborators from ETH Zürich, without whom this work would have been impossible—so the overarching goal of our work was to give drones the ability to stay aloft longer, safer, and cover larger distances. The reason why this is important is because drones’ potential for, for instance, quick delivery of small goods has long been understood, but in practice, their usefulness has been limited by the time they can spend in the air, by how quickly they drain their battery. And lifting these limitations brings the reality of getting the stuff that you order on the internet delivered to you quickly by drones closer.

HUIZINGA: Is that the core problem, is drone delivery?

KOLOBOV: Of course, when we were starting this project, we were not interested in any one application. We were interested in implications of AI for drone flight. The limitations of drones’ time aloft ultimately come from drone flight technology, which is very well established, very well understood, and ultimately relies on drones actively fighting forces of nature, such as gravity and wind, and because of this draining their batteries quickly. So within the framework of that technology, it’s difficult to get around these limitations. So what we’re aiming to show is that using AI, drones can reason about their environment in ways that allow them to embrace these forces of nature rather than actively fight them and thereby save a lot on energy and increase their time in the air.

HUIZINGA: Right, so are we conflating drones with sUAVs, as it were, small uncrewed aerial vehicle?

KOLOBOV: Yes, this work, we are somewhat conflating them, but this work focused specifically on small UAVs, small drones, because these drones’ ability to fight forces of nature is quite limited. Their battery life is way more limited than that of larger drones, and for them, this work is especially important.

HUIZINGA: OK, and I’m assuming it’s not a new problem and also assuming that you’re not entering a field with no previous research! [LAUGHTER] So what’s been done in this area before, and what gap in the literature or the practice does your research fill?

KOLOBOV: Yeah, of course. Certainly, many other very, very smart people have thought about this area. What we have tried doing and what we have accomplished differs from previous efforts in how much compute, how little data at inference time, our method requires and also the fine scale at which it makes its predictions. Obviously, there are weather models that model various aspects of the atmosphere, and they can predict wind, but they can do this at the scales of hours, at spatial scales of tens of miles, which is way too crude to be useful for drone flights at low altitudes. And also, these models do this at much higher altitudes, not where drones fly close to the ground, where it’s very important for them to know about wind to avoid collision with terrain potentially, but very high up in the air. The tool that could solve the same problem that we were trying to solve conceptually are computational fluid dynamics simulations, so-called CFD simulations. However, they’re very expensive. They cannot run on the drone. And so if you want the drone to be fully autonomous, they’re not really a feasible solution.

HUIZINGA: So how would you describe then how you attacked this problem? What methodology did you use for this work, and how did you go about conducting the research?

KOLOBOV: So one thing that people reading about this work might find funny is this déjà vu feeling of seeing the overarching technical insight that we had in a completely different context, in the context of training models such as Phi, Microsoft’s Phi. The reason why it’s funny is because we were trying to solve an entirely different problem in a project that started in a different era, research era, in the pre-large model era, and yet we came up with something quite similar. And this overarching technical insight is this: if you want to build a small but powerful model, one way of doing this is to find a powerful but potentially computationally expensive—or expensive in some other way—generative data source, generate data from that source in a very carefully controlled manner, and use this carefully constructed dataset to train your model. This is exactly what we did. In our case, this powerful but expensive generative data source were the computational fluid dynamic simulations, which we used in combination with 3D terrain maps that are publicly available on the internet to generate a lot of high-quality data, throw in a few more tricks, and get the model that we wanted.

HUIZINGA: Can you talk about the “few more tricks”? [LAUGHS]

KOLOBOV: [LAUGHS] Well, so we needed to train this model to make predictions based on very little data. Computational fluid dynamics simulations typically need a lot of data at prediction time. And so the so-called boundary conditions essentially need to know the wind at many locations in order to be able to predict it at the location that you’re interested in. And so we had to structure the data generation in a way that allowed us to avoid this limitation.

HUIZINGA: Talk to me a little bit more about the datasets that you used.

KOLOBOV: Yes, so all the data was synthetically generated.

HUIZINGA: All of it?

KOLOBOV: All of it! All of it was generated from computational fluid dynamics simulations.

HUIZINGA: Um, and was this methodology unique and new, or is it, uh, kind of building on other ways of doing things?

KOLOBOV: So the idea of using high-quality data sources under various guises had been known in the community, to various research communities in any case. Some would refer to it as distillation. Some would refer to it as data simulation. So in the context of these predictive weather models, it would be known as data simulation. But none of them were doing what we were trying to do, again which is getting a model that will make predictions on a very limited compute with a very limited amount of data at inference time.

HUIZINGA: Well, let’s move from research methods to research findings. Give us a quick overview of how things worked out for you and what you found.

KOLOBOV: So in a nutshell, as trivial as it sounds, the surprising finding was that it works! [LAUGHTER] Again, the reason why it’s surprising is, again, we used only synthetic data to predict something very, very real and something that people have put a lot of thinking into modeling as part of weather models, for instance. And it turned out that using just synthetic data, you can get a small model that, as the drone is flying through the air and as it’s measuring wind at its current location, this model allows you to predict that there is a downdraft 300 feet away from the drone on the other side of the hill. It’s just amazing that something so small can do something so complex and powerful.

HUIZINGA: Right. Well, let’s drill in there and, kind of, talk about real-world impact here because this is really important for a lot of wind-prediction scenarios. How does this impact real-world scenarios? Who benefits most from the kinds of applications that you might get from this?

KOLOBOV: Yeah, so there is a number of scenarios where it’s valuable to have a drone—usually a fixed-wing drone that, due to its inherent characteristics, can stay in the air longer than a copter drone—where it’s beneficial to have such a drone stay in the air for long periods of time, silently observing something. So the applications range from agriculture to environment conservation, where you want to track the movements, migrations of animals, to security. And of course, the technology that we develop does not have to be applied to fixed-wing drones. It can also be applied to copter drones, which is the drone model that is usually considered for use in drone delivery, and those drones can also benefit from it, especially in city conditions, where presumably they will have to fly around skyscrapers and take into account the effects that the skyscrapers and other buildings and structures have on the wind near terrain.

HUIZINGA: So one more question on the real-world impact. In your paper, you talked a little bit about wind farming and other places where understanding how wind works and being able to predict it matters. Is that one? Are there others?

KOLOBOV: It for sure is one area. Again, in this work, we focused mostly on applications of wind prediction that have to do with drones.

HUIZINGA: OK.

KOLOBOV: Besides time aloft, one application is safety. In many places around rough terrain, you know, in the mountains, predicting wind, predicting downdrafts and updrafts, has safety implications because drones fly so close to terrain, and the winds, the airflow, can be so strong in some places over such terrain that it can basically drag the drone into the ground no matter what [the] drone does. It can do it very, very quickly. So again, predicting such phenomena there becomes a matter of drone safety. The same applies, or will apply, in city conditions, where drones will be flying among buildings and wind can be so strong that it can carry a drone into a building or into another obstacle.

HUIZINGA: Well, I assume you didn’t solve everything with this paper and that there might still be some open questions remaining in the field! So what are some of the big outstanding challenges people still face here, and what’s next on your research agenda to overcome them?

KOLOBOV: Of course, this work is, in some sense, just the beginning. This work is about helping drones make sense of the environment around them. But this ability to make sense is not by itself useful without drones being able to use the results of this estimation in order to plan how to fly in a safer and more energy-efficient way and to adapt their plans as the environment around them changes. So this is a natural next steps: have drones take their predictions into account when planning their actions.

HUIZINGA: Well, Andrey Kolobov, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts (opens in new tab) or you can find one on arXiv. You can also read it on Nature Communications in Volume 15, April 25. See you next time on Abstracts!

[MUSIC]

The post Abstracts: May 20, 2024 appeared first on Microsoft Research.

Abstracts: May 6, 2024

Alyssa Hughes — Mon, 06 May 2024 13:00:00 +0000

In this episode, Senior Principal Researcher Michel Galley joins host Gretchen Huizinga to discuss “MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts,” which was accepted at the 2024 International Conference on Learning Representations (ICLR). MathVista, an open-source benchmark, combines new and existing data to measure how good models are at solving a variety of math problems that involve processing images as well as text, helping to gain insight into their reasoning capabilities.

Read the paper

Get the code & dataset

Transcript

[MUSIC]

[MUSIC FADES]

My guest today is Dr. Michel Galley, a senior principal researcher at Microsoft Research. Dr. Galley is the coauthor of a paper called “MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts.” Michel, thanks for joining us on Abstracts today!

MICHEL GALLEY: Thank you for having me.

HUIZINGA: So I like to start with a distillation or sort of an elevator pitch of your research. Tell us in just a couple sentences what problem or issue your paper addresses and why we should care about it.

GALLEY: So this paper is about evaluating large foundation models. So it’s a very important part of researching large language models because it’s a good way to evaluate, kind of, the capabilities—what these models are good at and not good at. And a part of the focus of MathVista is to evaluate these large foundation models in a multimodal setup, so when the input to the model is actually not just text but also text and images. And then, an example of a task that such a model would perform is, like, the input is maybe a mathematical question, and then there’s some visual support to that question, let’s say, of an image of a graph, and then the model has to respond to something related to that. And why this is important … there has been a lot of work, of course, on large foundation model. Especially when it comes to reasoning tasks, like mathematical reasoning, a lot has focused more on written form.

HUIZINGA: Yeah …

GALLEY: So MathVista is one of the very first datasets that has input that is both images and text.

HUIZINGA: Yeah, yeah. Well, reading your paper, it seems like this is an area that hasn’t been studied systematically. In fact, you actually say that! And say that the field is largely unexplored. But quickly tell us what has been done in this field, and then tell us how your research addresses the proverbial gap in the literature.

GALLEY: Well, there has been a lot of work on vision and language in other problems, like not just about reasoning. Maybe let me just mention why reasoning is important. So one reason I think it’s very interesting to evaluate these large language models in terms of reasoning skill is that we evaluate their capabilities beyond just memorization. So as many of your listeners probably know, these large foundation models are trained on large amounts of text that is public data from various sources. So when you ask a question to a large foundation model, it could be the case, in many cases, that it just memorizes things it has seen in the data.

HUIZINGA: Sure.

GALLEY: So what makes it interesting in terms of reasoning, the answer oftentimes is not there in the data. So it needs to develop this ability to connect the dots between various pieces of information to come up with a new answer. So the focus of our paper is really on mathematical reasoning, but it goes also a bit beyond that because what is also represented in the data is also science question and so on.

HUIZINGA: Yeah …

GALLEY: So this reasoning part has largely focused, until MathVista, on text-only modalities.

HUIZINGA: Yeah …

GALLEY: So it’s one of our very first ones that combines text and images in terms of evaluating these large foundation models. So you ask about what was done before. So, yes, there has been a lot of work, text only, on reasoning, for example, the mathematical question that’s just based on text. And there has been a different stream of work that was much more focused on vision. A lot of work has been on tasks such as visual question answering …

HUIZINGA: Yeah …

GALLEY: … where basically, you have an image and the question is about answer a question about this image. So, yes, we’re trying to fuse the two lines of research here.

HUIZINGA: Right …

GALLEY: And that’s one of the first works that does that.

HUIZINGA: Yeah. Well, let’s talk about your methodology for a minute. Tell us how you went about conducting this research, and what methods did you use?

GALLEY: Yes, sure. So that’s a bit different from a typical, kind of, machine learning paper because the focus on this work is really on benchmarking on the dataset. So the methodology is more about how we collect the data, process it. So they have two components to doing that. One was to look at existing data that already combines vision and text. And there are existing datasets that are actually already fairly big but that were not focused on reasoning. So we use those existing datasets and look for instances in the data that actually include some mathematical or science reasoning. And so that part is leveraging existing datasets, but the important part is, like, we really want to carve out what was interesting piece in terms of reasoning. And we had different stages of processing the data to identify the subset that was reasoning-based. So one first step was basically to apply some automatic filter to determine whether or not a given example, let’s say something that is visual and text, is actually … involves some mathematical reasoning. So we have different strategy. For example, if the answer is numerical, it’s likely that it might be something mathematically related. But that’s just the first stage. And the second stage, we actually had humans, annotators, just certify that the selected data is actually of high quality. So we do have an example of, “Oh, this is mathematical, and that’s either mathematical or scientific,” and so on. And that’s one part of the effort. The other part is that we realized while we collected the data, there are certain types of mathematical reasoning or related to mathematical reasoning that were not represented in the data. So we created three new datasets as part of MathVista. So when I said dataset, it’s more like, think of MathVista as like an aggregate of different types of data, and we added three of them, three new types of data. One is what you call PaperQA, which is basically data that is collected from scientific papers on arXiv, and that had questions asking about that paper and that included some visual components from the paper, typically a plot or a figure.

HUIZINGA: Yeah …

GALLEY: And then we had IQTest, which is basically, I mean, it’s vaguely related mathematically, but basically it also, kind of, tried to see maybe more abstractive thinking about maybe some input that is both text and visual. And the final is about FunctionQA, that is basically algebraic reasoning and function plots and so on.

HUIZINGA: OK …

GALLEY: The important part was actually to identify among vast amounts of data what is actually very interesting in terms of mathematical reasoning.

HUIZINGA: Yeah …

GALLEY: So that part, I think, was quite a big part of doing that work—finding existing data but also creating new data.

HUIZINGA: Yeah, yeah. Well, my favorite part of a research paper is where it says, “and what we found was … ,” so talk a little bit about your results. What did you find?

GALLEY: So we evaluated a wide variety of models, including GPT-4, Claude 2, GPT-4V, multimodal Bard, and LLaVA, and we categorized them into three categories. So one is text only. So, basically, you take a model that is by default just text, and we give it the text part of the question and ask it to answer the question. Of course, that’s, kind of, a bit of a, it’s a difficult task because oftentimes [LAUGHTER] we crucially build these questions so that you have to rely on the vision part. But that’s for, you know, scientific investigation to know how well they can do, and so that’s one category of model. A different category is still text only but that is given the detection from the image. So on the image, we do OCR. So we convert those words from images to text. It’s kind of an extension of the text-based model, except that what was images is translated into text, and then the input to the model is word only, and that’s a different category of model. And the third one is basically truly multimodal model. And what we found, I mean, not surprisingly, it’s, kind of, the one that was doing most poorly is the one that is text only. The second is text plus OCR. And then finally, the one that does best is the multimodal like GPT-4V. But while the ordering between these three categories makes sense, it was a bit surprising that maybe the gap between multimodal and text plus OCR was not bigger. Well, it’s big, but maybe not as big as we were expecting. So, for example, the best detection from the images model achieved like 35 percent accuracy while GPT-4V was 50 percent. So it’s a substantial gap but not huge.

HUIZINGA: Right. Just to clarify, you’re saying OCR. What does that stand for?

GALLEY: [Optical] character recognition.

HUIZINGA: Gotcha.

GALLEY: So, basically, it’s the task of taking text, sometimes typed, but sometimes written, and convert this into the actual text like you would have in a text file.

HUIZINGA: Right. Michel, does any of this have to do with the difficulty of the math problems that you present these models with? I mean, it seems to me, similar to humans, that the easier the problem, the easier it would be for the machine. So at what level of math are we talking for these tests?

GALLEY: What’s nice about MathVista is there’s continuum [of] different difficulties. So the spectrum is quite broad, going from elementary school to more advanced concepts such as calculus. So it’s quite broad. So in the paper, we do have this, kind of, broken down by level. So the number I gave you, like 50 percent, is an aggregate over all the difficulties. But …

HUIZINGA: Gotcha.

GALLEY: But the goal there was really, kind of, to compare different models, but we do have a fair amount of analysis in the appendix. Actually, we have 100 pages of appendices of plenty of analysis and so on. So if people, I mean …

HUIZINGA: I saw that. I saw the length of the paper, and I’m going, what? [LAUGHS] That’s a LONG paper! Well, research in the lab is one thing, I always like to say, but understanding real-world impact is important, too. So where’s this work going to make the most difference, and who does it help most at this point?

GALLEY: Well, I think perhaps that’s the main point of this kind of line of work in terms of reasoning is that when looking at this difficult problem that are mathematical, actually it’s a way to, kind of, abstract away maybe more complex capabilities, and I think while thinking just about mathematics might seem a bit narrow, I don’t think that really is. It’s more about seeing whether this model has the ability to do, kind of, multistep kind of processing of your input and think maybe somewhat intelligently about a given problem. So we focus mostly on math. There is some science, but we would be very interested, especially in future work, to, kind of, go beyond that.

HUIZINGA: OK, well, let me press in a little bit there because … just say I’m a regular person using a GPT model. Is your work more addressed upstream from that to the research community to say, how do we get these models to be better so that downstream people like me can be more confident of the models?

GALLEY: Yes, I would say at the moment, I mean, this line of work is perhaps more geared towards somewhat more research community, but I think it could be some seed for researchers to think about some applications perhaps that also requires some kind of step-by-step reasoning but perhaps not going beyond math.

HUIZINGA: Yeah. Michel, if there was one thing you wanted our listeners to take away from this research, kind of golden nugget, what would it be?

GALLEY: Well, I would say it’s the challenging part of these datasets. I think that’s what makes MathVista stand out compared to other datasets. By now, there are a few other vision and language datasets, and of course, many that are more text-based. And we’ve seen, for example, some recent papers showing that actually MathVista remains one of the most challenging ones. So I think it’s probably going to stay around for a while because of the difficulty it represents. So it’s open source of available datasets that everybody can use, and I very much encourage people to use it.

HUIZINGA: Is it on GitHub?

GALLEY: Yes, it’s on GitHub.

HUIZINGA: So what’s next on the research agenda for helping LLMs get better at math, Michel? What are the big challenges in the field yet? I mean, you’ve alluded to many of them already, sort of, but what’s next on your research agenda?

GALLEY: Well, I would say what we found so far is these models are very good at processing the textual part of problems it’s given, to the model, but you have the equivalent in images actually harder somehow. So I think a lot more work needs to be done in terms of vision capabilities, in terms of reasoning over images, because the capabilities you will see in text are actually quite advanced, whereas the equivalent in images doesn’t seem that good. I mean, a fair disclaimer: my background is more on the text side, [LAUGHTER] so some of my colleagues on the paper are more on the vision side, so maybe if a listener maybe run into some of our coauthors at the conference, they might want to talk to these vision people because that’s less of my background. [LAUGHS]

HUIZINGA: Well, and if you think about Venn diagrams, you know, you’ve got people that are doing text, people that are doing vision, and then the people that are trying to do both to see how the worlds collide.

[MUSIC]

Well, Michel Galley, thanks for joining us today. And to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts (opens in new tab), or you can find it on arXiv. You can also read it on the website for the International Conference on Learning Representations, or ICLR. And if you happen to be at the ICLR conference this week, you can hear more about it there. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: May 6, 2024 appeared first on Microsoft Research.

Abstracts: April 16, 2024

Alyssa Hughes — Tue, 16 Apr 2024 13:00:00 +0000

In this episode, Senior Research Software Engineer Tusher Chakraborty joins host Gretchen Huizinga to discuss “Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things,” which was accepted at the 2024 USENIX Symposium on Networked Systems Design and Implementation (NSDI). In the paper, Chakraborty and his coauthors share their efforts to address the challenges of delivering reliable and affordable IoT connectivity via satellite-based networks. They propose a method for leveraging the motion of small satellites to facilitate efficient communication between a large IoT-satellite constellation and devices on Earth within a limited spectrum.

Read the paper

Transcript

[MUSIC]

[MUSIC FADES]

I’m talking today to Tusher Chakraborty, a senior research software engineer at Microsoft Research. Tusher is coauthor of a paper called “Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things.” Tusher, thanks for joining us on Abstracts!

TUSHER CHAKRABORTY: Hi. Thank you for having me here, Gretchen, today. Thank you.

HUIZINGA: So because this show is all about abstracts, in just a few sentences, tell us about the problem your paper addresses and why we should care about it.

CHAKRABORTY: Yeah, so think of, I’m a farmer living in a remote area and bought a sensor to monitor the soil quality of my farm. The big headache for me would be how to connect the sensor so that I can get access to the sensor data from anywhere. We all know that connectivity is a major bottleneck in remote areas. Now, what if, as a farmer, I could just click the power button of the sensor, and it gets connected from anywhere in the world. It’s pretty amazing, right? And that’s what our research is all about. Get your sensor devices connected from anywhere in the world with just the click of power button. We call it one-click connectivity. Now, you might be wondering, what’s the secret sauce? It’s not magic; it’s direct-to-satellite connectivity. So these sensors directly get connected to the satellites overhead from anywhere on Earth. The satellites, which are orbiting around the earth, collect the data from the sensing devices and forward to the ground stations in some other convenient parts of the world where these ground stations are connected to the internet.

HUIZINGA: So, Tusher, tell us what’s been tried before to address these issues and how your approach contributes to the literature and moves the science forward.

CHAKRABORTY: So satellite connectivity is nothing new and has been there for long. However, what sets us apart is our focus on democratizing space connectivity, making it affordable for everyone on the planet. So we are talking about the satellites that are at least 10 to 20 times cheaper and smaller than state-of-the-art satellites. So naturally, this ambitious vision comes with its own set of challenges. So when you try to make something cheaper and smaller, you’ll face lots of challenges that all these big satellites are not facing. So if I just go a bit technical, think of the antenna. So these big satellite antennas, they can actually focus on particular part of the world. So this is something called beamforming. On the other hand, when we try to make the satellites cheaper and smaller, we can’t have that luxury. We can’t have beamforming capability. So what happens, they have omnidirectional antenna. So it seems like … you can’t focus on a particular part of the earth rather than you create a huge footprint on all over the earth. So this is one of the challenges that you don’t face in the state-of-the-art satellites. And we try to solve these challenges because we want to make connectivity affordable with cheaper and smaller satellites.

HUIZINGA: Right. So as you’re describing this, it sounds like this is a universal problem, and people have obviously tried to make things smaller and more affordable in the past. How is yours different? What methodology did you use to resolve the problems, and how did you conduct the research?

CHAKRABORTY: OK, I’m thrilled that you asked this one because the research methodology was the most exciting part for me here. As a part of this research, we launched a satellite in a joint effort with a satellite company. Like, this is very awesome! So it was a hands-on experience with a real-deal satellite system. It was not simulation-based system. The main goal here was to learn the challenge from a real-world experience and come up with innovative solutions; at the same time, evaluate the solutions in real world. So it was all about learning by doing, and let me tell you, it was quite the ride! [LAUGHTER] We didn’t do anything new when we launched the satellites. We just tried to see how industry today does this. We want to learn from them, hey, what’s the industry practice? We launched a satellite. And then we faced a lot of problems that today’s industry is facing. And from there, we learned, hey, like, you know, this problem is industry facing; let’s go after this, and let’s solve this. And then we tried to come up with the solutions based on those problems. And this was our approach. We didn’t want to assume something beforehand. We want to learn from how industry is going today and help them. Like, hey, these are the problems you are facing, and we are here to help you out.

HUIZINGA: All right, so assuming you learned something and wanted to pass it along, what were your major findings?

CHAKRABORTY: OK, that’s a very good question. So I was talking about the challenges towards this democratization earlier, right? So one of the most pressing challenges: shortage of spectrum. So let me try to explain this from the high level. So we need hundreds of these satellites, hundreds of these small satellites, to provide 24-7 connectivity for millions of devices around the earth. Now, I was talking, the footprint of a satellite on Earth can easily cover a massive area, somewhat similar to the size of California. So now with this large footprint, a satellite can talk with thousands of devices on Earth. You can just imagine, right? And at the same time, a device on Earth can talk with multiple satellites because we are talking about hundreds of these satellites. Now, things get tricky here. [LAUGHTER] We need to make sure that when a device and a satellite are talking, another nearby device or a satellite doesn’t interfere. Otherwise, there will be chaos—no one hearing others properly. So when we were talking about this device and satellite chat, right, so what is that all about? This, all about in terms of communication, is packet exchange. So the device sends some packet to the satellite; satellite sends some packet to the device—it’s all about packet exchange. Now, you can think of, if multiple of these devices are talking with a satellite or multiple satellites are talking with a device, there will be a collision in this packet exchange if you try to send the packets at the same time. And if you do that, then your packet will be collided, and you won’t be able to get any packet on the receiver end. So what we do, we try to send this packet on different frequencies. It’s like a different sound or different tone so that they don’t collide with each other. And, like, now, I said that you need different frequencies, but frequency is naturally limited. And the choice of frequency is even limited. This is very expensive. But if you have limited frequency and you want to resolve this collision, then you have a problem here. How do you do that? So we solve this problem by smartly looking at an artifact of these satellites. So these satellites are moving really fast around the earth. So when they are moving very fast around the earth, they create a unique signature on the frequency that they are using to talk with the devices on Earth. And we use this unique signature, and in physics, this unique signature is known as Doppler signature. And now you don’t need a separate frequency to sound them different, to have packets on different frequencies. You just need to recognize that unique signature to distinguish between satellites and distinguish between their communications and packets. So in that sense, there won’t be any packet collision. And this is all about our findings. So with this, now multiple devices and satellites can talk with each other at the same time without interference but using the same frequency.

HUIZINGA: It sounds, like, very similar to a big room filled with a lot of people. Each person has their own voice, but in the mix, you, kind of, lose track of who’s talking and then you want to, kind of, tune in to that specific voice and say, that’s the one I’m listening to.

CHAKRABORTY: Yeah, I think you picked up the correct metaphor here! This is the scenario you can try to explain here. So, yeah, like what we are essentially doing, like, if you just, in a room full of people and they are trying to talk with each other, and then if they’re using the same tone, no one will be distinguished one person from another.

HUIZINGA: Right …

CHAKRABORTY: Everyone will sound same and that will be colliding. So you need to make sure that, how you can differentiate the tones …

HUIZINGA: Yeah …

CHAKRABORTY: … and the satellites differentiate their tones due to their fast movement. And we use our methodology to recognize that tone, which satellite is sending that tone.

HUIZINGA: So you sent up the experimental satellite to figure out what’s happening. Have you since tested it to see if it works?

CHAKRABORTY: Yeah, yeah, so we have tried it out, because this is a software solution, to be honest.

HUIZINGA: Ah.

CHAKRABORTY: As I was talking about, there is no hardware modification required at this point. So what we did, we just implemented this software in the ground stations, and then we tried to recognize which satellite is creating which sort of signature. That’s it!

HUIZINGA: Well, it seems like this research would have some solid real-world impact. So who would you say it helps most and how?

CHAKRABORTY: OK, that’s a very good one. So the majority of the earth still doesn’t have affordable connectivity. The lack of connectivity throws a big challenge to critical industries such as agriculture—the example that I gave—energy, and supply chain, so hindering their ability to thrive and innovate. So our vision is clear: to bring 24-7 connectivity for devices anywhere on Earth with just a click of power button. Moreover, affordability at the heart of our mission, ensuring that this connectivity is accessible to all. So in core, our efforts are geared towards empowering individuals and industries to unlock their full potential in an increasingly connected world.

HUIZINGA: If there was one thing you want our listeners to take away from this research, what would it be?

CHAKRABORTY: OK, if there is one thing I want you to take away from our work, it’s this: connectivity shouldn’t be a luxury; it’s a necessity. Whether you are a farmer in a remote village or a business owner in a city, access to reliable, affordable connectivity can transform your life and empower your endeavors. So our mission is to bring 24-7 connectivity to every corner of the globe with just a click of a button.

HUIZINGA: I like also how you say every corner of the globe, and I’m picturing a square! [LAUGHTER] OK, last question. Tusher, what’s next for research on satellite networks and Internet of Things? What big unanswered questions or unsolved problems remain in the field, and what are you planning to do about it?

CHAKRABORTY: Uh … where do I even begin? [LAUGHTER] Like, there are countless unanswered questions and unsolved problems in this field. But let me highlight one that we talked here: limited spectrum. So as our space network expands, so does our need for spectrum. But what’s the tricky part here? Just throw more and more spectrum. The problem is the chunk of spectrum that’s perfect for satellite communication is often already in use by the terrestrial networks. Now, a hard research question would be how we can make sure that the terrestrial and the satellite networks coexist in the same spectrum without interfering [with] each other. It’s a tough nut to crack, but it’s a challenge we are excited to tackle head-on as we continue to push the boundaries of research in this exciting field.

[MUSIC]

HUIZINGA: Tusher Chakraborty, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts (opens in new tab). You can also read it on the Networked Systems Design and Implementation, or NSDI, website, and you can hear more about it at the NSDI conference this week. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: April 16, 2024 appeared first on Microsoft Research.

Abstracts Archives - Microsoft Research

Abstracts: November 14, 2024

Transcript

Abstracts: November 5, 2024

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: November 4, 2024

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: September 30, 2024

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: August 15, 2024

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: July 29, 2024

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: July 18, 2024

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: May 20, 2024

Learn More:

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: May 6, 2024

Subscribe to the Microsoft Research Podcast:

Transcript

Abstracts: April 16, 2024

Subscribe to the Microsoft Research Podcast:

Transcript