{"id":1003998,"date":"2024-01-30T12:05:00","date_gmt":"2024-01-30T20:05:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-video&p=1003998"},"modified":"2026-02-18T14:06:38","modified_gmt":"2026-02-18T22:06:38","slug":"research-forum-panel-discussion-ai-frontiers","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/research-forum-panel-discussion-ai-frontiers\/","title":{"rendered":"Panel: AI Frontiers"},"content":{"rendered":"\n
Hosted by Ashley Llorens<\/a>, with Ece Kamar<\/a>, S\u00e9bastien Bubeck, and Ahmed Awadallah<\/a> at Microsoft Research Forum, Season 1, Episode 1<\/strong><\/em><\/p>\n\n\n\n Hosted by Ashley Llorens, VP and Distinguished Scientist at Microsoft, AI researchers S\u00e9bastien Bubeck, Ahmed Awadallah, and Ece Kamar discuss frontiers in small language models and where AI research and capabilities are headed next.<\/p>\n\n\n\n Panel: AI Frontiers<\/strong><\/p>\n\n\n\n Ashley Llorens,<\/strong> VP and Distinguished Scientist, Microsoft. I\u2019m Ashley Llorens, with Microsoft Research. My team works across research and product to incubate emerging technologies and runs programs that connect our research at Microsoft to the broader research community. I sat down with research leaders Ece Kamar, Ahmed Awadallah, and S\u00e9bastien Bubeck to explore some of the most exciting new frontiers in AI. We discussed their aspirations for AI, the research directions they\u2019re betting on to get us there, and how their team is working differently to meet this moment.<\/p>\n\n\n\n<\/span>\n\n\n\n ASHLEY LLORENS: <\/strong>So let’s dive in. We’re experiencing an inflection point in human technology where machines, broadly speaking, are starting to exhibit the sparks of general intelligence, and it’s hard to avoid the enthusiasm, even if you wanted to. And I think it’s fair to say that there’s no shortage of that enthusiasm here among us. But as researchers, we’re also skeptics. You know, we go right in and try to understand the limitations of the technology as well as the capabilities, because it’s really those limitations that expose and define the frontiers that we want to push forward on. And so what I want to start with is to sketch those frontiers here with you a little bit. I’d like to hear about an aspiration you have for AI and why the technology cannot <\/em>do that today. 
Then we’ll come back around to the research directions that you’re betting on to close those gaps. And, so, I don’t know. Ahmed, what do you think? What aspiration do you have for AI, and why can’t the tech do it today?<\/p>\n\n\n\n AHMED AWADALLAH<\/strong>: <\/strong>I have a lot of aspirations. I think … you just mentioned we saw the sparks of AGI, so naturally, we\u2019re looking forward to actually seeing AGI. But beyond that, more realistically, I think two of the things I’m really looking forward to are having AI that can actually perceive and operate in the real world. We have made significant advances with language models. We are seeing a lot of advances with multimodality. It looks like an AI that can perceive and operate in the real world is not that far off from where we are. But there are a lot of challenges, as well. And I’m really excited to see how we can get to that. <\/p>\n\n\n\n LLORENS: <\/strong>What does that look like for you, when AI operates in the real world? What is it doing? <\/p>\n\n\n\n AWADALLAH: <\/strong>It looks … To me, it means that, first, <\/strong>we go beyond language, and we are getting a lot into multimodal models right now that can perceive images and language. However, a big part of what we do is that we take actions in the world in different ways. We have a lot of behavior that we exhibit as we do tasks, and it’s not clear that we can do that right now with AI. So imagine that we have an AI system that we can ask to do things on our behalf, both in the digital and in the physical world. Imagine that we have guarantees that they will accomplish these tasks in a way that aligns with our original intent. <\/p>\n\n\n\n LLORENS: <\/strong>Yeah, it’s compelling. Ece, what do you think?<\/p>\n\n\n\n ECE KAMAR: <\/strong>My dream for AI systems is that they become our helpers, companions, longer-term collaborators, rather than just, like, prompting something and it gives me an answer. 
And we are, actually, still quite far from having AI systems that can really help us through our life for the different purposes that we have and also really understand our goals, intentions, and also preferences. So I think the sparks that we are seeing right now are really about having building blocks that give us the initial technologies to build on, to get to those AI systems that have a memory, that have a history, that have a deep understanding of human concepts, and that can carry out tasks that are a lot broader, a lot more complex than what we can do today. And our task right now is using these blocks to really imagine what those future systems are going to look like and discover those new innovations that will push the capabilities forward so that we can really build systems that create a difference in our lives, not only the systems that we want to play with or, you know, do small tasks for us\u2014that are already changing how I work, by the way. These things are not minor, but they can really be a part of my daily life and help me<\/em> with everything I do. <\/p>\n\n\n\n LLORENS: <\/strong>Seb, what do you think? <\/p>\n\n\n\n S\u00c9BASTIEN BUBECK:<\/strong> <\/strong>Yeah, my aspiration for AI, actually, has nothing to do with the technology itself. I hope that AI will illuminate how the human mind works. That’s really my real aspiration. You know, I think what’s going on in our minds and the way we reason is extremely mysterious. And anything that is mysterious looks kind of magical. We have no idea what the basic elements for it are. And with AI, we’re seeing that, at the very least, it’s mimicking the type of reasoning that’s going on in human beings. So I’m hoping that we’re going to be able to really uncover those building blocks of reasoning. That’s my dream for the next decade, I guess. 
<\/p>\n\n\n\n LLORENS: <\/strong>How good of an analogy do you think, I’ll say, transformers or, you know, today’s machine learning models are for how we think and reason? <\/p>\n\n\n\n BUBECK: <\/strong>It’s a terrible analogy. [LAUGHS] So it really \u2026 the transformer is absolutely not, in my mind, trying to mimic what the human brain is doing. It’s more like the emergent properties are similar. So, you know, it’s \u2026 the substrate is going to be obviously different. I mean, one is a machine and one is wetware, and the concrete algorithm that is running will be different. But it’s plausible that the emergent property will be similar. That’s what I’m hoping. <\/p>\n\n\n\n LLORENS: <\/strong>No, yeah. Super interesting. And now I want to understand a little bit about the research directions that you are most excited about to get there. I don’t think you’re going to tell me about your neuroscience research. [LAUGHS] <\/p>\n\n\n\n BUBECK: <\/strong>[LAUGHS] I wish. I wish. <\/p>\n\n\n\n LLORENS: <\/strong>That\u2019s an interesting place to start … <\/p>\n\n\n\n KAMAR: <\/strong>Not yet. Maybe in the next episode. [LAUGHS] <\/p>\n\n\n\n BUBECK: <\/strong>Exactly. <\/p>\n\n\n\n LLORENS: <\/strong>But what are you betting on right now to get us closer to that? <\/p>\n\n\n\n BUBECK: <\/strong>Yeah. No, it’s actually connected, the two things. So what we are experimenting with right now is the following. So to us, I think to all of us here, GPT-4 showed the sparks of AGI, early signs of humanlike reasoning. And to us, we see this as a, kind of, proof of concept. OK, it means you can get this type of intelligence<\/em>\u2014quote, unquote\u2014if you scale up a ton, if you have a very, very large neural network trained on a lot of data with a lot of compute for a very long time. OK, great. But exactly which one of those elements was needed? Is it the big data that’s necessary? Is it the large neural network? Is it a lot of compute? 
And what is a lot<\/em>, by the way? What is large? You know, is 1 billion large? Is 10 billion large? You know, questions like this. So to me, this comes from a scientific inquiry perspective. But at the end of the day, it has enormous economic impact, because when you answer these questions, you can go make everything smaller. And this is what we’ve been doing with the Phi series of models, trying to build those small language models. Again, we come at it from the scientific perspective, but it has very, very concrete impact for the future of Microsoft.<\/p>\n\n\n\n LLORENS: <\/strong>So I think Phi is on a lot of minds right now. Let’s actually stick with Phi for a minute. What is the secret? [LAUGHS] What\u2014let’s stick with that\u2014what is the secret? What’s enabling you to get to the reasoning capabilities that you’re demonstrating with models of that size? <\/p>\n\n\n\n BUBECK: <\/strong>Yes, yes, yeah. There is \u2026 <\/p>\n\n\n\n LLORENS: <\/strong>What size is Phi, by the way? <\/p>\n\n\n\n BUBECK: <\/strong>Yeah, so the latest, Phi-2 (opens in new tab)<\/span><\/a>, is 2.7 billion parameters. Phi-1.5 (opens in new tab)<\/span><\/a> was 1.3 billion. So we have doubled the size. So the secret is actually very simple. The secret is in the title of the first paper that we wrote in the Phi series, which is \u201cTextbooks Are All You Need.\u201d So \u201cTextbooks Are All You Need,\u201d this is, of course, a play on the most famous paper of all time in machine learning, \u201cAttention Is All You Need,\u201d which introduced the attention mechanism for the transformer architecture. 
So in \u201cTextbooks Are All You Need,\u201d what we say is if you play with the data and you come up with data which is of \u201ctextbook quality\u201d\u2014so the meaning of this is a little bit fuzzy, and this is where part of the secret lies\u2014but if you come up with this textbook-quality data, we’re able to get 1,000x gains if you look at the total compute that you need to spend to reach a certain level in terms of benchmark, intelligence, etc. So now what is this textbook quality, this mysterious textbook quality? Well, the way I want to put it is as follows. What matters, when you give text to these transformers to try to teach them a concept, is how much reasoning is going on in the text. What kind of concept can you extract if you are to predict the next word in that text? So what we want is text which is reasoning dense, and, you know, novels are not really reasoning dense. Sometimes you need to reason a little bit to understand, OK, how all the characters are related, you know, why they are thinking or doing what they are doing. But where do you have really reasoning-dense text? Well, it’s in textbooks. So this is the secret, basically. <\/p>\n\n\n\n LLORENS: <\/strong>And, Ahmed, recently you and I have had conversations about a universe of different pretraining methods, textbook-like reasoning tokens, you know, being one, and then also the whole universe of post-training methods and how there’s a whole space to explore there. So maybe you can get into your research interests, you know, where are you pushing on that frontier? And, you know, what haven’t we talked about yet in terms of pretraining versus post-training? <\/p>\n\n\n\n AWADALLAH: <\/strong>Yeah, that’s a very good question. And, actually, many, many similar insights apply to what S\u00e9bastien was just describing. 
But if you look at how we have been training models recently, we start with the pretraining stage, where we basically show the model a lot of text\u2014the textbooks\u2014and we have it learn to predict the next word. And with a lot of data and a big model size, a lot of emergent properties were showing up in some models, properties that we didn’t really even try to teach the model. But we have also been seeing that there are other stages of training\u2014some people refer to them as post-training\u2014where, after we pretrain the model, we actually start teaching it specific skills, and that comes in the form of input-output samples or sometimes an input and two different outputs, and we are trying to teach the model that the first output is preferred to the second output. We can do that to teach the model a particular style or a skillset or even for alignment, to teach it to act in a safer way. <\/p>\n\n\n\n But what we have found out is that now that we have these large models, as well\u2014and they are actually very powerful engines that can enable us to create all sorts of data\u2014many of these properties, we don’t have to wait for them to emerge with the size. We can, actually, go back and create synthetic tailored data to try to teach a smaller model that particular skill. We started with reasoning, as well, because reasoning is a pretty hard property, and we haven’t really seen reasoning emerging to the level we have in models like GPT-4 right now, except after scaling to a very large model size and data size. So the question was, now that it has emerged in these large models, can we actually create data that teaches the model that particular skill? And we were not trying to teach the model any new knowledge, really. We were just trying to teach the small model how to behave, how to solve a task. 
So, for example, with a model like GPT-4, we are seeing that you can ask it to solve a task that requires breaking the task up into steps and going step by step to solve it. We have never seen that with a small model, but what we have found out is that you can, actually, use a powerful model to demonstrate the solution strategy to the small model, and you can actually demonstrate so many solution strategies for so many tasks. And the small models are able, actually, to learn that, and the reasoning ability is significantly improved based on that. <\/p>\n\n\n\n LLORENS: <\/strong>I find the word reasoning<\/em> pretty loaded. <\/p>\n\n\n\n AWADALLAH: <\/strong>It is.<\/p>\n\n\n\n LLORENS: <\/strong>I think a lot of people mean a lot of different things by reasoning. Actually, I found some clarity. I had a nice discussion with two of our colleagues, Emre Kiciman and Amit Sharma, and, you know, they wrote a recent paper on reasoning. Sometimes we mean symbolic-style reasoning; sometimes we mean more commonsense reasoning. You talked about, kind of, more symbolic-style-reasoning tokens perhaps, so how should I think about the difference between those kinds of training data versus world knowledge that I might want a model to reason about? <\/p>\n\n\n\n BUBECK: <\/strong>Yeah, very good question. So if you take the perspective that you start with a neural network, which is a completely blank slate, you know, just purely random weights, then you need to teach it everything<\/em>. So going for the reasoning, the high-level reasoning that we do as human beings, this is like, you know, step No. 10. You have many, many steps that you need to satisfy before that, including, as you said, commonsense reasoning. So, in fact, in our approach for the pretraining stage, we need to spend a lot of effort on commonsense reasoning. And there, the textbooks approach is perhaps a little bit weird because there\u2019s no textbook to teach you commonsense reasoning. 
You know, you acquire commonsense reasoning by going outside, you know, seeing nature, talking to people, you know, interacting, etc. So we \u2026 you have to think a little bit outside the box to come up with textbooks that will teach commonsense reasoning. But this is, actually, what we do; it was a huge part of what we did. In fact, everything that we did for Phi-1.5 was focused on commonsense reasoning. And then when we got to Phi-2, we got a little bit closer to the Orca<\/a> model, and we also tried to teach slightly higher-level reasoning, but we’re not there yet. There are still, you know, a few more layers. We’re not yet at step No. 10. <\/p>\n\n\n\nTranscript<\/h3>\n\n\n\n
Ece Kamar,<\/strong> Managing Director, Microsoft Research AI Frontiers
S\u00e9bastien Bubeck,<\/strong> VP, Microsoft GenAI
Ahmed Awadallah,<\/strong> Senior Principal Research Manager, Microsoft Research AI Frontiers <\/p>\n\n\n\n