Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.
In this episode, Senior Behavioral Science Researcher Lev Tankelevitch joins host Gretchen Huizinga to discuss “The Metacognitive Demands and Opportunities of Generative AI.” In their paper, Tankelevitch and his coauthors propose using the scientific study of how people monitor, understand, and adapt their thinking to address common challenges of incorporating generative AI into life and work—from crafting effective prompts to determining the value of AI-generated outputs.
To learn more about the paper and related topics, register for Microsoft Research Forum, a series of panel discussions and lightning talks around science and technology research in the era of general AI.
Transcript
[MUSIC PLAYS]
GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.
[MUSIC FADES]
Today, I’m talking to Dr. Lev Tankelevitch, a senior behavioral science researcher from Microsoft Research. Dr. Tankelevitch is coauthor of a paper called “The Metacognitive Demands and Opportunities of Generative AI,” and you can read this paper now on arXiv. Lev, thanks for joining us on Abstracts!
LEV TANKELEVITCH: Thanks for having me.
HUIZINGA: So in just a couple sentences—a metacognitive elevator pitch, if you will—tell us about the issue or problem your paper addresses and, more importantly, why we should care about it.
TANKELEVITCH: Sure. So as generative AI has, sort of, rolled out over the last year or two, we’ve seen some user studies come out, and as we read these studies, we noticed there are a lot of challenges that people face with these tools. So people really struggle with, you know, writing prompts for systems like Copilot or ChatGPT. For example, they don’t even know really where to start, or they don’t know how to convert an idea they have in their head into, like, clear instructions for these systems. If they’re, sort of, working in a field that maybe they’re less familiar with, like a new programming language, and they get an output from these systems, they’re not really sure if it’s right or not. And then, sort of, more broadly, they don’t really know how to fit these systems into their workflows. And so we’ve noticed all these challenges, sort of, arise, and some of them relate to, sort of, the unique features of generative AI, and some relate to the design of these systems. But basically, we started to, sort of, look at these challenges, and try to understand what’s going on—how can we make sense of them in a more coherent way and actually build systems that really augment people and their capabilities rather than, sort of, posing these challenges?
HUIZINGA: Right. So let’s talk a little bit about the related research that you’re building on here and what unique insights or directions your paper adds to the literature.
TANKELEVITCH: So as I mentioned, we were reading all these different user studies that were, sort of, testing different prototypes or existing systems like ChatGPT or GitHub Copilot, and we noticed different patterns emerging, and we noticed that the same kinds of challenges were cropping up. But there weren’t any, sort of, clear coherent explanations that tied all these things together. And in general, I’d say that human-computer interaction research, which is where a lot of these papers are coming out from, it’s really about building prototypes, testing them quickly, exploring things in an open-ended way. And so we thought that there was an opportunity to step back and to try to see how we can understand these patterns from a more theory-driven perspective. And so, with that in mind, one perspective that became clearly relevant to this problem is that of metacognition, which is this idea of “thinking about thinking” or how we, sort of, monitor our cognition or our thinking and then control our cognition and thinking. And so we thought there was really an opportunity here to take this set of theories and research findings from psychology and cognitive science on metacognition and see how they can apply to understanding these usability challenges of generative AI systems.
HUIZINGA: Yeah. Well, this paper isn’t a traditional report on empirical research as many of the papers on this podcast are. So how would you characterize the approach you chose and why?
TANKELEVITCH: So the way that we got into this, working on this project, it was, it was quite organic. So we were looking at these user studies, and we noticed these challenges emerging, and we really tried to figure out how we can make sense of them. And so it occurred to us that metacognition is really quite relevant. And so what we did was we then dove into the metacognition research from psychology and cognitive science to really understand what are the latest theories, what are the latest research findings, how could we understand what’s known about that from that perspective, from that, sort of, fundamental research, and then go back to the user studies that we saw in human-computer interaction and see how those ideas can apply there. And so we did this, sort of, in an iterative way until we realized that we really have something to work with here. We can really apply a somewhat coherent framework onto this, sort of, disparate set of findings not only to understand these usability challenges but then also to actually propose directions for new design and research explorations to build better systems that support people’s metacognition.
HUIZINGA: So, Lev, given the purpose of your paper, what are the major takeaways for your readers, and how did you present them in the paper?
TANKELEVITCH: So I think the key, sort of, fundamental point is that the perspective of metacognition is really valuable for understanding the usability challenges of generative AI and potentially designing new systems that support metacognition. And so one analogy that we thought was really useful here is of a manager delegating tasks to a team. And so a manager has to determine, you know, what is their goal in their work? What are the different subgoals that that goal breaks down into? How can you communicate those goals clearly to a team, right? Then how do you assess your team’s outputs? And then how do you actually adjust your strategy accordingly as the team works in an iterative fashion? And then at a higher level, you have to really know how to—actually what to delegate to your team and how you might want to delegate that. And so we realized that working with generative AI really parallels these different aspects of what a manager does, right. So when people have to write a prompt initially, they really have to have self-awareness of their task goals. What are you actually trying to achieve? How does that translate into different subtasks? And how do you verbalize that to a system in a way that system understands? You might then get an output and you need to iterate on that output. So then you need to really think about, what is your level of confidence in your prompting ability? So is your prompting the main reason why the output isn’t maybe as satisfactory as you want, or is it something to do with the system? Then you actually might get the output [you’re] happy with, but you’re not really sure if you should fully rely on it because maybe it’s an area that is outside of your domain of expertise. And so then you need to maintain an appropriate level of confidence, right? Either to verify that output further or decide not to rely on it, for example. And then at a, sort of, broader level, this is about the question of task delegation. 
So this requires having self-awareness of the applicability of generative AI to your workflows and maintaining an appropriate level of confidence in completing tasks manually or relying on generative AI. For example, whether it’s worth it for you to actually learn how to work with generative AI more effectively. And then finally, it requires, sort of, metacognitive flexibility to adapt your workflows as you work with these tools. So are there some tasks where the way that you’re working with them is, sort of, slowing you down in specific ways? So being able to recognize that and then change your strategies as necessary really requires metacognitive flexibility. So that was, sort of, one key half of our findings.
And then beyond that we really thought about how we can use this perspective of metacognition to design better systems. And so one, sort of, general direction is really about supporting people’s metacognition. So we know from research from cognitive science and psychology that we can actually design interventions to improve people’s metacognition in a lasting and effective way. And so similarly, we can design systems that support people’s metacognition. For example, systems that support people in planning their tasks as they actually craft prompts. We can support people in actually reflecting on their confidence in their prompting ability or in assessing the output that they see. And so this relates a little bit to AI acting as a coach for you, which is an idea that the Microsoft Research New York City team came up with. So this is Jake Hofman, David Rothschild, and Dan Goldstein. And so, in this way, generative AI systems can really help you reflect as a coach and understand whether you have the right level of confidence in assessing output or crafting prompts and so on. And then similarly, at a higher level, they can help you manage your workflows, so helping you reflect on whether generative AI is really working for you in certain tasks or whether you can adapt your strategy in certain ways. And likewise, this relates also to explanations about AI, so how you can actually design systems that are explainable to users in a way that helps them achieve their goals. And explainability can be thought about as a way to actually reduce the metacognitive demand because you’re, sort of, explaining things in a way to people that they don’t have to keep in their mind and have to think about, and that, sort of, improves their confidence. It can help them improve their confidence or calibrate their confidence in their ability to assess outputs.
HUIZINGA: Talk for a minute about real-world impact of this research. And by that, I mean, who does it help most and how? Who’s your main audience for this right now?
TANKELEVITCH: In a sense, this is very broadly applicable. It’s really about designing systems that people can interact with in any domain and in any context. But I think, given how generative AI has rolled out in the world today, I mean, a lot of the focus has been on productivity and workflows. And so this is a really well-defined, clear area where there is an opportunity to actually help people achieve more and stay in control and actually be more intentional and be more aligned with their goals. And so this is, this is an approach where not only can we go beyond, sort of, automating specific tasks but actually use these systems to help people clarify their goals and track with them in a more effective way. And so knowledge workers are an obvious, sort of, use case or an obvious area where this is really relevant because they work in a complex system where a lot of the work is, sort of, diffused and spread across collaborations and artifacts and software and different ways of working. And so a lot of things are, sort of, lost or made difficult by that complexity. And so systems, um, that are flexible and help people actually reflect on what they want to achieve can really have a big impact here.
HUIZINGA: Mm-hmm. Are you a little bit upstream of that even now in the sense that this is a “research direction” kind of paper? I noticed that as I read it, I felt like this was how researchers can begin to think about what they’re doing and how that will help downstream from that.
TANKELEVITCH: Yes. That’s exactly right. So this is really about, we hope, unlocking a new direction of research and design where we take this perspective of metacognition—of how we can help people think more clearly and, sort of, monitor and control their own cognition—and design systems to help them do that. And in the paper, there’s a whole list of different questions, both fundamental research questions to understand in more depth how metacognition plays a role in human-AI interaction when people work with generative AI systems but also how we can then actually design new interventions or new systems that actually support people’s metacognition. And so there’s a lot of work to do in this, and we hope that, sort of, inspires a lot of further research, and we’re certainly planning to do a lot more follow-up research.
HUIZINGA: Yeah. So I always ask, if there was just one thing that you wanted our listeners to take away from this work, a sort of golden nugget, what would it be?
TANKELEVITCH: I mean, I’d say that if we really want generative AI to be about augmenting human agency, then I think we need to focus on understanding how people think and behave in their real-world context and design for that. And so I think specifically, the real potential of generative AI here, as I was saying, is not just to automate a bunch of tasks but really to help people clarify their intentions and goals and act in line with them. And so, in a way, it’s kind of about building tools for thought, which was the real vision of the early pioneers of computing. And so I hope that this, kind of, goes back to that original idea.
HUIZINGA: You mentioned this short list of open research questions in the field, along with a list of suggested interventions. You’ve, sort of, curated that for your readers at the end of the paper. But give our audience a little overview of that and how those questions inform your own research agenda coming up next.
TANKELEVITCH: Sure. So on the, sort of, fundamental research side of things, there are a lot of questions around how, for example, self-confidence that people have plays a role in their interactions with generative AI systems. So this could be self-confidence in their ability to prompt these systems. And so that is one interesting research question. What is the role of confidence and calibrating one’s confidence in prompting? And then similarly, on the, sort of, output evaluation side, when you get an output from generative AI, how do you calibrate your confidence in assessing that output, right, especially if it’s in an area that maybe you’re less familiar with? And so there’s these interesting, nuanced questions around self-confidence that are really interesting, and we’re actually exploring this in a new study. This is part of the AI, Cognition, and [the] Economy pilot project. So this is a collaboration that we’re running with Dr. Clara Colombatto, who’s a researcher at the University of Waterloo and University College London, and we’re essentially designing a study where we’re trying to understand people’s confidence in themselves, in their planning ability, and in working with AI systems to do planning together, and how that influences their reliance on the output of generative AI systems.
[MUSIC PLAYS]
HUIZINGA: Well, Lev Tankelevitch, thank you for joining us today, and to our listeners, thanks for tuning in. If you want to read the full paper on metacognition and generative AI, you can find a link at aka.ms/abstracts, or you can read it on arXiv. Also, Lev will be speaking about this work at the upcoming Microsoft Research Forum, and you can register for this series of events at researchforum.microsoft.com. See you next time on Abstracts!
[MUSIC FADES]