
Research Forum Brief | March 2024

The Metacognitive Demands and Opportunities of Generative AI



“We believe that a metacognitive perspective can really help us analyze, measure, and evaluate the usability challenges of generative AI. And it can help us design generative AI systems that can augment human agency and workflows.”

Lev Tankelevitch, Senior Behavioral Science Researcher, Microsoft Research Cambridge

Transcript: Lightning Talk 2

The metacognitive demands and opportunities of generative AI

Lev Tankelevitch, Senior Behavioral Science Researcher, Microsoft Research Cambridge

Lev Tankelevitch explores how metacognition—the psychological capacity to monitor and regulate one’s cognitive processes—provides a valuable perspective for comprehending and addressing the usability challenges of generative AI systems around prompting, assessing and relying on outputs, and workflow optimization.

Microsoft Research Forum, March 5, 2024

LEV TANKELEVITCH: My name is Lev. I’m a researcher in the Collaborative Intelligence team in Microsoft Research Cambridge, UK, and today I’ll be talking about what we’re calling the metacognitive demands and opportunities of generative AI. So we know that AI has tremendous potential to transform personal and professional work. But as we show in our recent paper, a lot of usability challenges remain—from crafting the right prompts to evaluating and relying on outputs to integrating AI into our daily workflows. And [what] we propose in a recent paper is that metacognition offers a powerful framework to understand and design for these usability challenges.  

So metacognition is thinking about thinking and includes things like self-awareness, so our ability to be aware of our own goals, knowledge, abilities, and strategies; our confidence and its adjustment, so this is our ability to maintain an appropriate level of confidence in our knowledge and abilities and adjust that as new information comes in; task decomposition, our ability to take a cognitive task or goal and break it down into subtasks and address them in turn; and metacognitive flexibility, so our ability to recognize when a cognitive strategy isn’t working and adapt it accordingly. Let me walk you through a simple example workflow.

So let’s say you decided to ask an AI system to help you in crafting an email. So in the beginning, you might have to craft a prompt. And so you might ask yourself, what am I trying to convey with this email? Perhaps I need to summarize x, clarify y, or conclude z—all in the correct tone. You might then get an output and then need to evaluate that. And then you might ask yourself, well, how can I make sense of this output? In the case of an email example, it’s pretty straightforward. But what if you’re working with a programming language that you’re less familiar with? You might then need to iterate on your prompt. And so then you might ask yourself, well, how does it relate to my ability to craft the right prompt versus the system’s performance in a given task or domain?  

And now if you zoom out a little bit, there are these questions around what we’re calling automation strategy. So this is whether, when, and how you can apply AI to your workflows. So here you might ask yourself, is trying generative AI worth my time versus doing a task manually? And how confident am I that I can actually complete a task manually or learn AI effectively to help me do it? And then if I do decide to rely on AI in my workflows, how do I actually integrate it into my workflows most effectively? And so what we’re proposing is that all these questions really reflect the metacognitive demands that generative AI systems impose on users as they interact with these systems. So, for example, at the prompt formulation stage, this involves self-awareness of task goals. So knowing exactly what you want to achieve, breaking that down into subgoals and subtasks, and then verbalizing that explicitly for an effective prompt. At the output evaluation stage, it involves well-adjusted confidence in your ability to actually evaluate that output. And so that means disentangling your confidence in the domain you’re working with from the system’s performance in that task or domain.

In the prompt iteration stage, it involves well-adjusted confidence in your prompting ability, so this is about disentangling your ability to craft an effective prompt from the system’s performance in that task or domain, and metacognitive flexibility, which is about recognizing when a prompting strategy isn’t working and then adjusting it accordingly. At the automation strategy level, this is about self-awareness of the applicability and impact of AI on your workflows and well-adjusted confidence in your ability to complete a task manually or learn generative AI systems effectively to actually help you do that. And then finally, it requires metacognitive flexibility in actually recognizing when your workflow with AI isn’t working effectively and adapting that accordingly.

So beyond reframing these usability challenges through the perspective of metacognition, we know from psychology research that metacognition is both measurable and teachable. And so we can now think about how we can design systems that actually support people’s metacognition as they interact with them. So, for example, you can imagine systems that support people in planning complex tasks. So let’s say you’ve decided to ask an AI system to help you craft an email. It might actually break that task down for you and remind you that certain types of content are more common in such emails and actually proactively prompt you to fill that content in. It might also make you aware of the fact that there’s a certain tone or length that you might want to have for this email. And so in this way, it, sort of, breaks the task down for you and actually improves your self-awareness about different aspects of your task.  

Similarly, we can imagine systems that support people in reflecting on their own cognition. So let’s say you’ve asked the system to help you craft a proposal based on a previous document. Now a smart system that knows in the past you’ve had to edit this output quite extensively might let you know that you should specify an outline or other details and provide you with examples so that you can save time later on. Similarly, at the output evaluation stage, you can imagine how such an approach can augment AI explanations. So this is work done by the Calc Intelligence team here at Microsoft Research, and it shows a system that can help users complete tasks in spreadsheets. And it shows a step-by-step breakdown of the approach that it took to complete that task. So you can imagine a system that proactively probes users about different steps and their uncertainty around those steps and then tailors explanations effectively to that user’s uncertainty.  

So in sum, we believe that a metacognitive perspective can really help us analyze, measure, and evaluate the usability challenges of generative AI. And it can help us design generative AI systems that can augment human agency and workflows. For more details, I encourage you to check out the full paper, and I thank you for your time.


The Metacognitive Demands and Opportunities of Generative AI

By Lev Tankelevitch

Generative AI (GenAI) systems offer unprecedented opportunities for transforming professional and personal work. This potential stems from a unique combination of generative AI’s model flexibility, in that systems can accommodate a wide range of information in prompts and outputs; generality, in that systems are applicable to a wide range of tasks; and originality, in that systems can generate novel content. However, these properties are a double-edged sword, in that they also pose usability challenges for people working with GenAI systems. Studies show that people find it difficult to craft effective prompts, evaluate and rely on AI outputs, and optimize their workflows with GenAI. In recent work, we propose that metacognition—the psychological ability to monitor and control one’s thoughts and behavior—offers a valuable lens through which to understand and design for these usability challenges.

Current GenAI systems impose multiple metacognitive demands on users. A useful analogy for how people work with GenAI systems is that of a manager delegating tasks to a team. A manager needs to clearly understand and formulate their goals, break down those goals into communicable tasks, confidently assess the quality of the team’s output, and adjust plans accordingly along the way. Moreover, a manager needs to decide whether, when, and how to delegate tasks in the first place. All these responsibilities involve the metacognitive monitoring and control of one’s thought processes and behavior (i.e., cognition). Working successfully with GenAI systems requires these same abilities.

Prompting is the first challenge people face when interacting with GenAI systems, according to multiple studies. With a manual task, such as drafting an email, many implicit goals and intentions can remain so without ever being verbalized. For example, one might implicitly know to adopt a certain tone when writing to a senior colleague, or that a given draft first requires summarizing certain content and then pulling out key points for discussion. In contrast, all these details need to be explicitly specified in order for GenAI systems to execute tasks effectively. This requires people to be aware of their goals, to break tasks down into sub-tasks, and to verbalize all of this within a set of prompts—abilities comprising metacognitive monitoring and control. After submitting a prompt, a common step is assessing the output (i.e., determining whether the system achieved what was intended) and iterating on the prompt. If the result is not satisfactory, a key challenge for users is disentangling the role of their prompting ability from that of the system’s capabilities. A poor result can arise due to a poorly worded prompt, the system’s settings, the system’s limitations on a given task, or an “unlucky draw” given the probabilistic nature of many GenAI systems. The range of possible explanations makes it difficult for people to appropriately adjust their confidence in their prompting ability, a key form of metacognitive monitoring—much like a manager who can’t figure out whether their instructions are unclear or if their team is underperforming for other reasons. A subsequent challenge for users is being flexible enough to change prompting strategies as needed, whether that means rewording their most recent prompt, breaking up their task into further sub-tasks, or taking another approach.
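The shift from implicit intention to explicit specification can be made concrete with a small sketch. The helper below is purely illustrative (it is not from the paper or any product): it assembles a prompt from the goal, sub-tasks, tone, and audience that a writer drafting manually would typically leave unstated.

```python
def build_explicit_prompt(goal, subtasks, tone=None, audience=None):
    """Assemble a prompt that spells out normally implicit intentions.

    Illustrative only: the function name and fields are hypothetical,
    chosen to mirror the metacognitive steps described in the text
    (goal awareness, task decomposition, explicit verbalization).
    """
    lines = [f"Goal: {goal}"]
    if audience:
        lines.append(f"Audience: {audience}")
    if tone:
        lines.append(f"Tone: {tone}")
    lines.append("Steps:")
    lines.extend(f"{i}. {step}" for i, step in enumerate(subtasks, 1))
    return "\n".join(lines)

prompt = build_explicit_prompt(
    goal="Draft an email about the project timeline change",
    subtasks=[
        "Summarize the new timeline",
        "Clarify the reason for the change",
        "Conclude with next steps for the team",
    ],
    tone="professional but reassuring",
    audience="a senior colleague",
)
print(prompt)
```

The point of the sketch is the decomposition itself: each field forces a decision the user would otherwise make implicitly, which is exactly the monitoring work the paragraph above describes.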

Putting prompting aside, assessing the system’s output poses further challenges for people in terms of deciding whether to rely on it. This is particularly true when people use GenAI in a domain in which they have little expertise, such as a new programming language. Again, the challenge here is to appropriately adjust one’s confidence, albeit now in the domain itself and in the ability to assess the output. This is critical given the risk of incorrect or incoherent results, or other errors that GenAI systems can produce. Alongside the many possible explanations for a given output, this challenge is exacerbated by the speed and ease with which GenAI systems can produce extensive outputs, including entire presentations, reports, or software. This is akin to a manager with a prolific team that they cannot fully trust. Maintaining an appropriate level of confidence in assessing GenAI output is important, because this may influence the extent of over- or under-reliance on this technology.
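This kind of confidence adjustment is measurable. One standard measure from the judgment literature is the Brier score: the mean squared gap between stated confidence and actual correctness, where lower means better calibrated. The sketch below applies it to a hypothetical user rating AI outputs; the numbers are invented for illustration.

```python
def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence (0-1) and actual
    correctness (0 or 1). Lower scores indicate better calibration."""
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(outcomes)

# Hypothetical data: a user who is consistently highly confident that
# each AI output is correct, while only half actually are.
confidences = [0.9, 0.8, 0.95, 0.85]  # how sure they felt, per output
outcomes = [1, 0, 0, 1]               # whether each output was in fact correct

print(brier_score(confidences, outcomes))
```

A well-calibrated user with the same 50% hit rate would score lower by reporting confidences nearer 0.5, which is one concrete sense in which metacognitive monitoring is "measurable and teachable."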

Zooming out from individual interactions with GenAI systems, there is a broader question of how people can determine whether, when, and how they should use GenAI for tasks within their workflows. Determining this “automation strategy” requires awareness of, and an appropriate level of confidence in, the applicability and potential impact of using GenAI within a workflow. For example, people need to recognize when relying on GenAI is more productive than doing a task manually, or vice versa. Over time, people also need the flexibility to change strategies as they assess their workflows.
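One way to picture the whether-to-automate decision is as a toy expected-time comparison. This is not a model from the paper: the parameters (time estimates, the probability an attempt yields a usable output, a retry budget) are illustrative inputs a user would have to estimate themselves, which is itself the metacognitive work in question.

```python
def worth_delegating(manual_minutes, prompt_minutes, review_minutes,
                     p_usable, retries=1):
    """Toy expected-time model for the automation-strategy decision.

    Each AI attempt costs prompting plus reviewing time. With probability
    p_usable the output is usable and we stop; otherwise we retry up to
    `retries` times, then fall back to doing the task manually.
    Returns True if delegating beats working manually in expectation.
    """
    attempt_cost = prompt_minutes + review_minutes
    expected = 0.0
    p_reach = 1.0  # probability we are still iterating at this attempt
    for _ in range(retries + 1):
        expected += p_reach * attempt_cost
        p_reach *= (1 - p_usable)
    expected += p_reach * manual_minutes  # all attempts failed: do it manually
    return expected < manual_minutes

# A 30-minute manual task, cheap prompts, decent odds: delegating pays off.
print(worth_delegating(30, 3, 5, p_usable=0.6, retries=1))
# A 5-minute manual task: prompting overhead alone makes delegation a loss.
print(worth_delegating(5, 3, 5, p_usable=0.5, retries=1))
```

The design point is that the decision flips with the manual baseline and the user's own hit-rate estimate, so miscalibrated confidence in either feeds directly into over- or under-reliance.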

Beyond imposing metacognitive demands on users, GenAI’s model flexibility, generality, and originality also present many opportunities for improving people’s metacognition—that is, their self-awareness, adjustment of confidence, and flexibility in their cognition. An exciting area for exploration is designing systems that can proactively support users in planning tasks with GenAI. For example, systems can transform people’s initial high-level task into a series of sub-tasks, clarifying their goals in the process and helping them craft effective prompts. At the output assessment stage, systems can also proactively help people reflect on the output, their confidence in it, and any uncertainties they may have, either in the GenAI interaction or in the domain itself. Indeed, by adapting information to people’s level of confidence and areas of uncertainty, GenAI systems could help augment explanations for their own outputs in an interactive manner. At a broader level, GenAI systems can proactively support people in reflecting on and, in turn, optimizing their workflows as they incorporate GenAI into their tasks. In a way, it’s as if each manager had a personal coach and mentor that supported them as they learned to manage their team.

As we offload more of our cognitive tasks (e.g., ideation, synthesis, writing) to GenAI systems, it becomes increasingly important to understand how we monitor and control our cognition. This is precisely where the perspective of metacognition fits in. Beyond helping us re-frame the key usability challenges emerging in human-GenAI interaction, metacognition can also inspire novel system designs that augment our cognition, improving our self-awareness, confidence adjustment, and flexibility as we work with GenAI systems. As hinted here, there is a lot more work to do in this space.

The thinking around metacognition and GenAI described here is a Microsoft Research project, with key collaborators that include Viktor Kewenig, Auste Simkute, Ava Scott, Advait Sarkar, Abi Sellen, and Sean Rintel.