Generative AI and Plural Governance: Mitigating Challenges and Surfacing Opportunities

Presented by Madeleine Daepp and Vanessa Gathecha at Microsoft Research Forum, March 2024


“Democracy requires healthy dialogue and debate. It is actively threatened by generative AI’s misuse. Neither civil society nor technology companies can challenge these problems in isolation. The disruption of our digital public sphere is an all-of-society challenge that requires an all-of-society response.”

Madeleine Daepp, Senior Researcher, Microsoft Research Redmond

Transcript: Lightning Talk 5

Generative AI and plural governance: Mitigating challenges and surfacing opportunities

Ashley Llorens, CVP, Microsoft Research (closing remarks)
Madeleine Daepp, Senior Researcher, Microsoft Research Redmond
Vanessa Gathecha, Research and Policy Manager, Baraza Media Lab

Madeleine Daepp talks about the potential impacts and challenges of generative AI in a year with over 70 major global elections, and AI & Society Fellow Vanessa Gathecha discusses her work on disinformation in Kenya and sub-Saharan Africa.

Microsoft Research Forum, March 5, 2024

ASHLEY LLORENS: Thank you all for joining us for Episode 2 of Research Forum, both the folks here in Building 99 and those joining live on our online platform.

More than ever, as we’ve seen today, pushing the frontiers of research demands collaboration across disciplines and institutions. Through our work on our AI & Society Fellows program, we are aiming to catalyze collaboration at another essential intersection: AI and society. To close us out today, I’m going to invite my colleague Madeleine Daepp and her collaborator under this AI & Society Fellows program, Vanessa from the Baraza Media Lab foundation, to tell us more about their work.

MADELEINE DAEPP: Thank you, Ashley. This year is a big year for democracy. In fact, it’s the biggest election year in history, with more people voting than ever before. And it’s happening just as generative AI is showing unprecedented new capabilities. Now as a Microsoft researcher, I love generative AI. I use it every day to speed up my code, to punch up my essays. I use it to send emails to my non-English-speaking relatives because German grammar is hard. But as a Microsoft researcher, I also recognize that AI can be misused. So my colleague Robert Ness and I wanted to understand what that misuse might look like in order to help protect against it. Now we are empiricists, which means that we didn’t want to rely on hypotheticals. We didn’t want to give way to histrionics. We wanted real use cases. And so we went to Taiwan, a place that the Swedish V-Dem Institute found to be subject to more disinformation than any other democracy in the world. And we met with the fact-checkers, journalists, and officials on the infodemics frontlines.

Now as you might expect, we saw deepfakes. But the reality is that deepfakes are just one case of a bigger problem. We’re calling it generative propaganda—generative AI software that makes it easy to turn propaganda text into thousands of videos. Now why is that such a big deal? Because text is boring. Videos are something that you can scroll through for hours. We also saw crisis content creation. When something unexpected happens in the world—a natural disaster or a political gaffe—whoever speaks first often sets the narrative. With generative AI, even if you do not speak the language of the affected place, you do not have to wait for a copywriter. You can automatically generate content about events as they emerge.

We are beginning to see these malicious tactics all around the world. As Microsoft researchers, we belong to a global organization with researchers on many, many continents—well, all of them except Australia and Antarctica, specifically. This gives us an obligation and an opportunity to do globally relevant work. But you cannot do good global work without understanding local context. And that’s why I am always scouting for collaborators in the places I hope to study. The AI & Society Fellows program gives us an opportunity to learn from and with Vanessa Gathecha, a Nairobi-based researcher and policy analyst who works at the intersection of global governance and human welfare. I’ll let Vanessa describe the challenges that she is working on, in her own words.

[Beginning of pre-recorded presentation from Vanessa Gathecha]

VANESSA GATHECHA: Thank you, Madeleine, for this opportunity. One of the tracks we are working on is generative AI and plural governance. This is one of the biggest election years in the history of the world, and 12 countries in sub-Saharan Africa are slated to go to the polls. One of the challenges we will likely experience is a spread of hate speech, myths, and disinformation, especially where elections are highly contested. This really affects credible reporting, especially when it comes to journalism and other aspects of the media, and it also affects access to information for the general public. One of the ways we can curb this is to ensure that, just as we have broad-based access to this technology, we also have collective action when it comes to regulation. We need to work together across all levels of governance and all sectors, and also ensure that the regulatory framework is not fragmented. Thank you very much for this opportunity. I’m looking forward to collaborating with the rest of the team.

[End of pre-recorded presentation from Vanessa Gathecha]

DAEPP: We need to work together. Tech companies cannot challenge misuse of generative AI in isolation. We need to work with the people on the infodemics frontlines. Democracy requires healthy dialogue and debate. It is actively threatened by generative AI’s misuse. Neither civil society nor technology companies can challenge these problems in isolation. The disruption of our digital public sphere is an all-of-society challenge that requires an all-of-society response. The AI & Society Fellows program is helping to build much-needed connections—in this case, across places, across academic disciplines, and across society’s sectors—to help us understand the problem and work towards an impactful response.

Thank you all.

Getting Modular with Language Models: Building and Reusing a Library of Experts for Task Generalization

Presented by Alessandro Sordoni at Microsoft Research Forum, March 2024


“We have witnessed the wide adoption of large language models, such as GPT-4, that have very broad capabilities and can be used to solve a variety of tasks. But they are expensive to serve, and we can ask ourselves, are they really necessary for most tasks that users … might need?”

Alessandro Sordoni, Principal Researcher, Microsoft Research Montreal

Transcript: Lightning Talk 3

Getting modular with language models: Building and reusing a library of experts for task generalization

Alessandro Sordoni, Principal Researcher, Microsoft Research Montréal

Alessandro Sordoni shares recent efforts on building and reusing large collections of expert language models to improve zero-shot and few-shot generalization to unseen tasks.

Microsoft Research Forum, March 5, 2024

Hi, everyone. My name is Alessandro. I’m from Montréal. I’m going to share with you a vision for building modular language models.

So we have witnessed the wide adoption of large language models, such as GPT-4, that have very broad capabilities and can be used to solve a variety of tasks. But they are expensive to serve, and we can ask ourselves, are they really necessary for most tasks that users of Microsoft products, for example, might need?

This has boosted the development of small language models—for example, the mighty and very powerful Phi-2—that can be adapted to user tasks, either with full fine-tuning, which changes all the parameters of the model, or with parameter-efficient adaptation, for example by training LoRAs, which change only a small fraction of the parameters. We can see these model adapters as experts at their own tasks, and this is great because we now have a cost-effective model that can solve a task very effectively. But the problem is that each adapter has only very narrow capabilities. So the question we ask here is: now that we have all these expert models for each user and each task, can we reuse them, either to build small models with broader capabilities or to adapt to new users and tasks more efficiently?
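To make the adapter idea concrete, here is a minimal sketch of a LoRA layer wrapped around a frozen linear layer; the rank, scaling, and initialization values are illustrative assumptions, not the exact settings used in this work.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update (B @ A)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base.requires_grad_(False)  # base weights stay fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only A and B (a tiny fraction of the parameters) receive gradients,
        # so each task's "expert" is just this small low-rank shift.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```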

Let me show you how this system could work. We start from a base model, which is Phi-2, and we adapt this base model for every user or for a set of tasks, and we group these adapters into a library. Then we come up with an orchestration mechanism that chooses which adapters to use, based on a new user query, to produce a system response. The system has some desirable properties. One is that it enhances the base language model’s capabilities via such expert composition, and this resembles, a little bit, how mixture-of-experts models work. But there is a big difference here: these experts are not trained with the base model itself; they are trained a posteriori. This leads us to the second point, which is the decentralized training of these LoRA experts. This preserves privacy in the sense that we do not require all the data to be shared at once, or the base model to always be retrained from scratch. It is also energy efficient: these LoRA experts are cheap to train, so we can train them very quickly. The third point is interpretability, because these experts are usually associated with the tasks that they can solve. So upon seeing a new user query, we can inspect which experts have been activated for that query and get a sense of which capabilities are required.

So in order to build this system, we have to answer two questions. One, how do we build such an expert library? And two, how do we select the relevant experts for new inputs? The first scenario we are dealing with is a private scenario, in the sense that we have an ensemble of datasets—tasks or user data—and we assume that we cannot share data across these tasks; we cannot train on all the data together. One standard approach is to fine-tune, for example, a LoRA adapter on each dataset independently. In the figure, we would end up with a library of three experts. But let’s say that we can actually share a certain amount of this data, for example if we are dealing with public tasks. The idea behind this approach is to form a clustering of the tasks and train an adapter for each cluster. How do we cluster the tasks? We do a private LoRA fine-tuning for a few steps at the beginning, to get a weight for each task, and then we cluster the tasks by the similarity of those weights, grouping together tasks with high weight similarity. And we train one adapter per cluster, so in this example we would end up with a library of two experts. This relies on the intuition that similarity in the weight space reflects how synergistic tasks are when an adapter is trained on their joint dataset.
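A minimal sketch of that clustering step, assuming we have a briefly fine-tuned, flattened LoRA weight vector per task (the helper names and the use of k-means here are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_tasks(task_lora_weights: dict[str, np.ndarray], n_clusters: int = 2):
    """Group tasks whose briefly trained LoRA weights point in similar directions."""
    names = list(task_lora_weights)
    X = np.stack([task_lora_weights[n].ravel() for n in names])
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit norm -> cosine geometry
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return {name: int(label) for name, label in zip(names, labels)}

# One joint adapter is then trained per cluster: e.g., three tasks may yield
# a library of two experts if two tasks land in the same cluster.
```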

Now that we have our great library of experts, we have to choose how to select them upon seeing a new input. Here we assume that we do not have access to the data that was used to train the experts. So basically, you trained your experts, you gave them to us, and we figure out how to use them. To do routing, we select which expert to use for each representation in the base model. The base model is a transformer, so we have a representation for each layer and for each token, and we route by dot products between each hidden state in the transformer and an expert representation. But now we have to come up with an expert representation, and we do not know which data these experts were trained on. To do so, we leverage the functional form of the LoRA adapters, which produce a linear shift on the hidden-state representations. We take the linear transform of the LoRA adapter, decompose it into singular directions, and take the top singular direction of that matrix. That gives us our expert representation. We stack these expert representations into a routing matrix, compute the dot products—much like a mixture-of-experts parameterization—and choose which experts to use based on the resulting scores.
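Here is a sketch of that routing construction, assuming each LoRA’s linear shift is the low-rank product B·A (the function names and the top-k choice are illustrative, not the paper’s exact implementation):

```python
import torch

def expert_prototype(lora_A: torch.Tensor, lora_B: torch.Tensor) -> torch.Tensor:
    """Top right-singular direction of the LoRA shift, used as the expert's representation."""
    delta_w = lora_B @ lora_A                      # (d_out, d_in) linear shift
    _, _, vh = torch.linalg.svd(delta_w, full_matrices=False)
    return vh[0]                                   # (d_in,)

def route(hidden: torch.Tensor, prototypes: torch.Tensor, top_k: int = 2):
    """Dot-product routing of each hidden state against all stacked expert prototypes."""
    scores = hidden @ prototypes.T                 # (tokens, n_experts)
    # Singular vectors have arbitrary sign, so score by magnitude.
    return scores.abs().topk(top_k, dim=-1)
```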

The idea is simple: that singular direction gives us a sense of what the hidden states for an expert looked like while that expert was training. In order to test our system, we assume that we have some task data available, and we use FLAN data, which is just natural language tasks. We evaluate our system on a set of 10 tasks used to evaluate Phi-2, ranging from commonsense reasoning to code to BBH—BIG-Bench Hard—et cetera. Here are some results that we obtained in a recent submission. Phi-2 out of the box gets around 64. When we fine-tune Phi-2 on our own multitask dataset, we get a boost to around 65.5—but this approach assumes that we can train on all the data. Then we have our first dot, “Private + Arrow”: private, as a reminder, trains experts independently—256 tasks—with post-hoc routing on top. And it was very surprising to us that we can get good performance even with this method.

But if we go further and assume some selective data sharing—our clustering approach with routing on top—we can get even further gains. And this last method—“MBC + Arrow”—adds only 22 million parameters to the model.

So looking forward, I believe that an exciting direction would be to push this to fully decentralized training and continual improvement of language models, in the sense that people can train their experts and give them to the platform, and the model gets better. The other direction is a heterogeneous library of adapters, in the sense that we can add different sorts of adapters into this library, each with its own inductive biases, and so expand the capabilities even more.

Thank you very much.

GigaPath: Foundation Model for Digital Pathology

Presented by Naoto Usuyama at Microsoft Research Forum, March 2024


“This project (GigaPath) is not possible without many, many collaborators, and we are just scratching the surface, so I’m very excited, and I really hope we can unlock the full potential of the real-world patient data and advance AI for cancer care and research.”

Naoto Usuyama, Principal Researcher, Microsoft Research Health Futures

Transcript: Lightning Talk 4

GigaPath: Foundation model for digital pathology

Naoto Usuyama, Principal Researcher, Microsoft Research Health Futures

Naoto Usuyama proposes GigaPath, a novel approach for training large vision transformers for gigapixel pathology images, utilizing a diverse real-world cancer patient dataset, with the goal of laying a foundation for cancer pathology AI.

Microsoft Research Forum, March 5, 2024

NAOTO USUYAMA: Hi, my name is Naoto. I’m from Microsoft Health Futures. I’m excited to talk about GigaPath.

Unfortunately, cancer affects almost everyone at some point. When cancer is suspected, a small tissue sample is taken from the patient and sent to a pathology lab. The lab prepares the sample and creates a pathology slide, which is then examined under a microscope. This microscopic view provides a lot of information about the cancer’s characteristics and profile, and this information is essential for choosing the best treatment for each patient.

One notable example is immunotherapy. Immunotherapy is one of the cutting-edge cancer treatments: it works by using a patient’s own immune system, and it’s a new hope for cancer patients. But unfortunately, it doesn’t work for everyone. The key is the tumor microenvironment—the complex ecosystem within and around the tumor. This includes not just the cancer cells but also normal cells, like immune cells and blood vessels, and how they interact with each other affects immunotherapy’s success rate. So modeling pathology images and modeling the tumor microenvironment is very critical.

My slide is not working. OK, thank you …

The pathology image is super detailed, and the size is huge; one file can be a couple of gigabytes. The pathology slide itself—I’m not sure if you know—is very tiny, only a few centimeters. But with a microscope, you get very high-resolution images: a single slide can be 120,000 pixels across. And this size can blow up transformers easily. Typically, vision transformers use only a few hundred tokens, but for us, one slide becomes 56 million—a few hundred tokens versus 56 million. And even if we use a larger patch size, we still get a lot of tokens. So it is quite challenging to model pathology slide images. So how do we do this?
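As a back-of-the-envelope check of that token count (a sketch, assuming the 120,000 pixels refers to one side of the slide and a standard 16×16-pixel vision-transformer patch; neither detail is stated explicitly in the talk):

```python
side_pixels = 120_000   # assumed slide width/height in pixels
patch_size = 16         # assumed ViT patch size
tokens = (side_pixels // patch_size) ** 2
print(f"{tokens:,}")    # 56,250,000 -- roughly the "56 million" tokens mentioned
```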

We are investigating scalable architectures, and one example is LongNet. We are collaborating with Microsoft Research Asia, and the key idea is dilated attention. Dilated attention uses sparse attention patterns instead of the dense attention in vanilla transformers. We also segment the sequence into smaller blocks and then focus attention within each segment. Sparsity and segmentation make attention much more scalable, and we are testing this LongNet idea on pathology images. That’s the modeling side. Data, of course, is critical for foundation models, and we are working with Providence, one of the largest nonprofit health systems in the US, to create a large-scale, real-world patient dataset. Our dataset includes more than 1 million cancer patient records: all the clinical notes (so text as well), genomics data, radiology images and reports, and, of course, the pathology images. So it is very large scale, but also multimodal and longitudinal. This rich dataset enables us to train a large-scale foundation model. And to make the most of the data, we are exploring self-supervised learning approaches in many ways—unimodal, multimodal, longitudinal. That’s basically the GigaPath project: building a real-world foundation model using the Providence data.
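To make the two ingredients concrete, here is a toy, single-head sketch of segment-plus-dilation attention. It is only illustrative: the real LongNet mixes multiple segment lengths and dilation rates per head and scatters the results back together.

```python
import torch
import torch.nn.functional as F

def dilated_segment_attention(q, k, v, seg_len=2048, dilation=4):
    """q, k, v: (seq_len, d). Attend only among every `dilation`-th token per segment."""
    seq_len, d = q.shape
    out = torch.zeros_like(v)
    for start in range(0, seq_len, seg_len):
        # Sparse pattern: keep every `dilation`-th position inside this segment.
        idx = torch.arange(start, min(start + seg_len, seq_len), dilation)
        scores = (q[idx] @ k[idx].T) / d ** 0.5
        out[idx] = F.softmax(scores, dim=-1) @ v[idx]
    # Positions skipped by this (seg_len, dilation) pair stay zero here;
    # LongNet covers them with other segment/dilation configurations.
    return out
```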

This project is not possible without many, many collaborators, and we are just scratching the surface, so I’m very excited, and I really hope we can unlock the full potential of the real-world patient data and advance AI for cancer care and research.

Thank you.

Multimodal Generative AI: The Next Frontier in Precision Health


“GenAI can potentially unlock a slew of high-value applications, from improving patient care to accelerating drug development and clinical discovery, to the ultimate dream of precision health: predicting medical events.”

Hoifung Poon, General Manager, Microsoft Research Health Futures

Multimodal Generative AI: The Next Frontier in Precision Health

By Hoifung Poon

The dream of precision health is to prescribe the right intervention for the right patient at the right time. We are still far from attaining this dream, with cancer being the poster child of the challenges we face. Despite all the progress medical science has achieved in treating cancer, the standard of care often fails, with the majority of patients not responding to their prescribed treatment.

The confluence of technological advances and social policies has led to the rapid digitization of multimodal, longitudinal patient journeys, such as electronic medical records (EMRs), imaging, and multiomics (i.e., a type of biological analysis that uses multiple “omes”—the genome, epigenome, microbiome, and so on—as datasets). Each modality conveys only limited information about the patient, like a blind person touching one small part of an elephant and trying to describe the whole animal. By synthesizing all relevant modalities, however, we can create a holistic view of the patient.

The availability of such multimodal real-world data enables pretraining of powerful patient embedding, which can serve as a digital twin for the patient. In turn, this high-fidelity patient embedding enables patient-like-me reasoning at scale, which can help to improve patient care by identifying what works and accelerate discovery by pinpointing exactly where and how today’s medicines don’t work. Such real-world evidence (RWE) represents emergent capabilities, which come from assimilating population-scale real-world data and go far beyond the competency of today’s frontier models.

This is exciting, but progress is difficult. Even for table-stakes medical technologies, such as two-dimensional (2D) X-rays, existing multimodal frontier models show a large competency gap. Meanwhile, three-dimensional (3D) imaging, such as computerized tomography (CT) and magnetic resonance imaging (MRI), is underexplored, and digital pathology is enormous compared to web images. If we printed a whole digital slide image at standard printer resolution, it would cover a tennis court. At the cutting edge, emerging modalities such as genomics and spatial transcriptomics (i.e., a molecular profiling method that allows researchers to measure all gene activity in a tissue sample and map it to individual cells) are progressing quickly, with rapidly evolving scale and adoption.

Beyond individual modalities, the challenges multiply even further given the combinatorial explosion (i.e., the rapid growth of possibilities or combinations that researchers must consider when solving a problem). This can be likened to a multimodal “Tower of Babel” situation. We no longer have simplistic scaling laws pertaining to one-dimensional model size and training tokens. Instead, we need to factor in both unimodal and cross-modal data points across all modalities and their combinations.

In translation, the multilingual complexities are often tackled by grounding them in a resource-rich “interlingua” (an intermediary language) such as English. Similarly, in multimodal generative AI (GenAI), text can serve as the 80/20 interlingua modality to drastically simplify learning. Frontier models such as GPT-4 already provide a solid foundation for interpreting biomedical text and assimilating a good portion of public knowledge. Moreover, the study of any modality typically involves natural language. Thus, data is often accompanied by a co-occurring textual description (for example, research literature proves to be a rich source of biomedical multimodal data, such as image-text pairs). At Microsoft Research, we have curated the largest biomedical multimodal dataset from public sources, with 46 million image-text pairs extracted from millions of papers in PubMed Central. Multimodal real-world data, such as medical images and reports, is even more abundant.

To tackle the multimodal complexities in precision health, we propose a modular approach by factoring patient embedding into unimodal pretraining and cross-modal learning. For each modality, we can train an encoder and decoder to map it to embedding and back. Such pretraining can be conducted independently for each modality by leveraging modality-specific self-supervision, such as masked language modeling for text and DINO (self-DIstillation with NO labels) for images. For the text modality, we can piggyback on frontier models or state-of-the-art small language models (SLMs). The encoder and decoder can be the same, as in GPT-4. For cross-modal learning, we introduce a modality-specific adapter, which can serve as a projection layer to “translate” the given modality into the text space. Of course, current text embedding doesn’t capture everything, especially things yet to be discovered (think COVID before 2020). Nevertheless, text still serves as a strong beachhead and can be updated by continued pretraining and fine-tuning.

LLaVA-Med shows how this general recipe might work in practice, using image-text as an example. It adopts a modular design, where the vision encoder and text decoder can be plug-and-play from any pretrained models. The hypothesis is that unimodal pretraining already removes a large number of superficial variations. As a result, learning can be very data efficient, focusing on the lightweight adapter, such as a linear layer or a simple multilayer perceptron (MLP). Another key idea about LLaVA-Med is to leverage frontier models (specifically GPT-4) to synthesize multimodal instruction-following data. Given an image-text pair, we take the gold text and ask GPT-4 to generate simulated conversations about the image, using only information from the text. Then, for each generated question-answer pair, we add back the image to form the image, question, answer triad for multimodal instruction-tuning. In this way, GPT-4 can generate a huge amount of multimodal instruction-following data from the original image-text pairs.
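As a sketch of the adapter piece of this recipe, the projection can be as small as a two-layer MLP that maps frozen vision-encoder features into the text model’s embedding space (the dimensions below are illustrative assumptions, not LLaVA-Med’s exact configuration):

```python
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Projects vision features into the language model's token-embedding space."""

    def __init__(self, vision_dim: int = 1024, text_dim: int = 4096, hidden: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, text_dim),
        )

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # vision_features: (batch, num_patches, vision_dim). The output "soft
        # tokens" are consumed by the text decoder alongside ordinary text
        # tokens; only this lightweight adapter needs task-specific training.
        return self.proj(vision_features)
```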

We have applied the LLaVA-Med recipe to multimodal patient data, such as radiology image-report pairs, demonstrating substantial improvement over existing frontier models in standard tasks such as identifying key findings from radiology images. The same recipe can also be applied to in-silico imaging by adding an image decoder, as shown in BiomedJourney. Specifically, BiomedJourney takes consecutive radiology image-report pairs from a patient journey, uses GPT-4 to summarize changes, and then leverages the before image, progression text, after image triad for multimodal instruction-tuning. Given a prior image and the hypothetical progression, BiomedJourney can generate a counterfactual image reflecting the changes.

For digital pathology, the enormous slide size translates into a context of up to a million tokens, which would blow up self-attention in a transformer model. We have explored advanced techniques such as dilated attention to circumvent such limitations. In joint work with Providence researchers, we have trained real-world pathology foundation models from hundreds of thousands of slides along with their clinical reports, with promising results in pathomics and progression modeling.

By learning multimodal and longitudinal patient embedding from population-level real-world data, multimodal GenAI can potentially unlock a slew of high-value applications, from improving patient care to accelerating drug development and clinical discovery, to the ultimate dream of precision health: predicting next medical events, such as longitudinal disease progression and treatment outcome, as in real-world evidence (RWE).

The multimodal GenAI research work described in this essay stems from collaboration across Microsoft Research, Azure AI, and HLS S&P (Nuance), and includes key collaborators such as Jianfeng Gao, Mu Wei, Matt Lungren, Houdong Hu, Hany Awadalla, Furu Wei, Tao Qin and team.


The Metacognitive Demands and Opportunities of Generative AI

Presented by Lev Tankelevitch at Microsoft Research Forum, March 2024


“We believe that a metacognitive perspective can really help us analyze, measure, and evaluate the usability challenges of generative AI. And it can help us design generative AI systems that can augment human agency and workflows.”

Lev Tankelevitch, Senior Behavioral Science Researcher, Microsoft Research Cambridge

Transcript: Lightning Talk 2

The metacognitive demands and opportunities of generative AI

Lev Tankelevitch, Senior Behavioral Science Researcher, Microsoft Research Cambridge

Lev Tankelevitch explores how metacognition—the psychological capacity to monitor and regulate one’s cognitive processes—provides a valuable perspective for comprehending and addressing the usability challenges of generative AI systems around prompting, assessing and relying on outputs, and workflow optimization.

Microsoft Research Forum, March 5, 2024

LEV TANKELEVITCH: My name is Lev. I’m a researcher in the Collaborative Intelligence team in Microsoft Research Cambridge, UK, and today I’ll be talking about what we’re calling the metacognitive demands and opportunities of generative AI. We know that AI has tremendous potential to transform personal and professional work. But as we show in our recent paper, a lot of usability challenges remain—from crafting the right prompts to evaluating and relying on outputs to integrating AI into our daily workflows. And what we propose is that metacognition offers a powerful framework to understand and design for these usability challenges.

So metacognition is thinking about thinking and includes things like self-awareness, so our ability to be aware of our own goals, knowledge, abilities, and strategies; our confidence and its adjustment, so this is our ability to maintain an appropriate level of confidence in our knowledge and abilities and adjust that as new information comes in; task decomposition, our ability to take a cognitive task or goal and break it down into subtasks and address them in turn; and metacognitive flexibility, so our ability to recognize when a cognitive strategy isn’t working and adapt it accordingly. Let me walk you through a simple example workflow.

So let’s say you decided to ask an AI system to help you craft an email. In the beginning, you might have to craft a prompt, and so you might ask yourself, what am I trying to convey with this email? Perhaps I need to summarize x, clarify y, or conclude z—all in the correct tone. You might then get an output and need to evaluate it. And then you might ask yourself, well, how can I make sense of this output? In the case of the email example, it’s pretty straightforward. But what if you’re working with a programming language that you’re less familiar with? You might then need to iterate on your prompt, and so you might ask yourself, well, does this output reflect my ability to craft the right prompt or the system’s performance in a given task or domain?

And now if you zoom out a little bit, there are these questions around what we’re calling automation strategy. So this is whether, when, and how you can apply AI to your workflows. So here you might ask yourself, is trying generative AI worth my time versus doing a task manually? And how confident am I that I can actually complete a task manually or learn AI effectively to help me do it? And then if I do decide to rely on AI on my workflows, how do I actually integrate it into my workflows most effectively? And so what we’re proposing is that all these questions really reflect the metacognitive demands that generative AI systems impose on users as they interact with these systems. So, for example, at the prompt formulation stage, this involves self-awareness of task goals. So knowing exactly what you want to achieve and break that down into subgoals and subtasks and then verbalize that explicitly for an effective prompt. At the output evaluation stage, it involves well-adjusted confidence in your ability to actually evaluate that output. And so that means disentangling your confidence in the domain you’re working with from the system’s performance in that task or domain.  

In the prompt iteration stage, it involves well-adjusted confidence in your prompting ability, so this is about disentangling your ability to craft an effective prompt from the system’s performance in that task or domain, and metacognitive flexibility, which is about recognizing when a prompting strategy isn’t working and then adjusting it accordingly. In the automation strategy level, this is about self-awareness of the applicability and impact of AI on your workflows and well-adjusted confidence in your ability to complete a task manually or learn generative AI systems effectively to actually help you do that. And then finally, it requires metacognitive flexibility in actually recognizing when your workflow with AI isn’t working effectively and adapting that accordingly.

So beyond reframing these usability challenges through the perspective of metacognition, we know from psychology research that metacognition is both measurable and teachable. And so we can now think about how we can design systems that actually support people’s metacognition as they interact with them. So, for example, you can imagine systems that support people in planning complex tasks. So let’s say you’ve decided to ask an AI system to help you craft an email. It might actually break that task down for you and remind you that certain types of content are more common in such emails and actually proactively prompt you to fill that content in. It might also make you aware of the fact that there’s a certain tone or length that you might want to have for this email. And so in this way, it, sort of, breaks the task down for you and actually improves your self-awareness about different aspects of your task.  

Similarly, we can imagine systems that support people in reflecting on their own cognition. So let’s say you’ve asked the system to help you craft a proposal based on a previous document. Now a smart system that knows in the past you’ve had to edit this output quite extensively might let you know that you should specify an outline or other details and provide you with examples so that you can save time later on. Similarly, at the output evaluation stage, you can imagine how such an approach can augment AI explanations. So this is work done by the Calc Intelligence team here at Microsoft Research, and it shows a system that can help users complete tasks in spreadsheets. And it shows a step-by-step breakdown of the approach that it took to complete that task. So you can imagine a system that proactively probes users about different steps and their uncertainty around those steps and then tailors explanations effectively to that user’s uncertainty.  

So in sum, we believe that a metacognitive perspective can really help us analyze, measure, and evaluate the usability challenges of generative AI. And it can help us design generative AI systems that can augment human agency and workflows. For more details, I encourage you to check out the full paper, and I thank you for your time.


The Metacognitive Demands and Opportunities of Generative AI

By Lev Tankelevitch

Generative AI (GenAI) systems offer unprecedented opportunities for transforming professional and personal work. This potential stems from a unique combination of generative AI’s model flexibility, in that systems can accommodate a wide range of information in prompts and outputs; generality, in that systems are applicable to a wide range of tasks; and originality, in that systems can generate novel content. However, these properties are a double-edged sword, in that they also pose usability challenges for people working with GenAI systems. Studies show that people find it difficult to craft effective prompts, evaluate and rely on AI outputs, and optimize their workflows with GenAI. In recent work, we propose that metacognition—the psychological ability to monitor and control one’s thoughts and behavior—offers a valuable lens through which to understand and design for these usability challenges.

Current GenAI systems impose multiple metacognitive demands on users. A useful analogy for how people work with GenAI systems is that of a manager delegating tasks to a team. A manager needs to clearly understand and formulate their goals, break down those goals into communicable tasks, confidently assess the quality of the team’s output, and adjust plans accordingly along the way. Moreover, a manager needs to decide whether, when, and how to delegate tasks in the first place. All these responsibilities involve the metacognitive monitoring and control of one’s thought processes and behavior (i.e., cognition). Working successfully with GenAI systems requires these same abilities.

Prompting is the first challenge people face when interacting with GenAI systems, according to multiple studies. With a manual task, such as drafting an email, many implicit goals and intentions can remain so without ever being verbalized. For example, one might implicitly know to adopt a certain tone when writing to a senior colleague, or that a given draft first requires summarizing certain content and then pulling out key points for discussion. In contrast, all these details need to be explicitly specified in order for GenAI systems to execute tasks effectively. This requires people to be aware of their goals, have the ability to break down tasks into sub-tasks, and to verbalize all of this within a set of prompts—abilities comprising metacognitive monitoring and control. After submitting a prompt, a common step is assessing the output (i.e., determining whether the system achieved what was intended) and iterating on the prompt. If the result is not satisfactory, a key challenge for users is disentangling the role of their prompting ability from that of the system’s capabilities. A poor result can arise due to a poorly worded prompt, the system’s settings, the system’s limitations on a given task, or an “unlucky draw” given the probabilistic nature of many GenAI systems. The range of possible explanations makes it difficult for people to appropriately adjust their confidence in their prompting ability, a key form of metacognitive monitoring—much like a manager who can’t figure out whether their instructions are unclear or if their team is underperforming for other reasons. A subsequent challenge for users is being flexible enough to change prompting strategies as needed, whether that means rewording their most recent prompt, breaking up their task into further sub-tasks, or taking another approach.

Putting prompting aside, assessing the system’s output poses further challenges for people in terms of deciding whether to rely on it. This is particularly true when people use GenAI in a domain in which they have little expertise, such as a new programming language. Again, the challenge here is to appropriately adjust one’s confidence, albeit now in the domain itself and the ability to assess the output. This is critical given the risk of incorrect or incoherent results, or other errors that GenAI systems can produce. Alongside the many possible explanations for a given output, this is further exacerbated by the speed and ease with which GenAI systems can produce extensive outputs, including entire presentations, reports, or software. This is akin to a manager with a prolific team that they cannot fully trust. Maintaining an appropriate level of confidence in assessing GenAI output is important, because this may influence the extent of over- or under-reliance on this technology.

Zooming out from individual interactions with GenAI systems, there is a broader question of how people can determine whether, when, and how they should use GenAI for tasks within their workflows. Determining this “automation strategy” requires awareness of, and an appropriate level of confidence in, the applicability and potential impact of using GenAI within a workflow. For example, people need to recognize when relying on GenAI is more productive than doing a task manually, or vice versa. Over time, people also need the flexibility to change strategies as they assess their workflows.

Rather than imposing metacognitive demands on users, GenAI’s model flexibility, generality, and originality also present many opportunities for improving people’s metacognition—that is, their self-awareness, adjustment of confidence, and flexibility in their cognition. An exciting area for exploration is designing systems that can proactively support users in planning tasks with GenAI. For example, systems can transform people’s initial high-level task into a series of sub-tasks, clarifying their goals in the process, and helping them craft effective prompts. At the output assessment stage, systems can also proactively help people reflect on the output, their confidence in it, and any uncertainties they may have, either in the GenAI interaction or in the domain itself. Indeed, by adapting information to people’s level of confidence and areas of uncertainty, GenAI systems could help augment explanations for their own outputs in an interactive manner. At a broader level, GenAI systems can proactively support people in reflecting on and, in turn, optimizing their workflows as they incorporate GenAI into their tasks. In a way, it’s as if each manager had a personal coach and mentor that supported them as they learned to manage their team.

As we offload more of our cognitive tasks (e.g., ideation, synthesis, writing) to GenAI systems, it becomes increasingly important to understand how we monitor and control our cognition. This is precisely where the perspective of metacognition fits in. Beyond helping us re-frame the key usability challenges emerging in human-GenAI interaction, metacognition can also inspire novel system designs that augment our cognition, improving our self-awareness, confidence adjustment, and flexibility as we work with GenAI systems. As hinted here, there is a lot more work to do in this space.

The thinking around metacognition and GenAI described here is a Microsoft Research project, with key collaborators that include Viktor Kewenig, Auste Simkute, Ava Scott, Advait Sarkar, Abi Sellen, and Sean Rintel.


What’s new in AutoGen?

Presented by Chi Wang at Microsoft Research Forum, March 2024


“AutoGen has a large, very active community of developers, researchers, and AI practitioners. They are so active and passionate. I’m so amazed by that, and I appreciate all the recognition that AutoGen has received in such a short amount of time.”

Chi Wang, Principal Researcher, Microsoft Research AI Frontiers

Transcript: Lightning Talk 1

What’s new in AutoGen? 

Chi Wang, Principal Researcher, Microsoft Research AI Frontiers 

Chi Wang discusses the latest updates on AutoGen—the multi-agent framework for next-generation AI applications. This includes milestones achieved, community feedback, new exciting features, and ongoing research and challenges.

Microsoft Research Forum, March 5, 2024

CHI WANG: Hi, everyone. My name is Chi. I’m from Microsoft Research AI Frontiers. I’m excited to share with you the latest news about AutoGen. AutoGen was motivated by two big questions: what are the future AI applications like, and how do we empower every developer to build them? Last year, I worked with my colleagues and collaborators from Penn State University and University of Washington on a new multi-agent framework.

We have been building AutoGen as a programming framework for agent AI, like PyTorch for deep learning. We developed AutoGen inside an open-source project, FLAML, and last October we moved it to a standalone repo on GitHub. Since then, we’ve gotten new feedback from users every day, everywhere. Users have shown really high recognition of the power of AutoGen, and they have a deep understanding of its value along different dimensions, like flexibility, modularity, and simplicity.

Let’s check one example use case.

[Beginning of pre-recorded testimonial.] 

Sam Khalil, VP, Data Insights & FounData, Novo Nordisk: In our data science department, AutoGen is helping us develop a production ready multi-agent framework.  

Rasmus Sten Andersen, AI engineer lead, Novo Nordisk: Our first target is to reduce the barriers to technical data analytics and to enable our broader community to derive insights.  

Georgios Ilias Kavousanos, data engineer, AI Labs, Novo Nordisk: We are also extending AutoGen with the strict requirements from our industry given the sensitive nature of our data.  

[End of pre-recorded testimonial.]

WANG: That is one example use case from the pharma vertical. We have seen big enterprise customer interest like this from pretty much every industry vertical. AutoGen is used or contributed to by companies, organizations, and universities from A to Z, all over the world. We have seen hundreds of example applications, and some organizations use AutoGen as a backbone to build their own agent platforms, while others use AutoGen for diverse scenarios, ranging from research and investment to novel and creative applications of multiple agents. AutoGen has a large, very active community of developers, researchers, and AI practitioners. They are so active and passionate—I’m so amazed by that—and I appreciate all the recognition that AutoGen has received in such a short amount of time. For example, our paper was selected by TheSequence as one of the top favorite AI papers of 2023. To quickly share our latest news: last Friday, our initial multi-agent experiment on the challenging GAIA benchmark achieved the No. 1 accuracy on the leaderboard at all three levels. That shows the power of AutoGen in solving complex tasks—and its bigger potential.

This is one example of our effort to answer a few hard open questions, such as how to design an optimal multi-agent workflow. AutoGen is under active research and development and is evolving at a very fast pace. Here are examples of exciting new features and ongoing research. First, for evaluation, we are building agent-based evaluation and benchmarking tools. Second, we are making rapid progress in further improving the interface to make it even easier to build agent applications. Third, a learning capability allows agents to remember teachings from users or other agents long term and improve over time. And fourth, AutoGen is integrated with new technologies like the OpenAI Assistants API and multimodality. Please check the blog posts on our website for more details.
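For readers who haven’t tried the framework, the classic two-agent pattern looks roughly like this (a sketch based on AutoGen’s quickstart; the model name, config values, and task message are placeholders):

```python
from autogen import AssistantAgent, UserProxyAgent

# An LLM-backed assistant plus a user proxy that can execute the code it writes.
assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4"},  # placeholder model/config
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",                      # fully automated loop
    code_execution_config={"work_dir": "coding"},  # where generated code runs
)

# The two agents converse until the task is done or a termination condition hits.
user_proxy.initiate_chat(assistant, message="Plot NVDA vs TSLA stock price change YTD.")
```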

I appreciate the huge amount of support from everyone in the community, and we need more help in solving all the challenging problems. You’re all welcome to join the community and define the future of AI agents together.

Thank you very much.

Panel Discussion: Transforming the Natural Sciences with AI

Hosted by Bonnie Kruft, with Rianne van den Berg, Tian Xie, Tristan Naumann, Kristen Severson, and Alex Lu at Microsoft Research Forum, March 2024


“Just as in the fields of health and biology, machine learning is really beginning to disrupt some of the traditional pipelines that happen in materials discovery.”

Tian Xie, Principal Research Manager, Microsoft Research AI4Science

Transcript: Panel Discussion

Transforming the Natural Sciences with AI

Bonnie Kruft, Partner Deputy Director, Microsoft Research AI4Science (Host)
Rianne van den Berg, Principal Research Manager, Microsoft Research AI4Science
Tian Xie, Principal Research Manager, Microsoft Research AI4Science
Tristan Naumann, Principal Researcher, Microsoft Research Health Futures
Kristen Severson, Senior Researcher, Microsoft Research New England
Alex Lu, Senior Researcher, Microsoft Research New England

Microsoft researchers share their advancements in the fields of foundation models, drug discovery, material design, and machine learning. They highlight how deep learning is transforming the natural sciences.

Microsoft Research Forum, March 5, 2024

BONNIE KRUFT: I’m joined here by my colleagues in Microsoft Research. We’re all working in different teams in AI for science, working at the intersection of machine learning and the natural sciences. So my name is Bonnie Kruft. I work in the AI4Science team in Microsoft Research in Cambridge, and I’m joined here with Rianne.  

RIANNE VAN DEN BERG: Hi, my name is Rianne van den Berg, and I’m a principal research manager in AI4Science at Microsoft Research, located in Amsterdam, in the Netherlands. I co-lead a project on density functional theory.

TIAN XIE: Hello, everyone. My name is Tian Xie, and I’m a principal research manager at Microsoft Research AI4Science, located in Cambridge in the UK. I lead a project on generative models for materials—the MatterGen model that Chris mentioned earlier comes from our team. Very nice to be here.

KRISTEN SEVERSON: Hi, everyone. My name’s Kristen Severson, and I’m a senior researcher in the Bio ML team at Microsoft Research New England. I’m broadly interested in how we can use machine learning for applications in human health with a current focus in computational pathology.  

TRISTAN NAUMANN: Hi, everyone. My name is Tristan Naumann. I’m a principal researcher in the Real-world Evidence Group here at Microsoft Research’s Health Futures, where we’re looking to advance health at the speed of AI. My research focus is really at the intersection of artificial intelligence and health care, specifically the application of natural language processing in this space.  

ALEX LU: Hi, everyone. My name is Alex Lu. I’m a senior researcher, part of the Bio ML team at Microsoft Research New England. My research concentrates on how we can use AI to make new biological discoveries, particularly in the realm of cellular and molecular biology.

KRUFT: OK, great. So our first question is, what real-world impact have we seen already today in health care, drug discovery, or in materials science?  

NAUMANN: Yeah, so maybe I’ll start. I think this is an incredibly exciting time in health care. If we think about the goals of precision health over the years, the aim is really to apply the right intervention for the right group of people at the right time. One of the things that’s crucial to realizing that is the reality that we need a data-driven learning system that’s able to adapt and to incorporate new information instantaneously. Historically, this has been incredibly challenging because much of the data we have in health care is not nicely structured in a clean, easy-to-use way. And so one of the things that’s really incredible about some of these recent advances in generative AI—specifically large language models, LLMs, and also large multimodal models—is the opportunity to have a tool for universal structuring, and unlocking some of that data quickly and efficiently opens up a lot of new opportunities. I think another thing that’s really nice about these techniques is their innate accessibility: a lot of the clinical collaborators and other care staff we work with can use these tools as well. Taken together, you have this new opportunity to quickly access a lot of the information that potentially holds the future of medicine.

KRUFT: That’s great. You mentioned universal structuring. Can you touch maybe on an area where that’s used already in health care? 

NAUMANN: Yeah. So maybe a little bit of context first. If we think about the cancer space specifically, we have this interesting paradigm where the standard of care often fails, and so the last resort for many patients is a clinical trial. Unfortunately, in the US, very few patients are actually able to enroll in these trials—the numbers vary, but it’s perhaps something like 3 percent. On the other side, a number of pharmaceutical companies indicate that a really large share of trials—up to 40 percent—fail because of an insufficient number of patients. So there is an immediate gap there that we might want to address, and we’ve taken some first steps toward closing it with our partners at the Providence health system. Specifically, some of our recent work looked at how we could scale the processing of clinical trial eligibility criteria: taking, for example, the unstructured text from something like ClinicalTrials.gov and bringing it into logic formats that can be more easily used downstream. And then we’re really looking at how we can make that accessible to the clinicians who are trying to match patients to trials.

KRUFT: That’s great. Thank you. And what about in biology?

LU: Absolutely. I would describe biology as a very heterogeneous and fragmented landscape. This is not surprising, because there are so many subdisciplines in biology that not everyone uses AI in exactly the same way. For some context, I concentrate my research on three main areas: I work on proteins; I work on omics, which concerns genes and how they’re expressed; and I work on images, particularly microscopy images. And I would describe each of those fields as being at a different level of maturity. For proteins, we have very well-founded structure predictors, and protein language models are routinely integrated into bioinformatics prediction pipelines. But for omics data, the idea of using a large amount of data to pretrain a model—even though there have been a lot of really inspiring precursors to this work—is really just emerging, and the conversation in that area is just beginning. Similarly for images, which I believe are poised for the next revolution: you can see people really ramping up their data collection efforts, and there are massive datasets that only recently hit the public sphere, but the extent to which people have worked on these datasets compared to the work in proteins is actually very limited. There are a number of factors that influence this. To speak to a few: one factor is how readily a field’s problems can be posed as well-defined prediction problems for AI. For example, in the realm of protein engineering, a lot of problems can be formalized as well-posed prediction tasks—predict the structure of a protein given its sequence, for example. But for a lot of things in biology, that’s not true. The task is more exploratory in nature: “Hey, I’ve got a huge amount of data. Help me understand and comprehend this data.” That’s not a task where you can easily evaluate whether you’re doing well, simply because there’s a subjective element to it. Another factor is how centralized dataset sharing and collaboration are. I’ll point to proteins again as a wonderful example, because for the longest time, even before AI became a thing, it was standard to deposit your protein sequences in a single centralized repository, and eventually that became the foundation on which many people now train their models. In contrast, I would describe images as almost the opposite situation—not that biologists don’t agree it’s important to share the data; many do—but the formats, where you share your data, what your data looks like, even your file conventions, all vary drastically, and so that data has traditionally not been accessible to machine learning practitioners.
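
As one concrete illustration of how protein language models slot into bioinformatics pipelines, here is a small sketch using the open-source ESM-2 model from the fair-esm package; the sequence is made up, and the downstream use of the embedding is left open:

```python
# Sketch: embedding a protein sequence with a pretrained protein language
# model (ESM-2 via the fair-esm package), then using the pooled embedding as
# a feature vector for a downstream prediction task.
import torch
import esm  # pip install fair-esm

model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()  # small variant, for illustration
model.eval()
batch_converter = alphabet.get_batch_converter()

data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]  # made-up sequence
_, _, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[6])
# Mean-pool the per-residue embeddings (dropping BOS/EOS) into one vector.
embedding = out["representations"][6][0, 1:-1].mean(dim=0)
print(embedding.shape)  # torch.Size([320]) for this small model
```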

KRUFT: Yeah, that’s a great point. Thank you. And what about in materials?  

XIE: Yes, it’s similar. Just as in health and in biology, machine learning is really beginning to [disrupt] some of the traditional pipelines in materials discovery. One area where this has been especially visible is the use of large-scale computation—high-throughput screening—to discover new materials: running large-scale quantum mechanical calculations to screen thousands or tens of thousands of materials for a variety of different applications. The limitation of this approach was that a lot of these simulations were very, very expensive, so you couldn’t cover many materials. One advance, which Chris mentioned in his keynote, is building machine learning emulators. They have been shown to speed up the simulation of materials properties by at least a thousand times, allowing the community to screen far more materials, much faster than was possible before. Another important area is generative models, because with traditional screening-based methods you are limited by the number of candidates—for materials, this is usually on the order of hundreds of thousands of materials that you can screen. But with generative models, you can expand into a much larger hypothetical space, at least five to six orders of magnitude larger, by generating materials guided by the property you’re interested in—like the MatterGen model that we developed earlier. This opens up a lot of opportunity to discover much better materials in many different domains. This research has created a lot of excitement in the community and across industry, and a lot of materials companies are beginning to pay much more attention to these AI tools. But obviously this is not as mature as what we have seen in the pharma industry, where many of these AI tools have already been integrated into drug discovery pipelines.
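
The emulator idea can be sketched with a toy example: fit a fast surrogate on a small number of expensive simulations, then screen a much larger candidate pool with the surrogate. The descriptors and the "simulator" below are synthetic stand-ins, not a real quantum-mechanical code:

```python
# Toy sketch of the emulator idea: train a fast surrogate on a few thousand
# expensive simulations, then use it to screen a far larger candidate pool.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def expensive_simulation(x):
    """Stand-in for a costly quantum-mechanical calculation."""
    return np.sin(x).sum(axis=1) + 0.1 * rng.normal(size=len(x))

# A few thousand simulated training points vs. a huge candidate pool.
X_train = rng.uniform(-3, 3, size=(2000, 8))  # 8 made-up descriptors
y_train = expensive_simulation(X_train)

emulator = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)

candidates = rng.uniform(-3, 3, size=(100_000, 8))
scores = emulator.predict(candidates)       # fast, approximate screening
top = candidates[np.argsort(scores)[:100]]  # shortlist for real simulation
print(top.shape)  # (100, 8)
```

The shortlist then goes back to the expensive simulator (or the lab) for verification, which is what keeps the approximation honest.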

KRUFT: Yeah. So what strategies would you take to increase adoption and trust of those AI models in the materials industry?  

XIE: Yeah, I think this is a wonderful question, because adoption is really the key for these AI models to have real-world impact. What I see today in adopting AI models for materials is very similar to what we saw in AI for drugs around maybe 2020. At that time, there was research from MIT where the researchers managed to discover a new antibiotic with AI models. That was a wakeup call for a lot of the pharma companies to start paying much more attention to these new AI tools. After that, we saw a lot of investment from the pharma companies in developing internal teams that could really utilize these tools and integrate them into their internal pipelines. It’s a gradual process because it’s also an internal trust-building process. At the beginning, a lot of the pharma teams didn’t trust these models; they preferred their traditional pipelines. But once they saw one or two examples of new drugs designed by these generative models actually performing better than what they came up with through their traditional pipelines, they began to adopt them. I see where we are today in materials as a similar early stage, where materials companies are beginning to try out these models, and there is going to be a lot of iteration going forward. But I’m quite optimistic that these AI tools will begin to make a bigger impact in the coming two to three years.

KRUFT: Yeah, thanks, Tian. Having come from the pharmaceutical industry myself, I’ve definitely seen that transformation over the past couple years, so it’ll be really interesting to see that happening in materials, as well. So the next question we have is, AI has been used in different applications in different industries, but how do you think that science is different?  

SEVERSON: In the health care space, I think there are two main differences. The first that I’d want to highlight is data. Tristan already mentioned this: a lot of the data in the health space is not in a format such that we can leverage it for AI applications. There are also often privacy concerns about how we might pool this data together, and it sits siloed in various health systems. These factors combined mean that health data is oftentimes quite small, and that’s a major difference from what we’ve seen in more classic machine learning applications. I do think recently there have been a couple of factors that have started to change this, one being the rise of these large language models that can help us process the data into a usable format, as well as just performance gains. I think those performance gains have inspired the health industry to ask, what can we do with these types of models? What types of innovations might we see? The second piece of what differentiates health, though, is an interest in building on prior knowledge. We have a lot of knowledge about diseases and how they manifest, and we don’t want to leave that information on the table when we train machine learning models. So there’s not much interest in using solely black-box approaches, but instead in building on what’s already known. And we can think of a specific example of that in terms of some of the invariances that were mentioned earlier. If I focus on a digital pathology image, we have that same rotational invariance, where you can rotate the image and it has the same content. One way I think those images differ from natural images is in terms of resolution. Because of the way the data is generated, we have rather fixed scales, where each pixel maps to a certain number of microns, and we might hope to leverage that information when we’re trying to describe the morphology of the tissue we’re analyzing. And while we have that piece on the development side, there are also considerations about how we want to build on that knowledge on the deployment side, where clinicians might be hesitant to leverage something that’s really black box. So how can we build their trust, in a similar but different way to what Tian was mentioning?
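
One simple way to exploit the rotational invariance just described—sketched here with an arbitrary image classifier, not any specific pathology model—is to average predictions over the four exact 90-degree rotations at test time:

```python
# Sketch: making a classifier's output invariant to 90-degree rotations by
# averaging its predictions over all four rotations of the input image.
# `model` is any image classifier; tensor shapes are illustrative.
import torch

def rotation_averaged_logits(model, image):
    """image: (batch, channels, height, width) tensor."""
    logits = []
    for k in range(4):
        rotated = torch.rot90(image, k, dims=(-2, -1))  # exact 90-degree turn
        logits.append(model(rotated))
    return torch.stack(logits).mean(dim=0)  # invariant to 90-degree rotation
```

The fixed microns-per-pixel scale cuts the other way: unlike with natural images, aggressive random rescaling would destroy meaningful morphological information, so scale is something to preserve rather than augment away.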

KRUFT: So tell me, what are some of the key factors for success and trust in those models?

SEVERSON: Yeah, I think there’s a lot of different ways you can use a machine learning model in a health application, but if we focus specifically on the point of care, I think at least in the near term, the gold standard is going to remain randomized controlled trials. An RCT is something that a clinician is already familiar with and is really the standard for evaluating some new tool in the health space. And I think that we know that this is possible in health care. One of our partners in pathology, Paige, actually did go ahead and do an RCT to get FDA approval for their prostate product, which is useful for detecting prostate cancer in tissue samples.  

KRUFT: Oh, wow. That’s fascinating.  

SEVERSON: And I think one thing that would be interesting to highlight here is some of the different ways the data is generated. So in the health care space, there’s a lot of excitement about leveraging the data that’s already been generated as part of the standard of care. But I think this really varies from some of what I’ve heard about materials, and I’d love for Rianne to talk a little bit more about that.  

VAN DEN BERG: Yeah, I think one of the areas where the natural sciences differentiate themselves from other AI domains like vision or language, or maybe even health, is that the importance of synthetic data should not be underestimated. This goes in particular for models targeting the molecular and materials sciences, because, as Chris mentioned in his presentation, these models often aim to replace very expensive simulators with faster AI emulators or generators. Contrary to, for instance, models trained for vision or language, where the data are real images or text scraped off the web, these models are trained on data generated by the very simulators they aim to replace. So we are training on synthetic data, and that has the advantage of fewer privacy concerns, which is a good thing. But there are also challenges that come with it, because generating this data is quite expensive, and there is less publicly available data than in areas such as vision and language. So when you want to generate new data, you have to do it with a very effective strategy—new data should really improve the performance of your model; you shouldn’t just start randomly generating data at large scale without knowing that. And this also highlights, again, something we’ve mentioned before: generalizability from small amounts of data is very, very important.
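
One effective-generation strategy of the kind described here is active learning: query the expensive simulator only where a model ensemble is most uncertain. Everything below is an illustrative stand-in, not a production pipeline:

```python
# Sketch: acquiring synthetic training data only where it helps, by running
# the expensive simulator at the points where a bootstrap ensemble of
# emulators disagrees most. Functions and shapes are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def simulator(x):
    return np.cos(x).sum(axis=1)  # stand-in for an expensive simulation

X = rng.uniform(-2, 2, size=(200, 4))
y = simulator(X)

for round_ in range(3):
    # Train a small ensemble on bootstrap resamples of the current data.
    ensemble = []
    for seed in range(5):
        idx = rng.integers(0, len(X), size=len(X))
        ensemble.append(GradientBoostingRegressor(random_state=seed).fit(X[idx], y[idx]))

    pool = rng.uniform(-2, 2, size=(5000, 4))      # cheap candidate inputs
    preds = np.stack([m.predict(pool) for m in ensemble])
    uncertainty = preds.std(axis=0)                 # ensemble disagreement
    query = pool[np.argsort(uncertainty)[-50:]]     # most uncertain points

    X = np.vstack([X, query])                       # simulate only these
    y = np.concatenate([y, simulator(query)])
```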

XIE: Yeah, actually, I want to dive a little deeper into synthetic data for materials, because this is really, really important for a lot of machine learning models in this space. Compared with some other fields, the availability of experimental data is actually quite small for materials—mainly, I think, because materials are pretty diverse; [there’s a] variety of different classes of materials, and the data is pretty sparse in a lot of these domains. So what the community ended up leveraging is the large-scale computational simulation workflows that have been developed over the past decade for simulating materials properties across a variety of material design problems—say, solar cells, catalysis, or batteries. This all started from an initiative called the Materials Genome Initiative, established in 2011 by the Obama administration, to develop the infrastructure and simulation capability needed to significantly speed up the materials development process. Building around this initiative, there has been a lot of effort around the world to build an open community around these workflows for simulating materials and generating data. This has created a lot of open databases that are currently powering the majority of the machine learning models we see today.

KRUFT: So what do you think are the future opportunities for data generation in materials? 

XIE: Yeah, I think there are mainly two opportunities going forward that I can see. The first is really leveraging new cloud infrastructure like what we have here with Azure. In our team, we had the experience of being able to utilize a lot of low-priority compute on Azure to scale up our simulations—to simulate a million materials in just a week, which would normally take a couple of months in a more traditional academic setting, where your group has its own server and you can only run things at a smaller scale. This really creates a lot of opportunity to generate lots of data, because there is a lot of compute power available on this cloud infrastructure that is actually underutilized.

KRUFT: Yeah.

XIE: The second area where I see a lot of opportunity is the autonomous labs that have been getting built up in the materials space. Compared with pharma, where you see a lot of this automation, this is actually a pretty new thing in materials, and in just the last one or two years, people have been investing heavily in building out these autonomous labs for a variety of material design problems—batteries, metal-organic framework synthesis, et cetera. It used to be that if you were a graduate student, it could take you one month or several months to synthesize a material. But now, with these autonomous labs, you can do tens or even hundreds every month, significantly speeding up the throughput of experimental data generation. I think this is another very exciting opportunity in the materials space for data generation.

KRUFT: Very exciting. And what about in biology, Alex?  

LU: Sure. I would say that as a computational biologist, I sometimes feel very fortunate to be working in this field, because I don’t struggle with many of the complications with data that my colleagues describe. For example, unlike my colleagues in health care, the data does not tend to be private. We’re dealing with microorganisms and basic biology measurements; it’s not associated with patients or human beings, so privacy issues don’t really show up. And in contrast to my colleagues in materials, in many subdomains of biology there are scalable data collection processes. For example, I remember when I first entered microscopy 10 years ago, people would conceptualize microscopy as this very low-throughput, subjective thing, where individual biologists would look at slides under a microscope, and that was the extent of the data. But since then, we’ve really revolutionized the data collection processes. We now have robot-controlled microscopes, and they can collect tens of thousands—maybe even millions—of images in the course of a single week. So microscopy has been converted from what was originally a low-throughput, qualitative science into a high-throughput and necessarily quantitative science. And while I cannot say that every single piece of data collection in biology is scalable at that level, what I can say is that there are a lot of efforts to improve the scale at which we collect biological data across multiple disciplines. In any domain I enter, there are always people thinking about how to scale the data collection.

But at the same time, just because we have an abundance of data for training doesn’t necessarily mean that we have the right data for training. What particularly differentiates biology—and, I suspect, by extension a lot of other scientific disciplines—is that the whole point is to discover something new, right? By definition, that new thing is not going to be captured in your original distribution of data. Take proteins as an example. The most interesting application could be trying to design a protein that has some property or function not seen in any existing protein on the planet. And while we have a very large database of proteins—UniProt contains 200 million proteins—all of those are proteins from natural organisms that already exist. So there’s a bit of a mismatch between the data we have for training and what we actually want to do in the application. This means we have to be careful and intentional about the fact that our data may not reflect what we want to do. There are multitudes of approaches to this. One, I think, is being more intentional about data collection, which resonates with themes that Rianne talked about. In the earlier days of AI, people would go through preexisting biological databases and use them to train models. Now you’re seeing more of an exchange between ML practitioners and biologists: a lot of biologists are thinking, OK, I need to intentionally collect data to support these AI efforts, and in doing so, they collect more diverse data. For example, in the latest microscopy datasets, they intentionally collect data from different sources across the world, knowing that differences between laboratories and the way they collect images are a big barrier for AI. The other thing I think we should be doing is being very intentional about the methods we use, because again, the whole goal is to generalize from that known distribution—to extrapolate beyond it to something unknown. You have to think from first principles about which methods are suited to that extrapolation and which methods are better in distribution.
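
A simple way to test this kind of cross-source generalization—with hypothetical arrays, not any particular dataset—is a leave-one-source-out split, holding out an entire laboratory rather than random samples:

```python
# Sketch: leave-one-source-out evaluation, a simple check of whether a model
# generalizes across laboratories rather than memorizing site-specific
# artifacts. `features`, `labels`, and `lab_ids` are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
features = rng.normal(size=(600, 16))
labels = rng.integers(0, 2, size=600)
lab_ids = rng.choice(["lab_A", "lab_B", "lab_C"], size=600)

for held_out in np.unique(lab_ids):
    train = lab_ids != held_out  # train on all labs except one
    model = LogisticRegression(max_iter=1000).fit(features[train], labels[train])
    acc = accuracy_score(labels[~train], model.predict(features[~train]))
    print(f"held-out {held_out}: accuracy {acc:.2f}")
```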

KRUFT: Yeah, great point. So what are some specific techniques that you’re using to create a big impact in science overall?

VAN DEN BERG: I think it’s worth focusing here on something Alex mentioned before, and that is the importance of scientific discovery when you look at AI for the natural sciences. Here we can think about scientific discovery in materials and in drugs. Obviously, generative models play a very big role, because they can learn what existing materials and drugs look like and use that knowledge to essentially search a space of unknown materials and drugs. One particular class of generative models that I’m very excited about, and that’s becoming increasingly popular, is diffusion models and score-based generative models. These models have already been super successful, for instance, in high-resolution image and video generation, and I think they’re also very naturally suited to scientific discovery. The way they are trained is that at training time, they get data that is corrupted to various degrees, and the model’s task is to de-corrupt those data points in an iterative process. Then at test time, you can feed the model a data point that is complete noise and iteratively apply it so that it turns that noisy sample into a sample of, for instance, a new material or drug. And what is really exciting is that we’ve already seen some very cool applications of these types of models—for instance, in protein sequence generation, something Alex has worked on.
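
The corrupt-then-denoise recipe just described can be written down in a deliberately minimal form. This toy sketch works on 2-D points rather than materials or molecules, and the network and noise schedule are chosen for brevity, not quality:

```python
# Toy sketch of a denoising diffusion model: corrupt data with increasing
# noise at training time, learn to predict the noise, then generate samples
# by iteratively denoising from pure noise.
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))  # predicts noise
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

data = torch.randn(1024, 2) * 0.5 + 2.0  # stand-in "real" 2-D samples

for step in range(2000):
    x0 = data[torch.randint(0, len(data), (128,))]
    t = torch.randint(0, T, (128,))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].unsqueeze(1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps            # corrupt
    pred = net(torch.cat([xt, t.unsqueeze(1) / T], dim=1))
    loss = ((pred - eps) ** 2).mean()                       # learn to de-corrupt
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: start from pure noise and iteratively denoise.
with torch.no_grad():
    x = torch.randn(16, 2)
    for t in reversed(range(T)):
        tt = torch.full((16, 1), t / T)
        eps_hat = net(torch.cat([x, tt], dim=1))
        x = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
```

Real systems replace the two-layer network with large architectures and add conditioning, but the corrupt-then-denoise structure is the same.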

LU: Sure, I can speak to that. This is the EvoDiff work, and its goal is to generate novel protein sequences. To give you some context: proteins essentially actualize the majority of biological functions at the molecular level, so being able to design them has vast applications. For example, some of the things people have done with protein design in the past include designing proteins that can metabolize and break down plastics, or proteins that can synthesize new pharmaceuticals—obviously very impactful applications. And frequently, the goal, again, is to do something that’s not really present in nature. You want a protein that can synthesize a new pharmaceutical, as opposed to just replicating a protein that synthesizes a compound that [already can be produced]. What makes this problem really difficult is just how vastly expansive the search space is. Proteins are made out of building blocks called amino acids. There are 20 different amino acids, and they connect in a chain to form a protein. Proteins are typically around a few hundred to a few thousand amino acids in length, so the search space is actually 20 to the x, where x is a few hundred to a few thousand—and you can see that brute-force search over such a space is not really feasible. If you look at the landscape of all possible proteins, the majority of combinations essentially produce gibberish; they’re not going to produce a viable protein. So the goal is to walk that fine line between novelty and function: you want to discover novel proteins, but you want to ensure those proteins are functional. Our strategy is exactly to use these diffusion models. This is essentially a discrete diffusion task: you’re trying to predict, for each position, which amino acid should be slotted in. And our approach is essentially to use the distribution of known proteins—I alluded to the UniProt database, which contains natural proteins across all walks of life—train on that, and extrapolate to generate novel proteins that are hopefully within that functional distribution. Since these are proteins from natural organisms shaped by evolution, the vast majority will be functional, and that constrains the search space so that we design mostly functional proteins.
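
A generic discrete-diffusion sketch in this spirit—masking-based corruption over the 20 amino acids, not the actual EvoDiff implementation—might look like this:

```python
# Sketch of discrete diffusion for protein sequences: corrupt a sequence by
# masking random positions, train a network to predict which amino acid
# belongs at each masked position, and generate by filling in a fully
# masked sequence step by step.
import torch
import torch.nn as nn

AMINO_ACIDS = 20
MASK = AMINO_ACIDS  # extra token id meaning "masked"
L = 64              # illustrative sequence length

embed = nn.Embedding(AMINO_ACIDS + 1, 128)
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True),
    num_layers=4,
)
head = nn.Linear(128, AMINO_ACIDS)

def denoise_step(tokens):
    """Predict amino-acid logits at every position of a (batch, L) tensor."""
    return head(model(embed(tokens)))

# Training: mask a random fraction of a real sequence, predict the originals.
seq = torch.randint(0, AMINO_ACIDS, (8, L))  # stand-in natural proteins
mask = torch.rand(8, L) < torch.rand(8, 1)   # random corruption level per sequence
corrupted = seq.masked_fill(mask, MASK)
logits = denoise_step(corrupted)
loss = nn.functional.cross_entropy(logits[mask], seq[mask])

# Generation: start fully masked, reveal a few positions per step.
tokens = torch.full((1, L), MASK)
for _ in range(L // 4):
    with torch.no_grad():
        probs = denoise_step(tokens).softmax(-1)
    still_masked = (tokens == MASK).nonzero()[:4]  # unmask 4 positions per step
    for b, i in still_masked:
        tokens[b, i] = torch.multinomial(probs[b, i], 1)
```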

VAN DEN BERG: That ties in really nicely to one of the things you said about why diffusion models are, I think, very attractive for the natural sciences—it’s very easy to adapt them to the different types of data and discovery tasks that you find there. Alex already mentioned that you can use diffusion models for protein sequence generation, which is an inherently discrete data task. But if you were interested in structure generation for proteins, then what you’d need to generate are samples describing the positions of the different atoms in the protein, and that’s an inherently continuous problem—you have to describe where the atoms are. Now, if you want to take it a step further and look at, for instance, crystal structure prediction, then you have to generate an object that combines discrete parts and continuous parts: the positions of the atoms, but also the types of the atoms. And what’s nice is that diffusion models can naturally handle all of these scenarios. Tian has actually done some really exciting work on this recently, so maybe you can tell us a little more about that.

XIE: Yeah, absolutely. I want to talk a little more about MatterGen, the model that Chris also mentioned earlier in his keynote. I think it’s a great example of the flexibility of [diffusion models] in handling different types of data—discrete and continuous. For materials, we usually represent a structure using its so-called unit cell. Because a crystal is actually an infinite periodic structure, you want to look at its smallest repeating unit. There are three different attributes: first the atom types, second the atom positions, and last the lattice, which defines the periodic box and how the crystal structure repeats in three-dimensional space. The atom types are discrete variables, while the atom positions and lattice are continuous variables with geometric structure—for example, the coordinates live in a periodic space, and for the lattice you need to impose constraints to avoid ending up with a badly skewed cell. Thanks to the flexibility of the diffusion framework, we built a specialized diffusion process for each of these attributes—atom types, positions, and lattice. Combining them leads to the MatterGen model, and we found that with these innovations around how you build diffusion for the different attributes, we’re able to significantly outperform previous models that do this in a less careful way. Another advantage of diffusion models is the ability to add different conditions, so you can guide the generation toward the direction you’re interested in. Most of you are probably familiar with models like DALL-E 2 and DALL-E 3—text-conditioned generative models where you can guide the generation of an image with text. In materials, we do something very similar: we can guide the material generation using things like electronic properties, magnetic properties, or even chemistry. The flexibility of diffusion models allows us to very easily add these different conditions to the model. This provides a much more flexible tool for the materials science community, letting them compose the different types of constraints they care about for their own material design problems.
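
The three attributes Tian lists can be captured in a small data structure; the field names here are illustrative, not MatterGen's actual schema:

```python
# Sketch of the unit-cell representation described above: a crystal as
# discrete atom types plus continuous fractional coordinates and a lattice
# matrix.
from dataclasses import dataclass
import numpy as np

@dataclass
class UnitCell:
    atom_types: np.ndarray   # (n_atoms,) integer atomic numbers — discrete
    frac_coords: np.ndarray  # (n_atoms, 3) positions in [0, 1) — periodic, continuous
    lattice: np.ndarray      # (3, 3) cell vectors — continuous, constrained

    def cartesian(self) -> np.ndarray:
        """Map fractional coordinates into Cartesian space via the lattice."""
        return self.frac_coords @ self.lattice

# A made-up rock-salt-like cell: one Na (11) and one Cl (17) atom.
cell = UnitCell(
    atom_types=np.array([11, 17]),
    frac_coords=np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]),
    lattice=5.64 * np.eye(3),
)
print(cell.cartesian())
```

A joint diffusion model then defines a separate corruption process per field: categorical noise for the atom types, periodic (wrapped) noise for the fractional coordinates, and constrained continuous noise for the lattice.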

NAUMANN: Yeah, maybe I’ll touch a little bit on two of the models that our group has worked on recently in collaboration with the Deep Learning Group and collaborators in health and life sciences—one of them based on diffusion, the other not so much, so we have some variety in this space, as well. The first is the model BiomedJourney, where we’re looking at the task of counterfactual image generation. You can imagine taking a medical image, maybe something like a chest X-ray, adding some text like “resolved pleural effusion” or some other condition that you’re hoping to apply, and then trying to see what that would look like. You can imagine a variety of potential use cases, whether it’s training or even synthetic generation of data for health care cases. In another one of the works, LLaVA-Med, we’re looking at how we can create a GPT-like interface that’s able to use both images and some of the text data, and interact with those images in a much more useful way for some of the clinical practitioners we work with. Touching on one of the things that Kristen mentioned earlier, there is always this desire to use some of the data that’s actually out there. So in this latter work, one of the things we focused on was the creation of a large dataset gathered from publicly available data sources, in order to support both this model and other similar models in this space. And one of the things, Tian, that you mentioned, as well, is that there is this broader community. So I think one of the interesting things that comes out here is that we’re all working, of course, on some interesting things, but also to support the broader intersecting communities in this space.

KRUFT: Yeah. So we can really see the transformative power of generative AI in the discovery of new medicines, in materials, and even in health care, too. Finally, I want to touch on collaborations. As machine learning researchers, it’s really important for you to optimize your collaborations with domain experts. For example, in my team, we have a close collaboration with Novartis to train new AI models that help us discover new small-molecule drug candidates. But I’m curious: what collaborations do you have now, and how are you optimizing that collaboration with the domain experts?

SEVERSON: Yeah, we have several different collaborators in the pathology space in addition to Paige, which I mentioned earlier—an AI software company. We’re also working with Volastra, a therapeutics company, and Providence health system. And I think the motivation for having this—oh, and that’s not even an exhaustive list of our pathology partners. So I think the motivation for having this diverse set is to make sure that we really understand the different ways these tools could be applied. It gives us a diversity of perspective that then allows us to think about where we could have the biggest impact—where AI is going to make a difference—and to bring all those perspectives together to form our research strategy and ideally build something that can cut across these different applications of point-of-care diagnostics and therapeutic discovery.

KRUFT: That’s great.  

NAUMANN: Yeah. This is an incredibly important set of points, because a lot of the collaborators we work with really ground the reality that our work resides in. You mentioned Providence health system, and, going back to what I said earlier about clinical trial matching and the importance of that space, that was primarily a project born of a desire to do more than we currently can in this space—and similarly for some of the registry work that we’ve done with them, as well. So mostly just to say: our collaborators across a variety of these projects really help us make sure that we’re having the impact that we’re looking for.

KRUFT: I’m curious about internal collaborations. Rianne, Tian, any thoughts about that?  

VAN DEN BERG: Yeah, so we work on a project on density functional theory, which is a method we can use to do quantum calculations relevant for chemistry, physics, and materials science, and our team is super interdisciplinary. We have researchers in machine learning but also researchers in quantum physics and chemistry. And I have to say, in the beginning, when you set up a project like that, interdisciplinary communication can be a bit of a challenge, because you have to learn how to speak each other’s language. But after going through that process, it definitely gets better over time. And after doing this for about a year now, I’m 100 percent convinced that without such intense daily collaborations, it’s impossible to have success in some of these projects. We really need an in-depth understanding of the problem we’re trying to solve, some sense of the historical developments before AI came in to try to solve it, and a sense of what’s needed to make a breakthrough that the scientific community that worked on this type of problem before AI came in would also perceive as a genuine breakthrough.

XIE: Yeah, I just want to say I 100 percent agree with what you just said, because it’s a very similar situation in our team. We have people from machine learning backgrounds and also from materials science backgrounds; some people do computation, and some come more from the experimental side. It really takes quite a while for the entire team to understand each other’s language and to be able to speak a common language that allows this kind of interdisciplinary communication. I find this to be very, very beneficial, because for people from a machine learning background, it’s very important to be able to contextualize your result in the language of the scientific domain—to understand, is this a trivial result, or is this a result that is actually meaningful? Would it actually change how materials scientists design new materials? On the other hand, it’s also very important for the domain scientists to be able to speak the language of machine learning, because it takes quite a lot of effort to translate your domain knowledge into a language that a machine learning researcher can understand—for example, how to build a proper evaluation pipeline, or how to translate [your scientific intuition into a single scalar number that allows] faster iteration of model development. None of this is possible without interdisciplinary communication, and not only within our team. Even more broadly in the AI for science community, I have seen a lot of improvement in just the last two years: people from machine learning research groups and domain research groups are really beginning to talk to each other, for example at a lot of open workshops. And I think that’s really the key to the success of applying AI to scientific domains.

KRUFT: Yeah, wonderful. Well, thanks everyone. I want to wrap up by just extending my deepest gratitude to my esteemed colleagues here and especially for their pioneering work in the AI for science domain. And thank you to everyone for joining us and listening to us today. So stay tuned. We have some research lightning talks coming up next. Thank you.

The post Panel Discussion: Transforming the Natural Sciences with AI appeared first on Microsoft Research.

Keynote: The Revolution in Scientific Discovery http://approjects.co.za/?big=en-us/research/articles/keynote-the-revolution-in-scientific-discovery/ Sat, 09 Mar 2024 00:43:17 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1011216 Chris Bishop shares the vision for how Microsoft Research AI4Science will leverage AI to model and predict natural phenomena, and provides updates on exciting real-world progress.


Presented by Chris Bishop at Microsoft Research Forum, March 2024


“In my view, the most important use case of AI will be scientific discovery. And the reason I believe this is that it’s our understanding of the natural world obtained through scientific discovery, together with its application in the form of technology, that has really transformed the human species.”

Chris Bishop, Technical Fellow and Director, Microsoft Research AI4Science

Transcript: Keynote

The revolution in scientific discovery 

Chris Bishop, Technical Fellow and Director, Microsoft Research AI4Science 

Chris Bishop shares the vision for how AI for science will leverage AI to model and predict natural phenomena, including the exciting real-world progress being made by the team. 

Microsoft Research Forum, March 5, 2024 

CHRIS BISHOP: Good morning. A very warm welcome to the Microsoft Research Forum. My name is Chris, and I’m going to talk today about an extraordinary revolution that’s unfolding at the intersection of AI and deep learning with the natural sciences.  

In my view, the most important use case of AI will be scientific discovery. And the reason I believe this is that it’s our understanding of the natural world, obtained through scientific discovery, together with its application in the form of technology, that has really transformed the human species. This transformation has very broad applicability, spanning vast ranges of length and time. Now, we’ve seen remarkable advances, of course, in AI in the last couple of years. And you may ask, can we just apply large language models to scientific discovery and be done? Well, the answer is no. But first, let me say that large language models do have two remarkable properties that are very useful. The first one is, of course, that they can generate and understand human language, so they provide a wonderful human interface to very sophisticated technologies. But the other property of large language models—and I think this came as a big surprise to many of us—is that they can function as effective reasoning engines. And, of course, that’s going to be very useful in scientific discovery. But large language models alone don’t address the full challenge of scientific discovery. And the reason is that there are some key differences in the natural sciences. And let me highlight some of these.

So the first one is that in scientific discovery, we need to do precise quantitative numerical calculations. We may need to calculate the properties of molecules or materials. And large language models are very poor at doing complex numerical calculations. They don’t produce accurate results. And, of course, they’re hugely inefficient from a computational point of view in doing such calculations. A second critical difference is that in the natural sciences, the ultimate truth—the gold standard—is experiment. It doesn’t matter how beautiful your theory is or how clever your code is. If it doesn’t agree with experiment, you have to go back and think again. So in scientific discovery, experiment needs to be embedded in the loop of the scientific discovery process.  

Another difference is that with large language models, we can exploit internet-scale data that, you know, to a first approximation is readily available, freely available. In scientific discovery, however, the training data is often scarce. We may generate it computationally at great expense, or we gather it through sophisticated, complex laboratory experiments. But it tends to be scarce. It tends to be expensive. It tends to be limited. But there’s a final difference that, to some extent, offsets that scarcity of data, and it’s the fact that we have the known laws of physics. We’ve had more than three and a half centuries of scientific discovery that’s given us tremendous insight into the machinery of the universe. So let me say a little bit more about that, what I’ll call prior knowledge.  

So very often, this prior knowledge is expressed in the form of differential equations. So think about Newton’s laws of motion or the law of gravity, going back to the 17th century; Maxwell’s equations of electrodynamics, in the 19th century; and then, of course, very importantly, at the beginning of the 20th century, the discovery of the equations of quantum physics. And here I show a simplified version of Schrödinger’s equation. And if you sprinkle in a few relativistic effects, then this really describes matter at the molecular level with exquisite precision. And it [would], of course, be crazy not to use those centuries of scientific advance. But there’s a problem, which is that these equations, although they’re very simple to write down, are computationally very expensive to solve. In fact, an exact solution of Schrödinger’s equation is exponential in the number of electrons, so it’s prohibitive for any practical application. And even accurate approximations to Schrödinger’s equation are still computationally very expensive. Nevertheless, we can make efficient use of them, because instead of viewing your solver for Schrödinger’s equation as a way of directly calculating the properties of materials or molecules—that’s expensive—we can use that simulation to generate synthetic training data and then use that training data to train deep learning models, which we’ll call emulators. And once they’re trained, those emulators can be several orders of magnitude faster than the original simulator. And I’ll show an example of that in a moment. But it’s not just these differential equations that constitute powerful prior knowledge.
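
For reference, the time-dependent Schrödinger equation is commonly written as

$$ i\hbar\,\frac{\partial}{\partial t}\,\Psi(\mathbf{r}, t) = \hat{H}\,\Psi(\mathbf{r}, t), $$

where $\Psi$ is the wavefunction and $\hat{H}$ is the Hamiltonian of the system; the version shown on the slide is a simplified form along these lines.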

Let’s have a look at this molecule in isolation. Just a simple molecule. And it has various properties; let’s say it has some energy. If we now imagine rotating the molecule—in the computer, the coordinates of all the atoms are stored as numbers—then as we rotate the molecule, all of those numbers change, but the energy doesn’t change. We call that an invariance property, and it’s a powerful, exact piece of prior knowledge. We want to make sure that’s baked into our machine learning models. And if that molecule happens to have a dipole moment, like a little bar magnet, then when the molecule rotates, that little magnet rotates with it. That’s called equivariance. And there’s a lot more besides. These are examples of symmetries, and symmetries play a very powerful role in the natural sciences. The symmetry of spacetime gives rise to conservation of momentum and conservation of energy; gauge symmetries in the electromagnetic field give rise to the conservation of charge. These hold exactly, with exquisite precision, and again, we want to exploit all of that prior knowledge.
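
In symbols—a generic formulation of the properties just described, writing $\mathbf{x}$ for the atomic coordinates and $R$ for a rotation:

$$ E(R\mathbf{x}) = E(\mathbf{x}) \quad \text{(invariance)}, \qquad \boldsymbol{\mu}(R\mathbf{x}) = R\,\boldsymbol{\mu}(\mathbf{x}) \quad \text{(equivariance)}, $$

where $E$ is the energy and $\boldsymbol{\mu}$ the dipole moment.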

So how can we actually make use of that prior knowledge in practice? Well, it really comes down to a very fundamental theorem that’s right at the heart of machine learning. It has a strange title. It’s called the no-free-lunch theorem. But what it says is that you cannot learn purely from data. You can only learn from data in the presence of assumptions, or prior knowledge. And in the machine learning context, we call that inductive bias. And there’s a tradeoff between the data and the inductive bias. So if you’re in a situation where data is scarce, you can compensate for that by using powerful inductive bias. And so it leads to a different kind of tradeoff. If you think about large language models, I’ve already said that we have data available at a very large scale, and so those large language models use very lightweight inductive bias. They’re often based on transformers. The inductive biases that we have are deep hierarchical representation; perhaps there’s some data-dependent self-attention. But it’s very lightweight inductive bias. And many scientific models are in the other regime. We don’t have very much data, but we have these powerful inductive biases arising from three and a half centuries of scientific discovery.  

So let me give you an example of how we can use those inductive biases in practice. And this is some work done by our close collaborators and partners in the Microsoft Azure Quantum team. And the goal here is to find new electrolytes for lithium-ion batteries and, in particular, to try to replace some of that increasingly scarce lithium with cheap, widely available sodium. And so this really is a screening process. We start at the top with over 32 million computer-generated candidate materials, and then we go through a series of ever more expensive screening steps, including some human-guided screening towards the end, eventually to arrive at a single best candidate. Now, those steps involve things like density functional theory, which gives approximate solutions to Schrödinger’s equation but is computationally very expensive.

So we do what I talked about earlier, which is we use those solutions—we use solutions from density functional theory—to train an emulator, and now the emulator can do the screening much faster. In fact, it’s more than three orders of magnitude faster at screening these materials. And anytime something gets three orders of magnitude faster, that really is a disruption. And so what this enabled us to do is to take a process, a screening process, that would have taken many years of compute by conventional methods and reduce it to just 80 hours of computation. And here you see the best candidate material from that screening process. This was synthesized by our partners at the Pacific Northwest National Laboratory. And here you can see some test batteries being fabricated. And then here are the batteries in a kind of test cell. And then just to prove that it really works, here’s a little alarm clock being powered by one of these new lithium-ion batteries that uses 70 percent less lithium than a standard lithium-ion battery. So that’s extremely exciting. But there’s much more that we can do. It’s really just the beginning. So as well as using AI to accelerate that screening process by three orders of magnitude, we can also use AI to transform the way we generate those candidate materials at the top of that funnel.  

So this is some recent work called MatterGen. And the idea here is not simply to generate materials at random and then screen them but instead to generate materials in a much more focused way—materials that have specific values of magnetic density, bandgap, and other desired properties. And we use a technique called diffusion models. You’re probably familiar at least with the output of diffusion models; they’re widely used to generate images and now video, as well. And here they are being used to generate—can we just play that video? Is that possible? This is a little video … here we go. So the first part of the video here is just showing a typical generation of a random material. And now we see MatterGen generating materials that have specific desired properties. What this means is that we can take that combinatorially vast space of possible new materials and, by focusing our attention on a subspace of that overall space and then using accelerated AI, gain a further several orders of magnitude of acceleration in our ability to explore the space of materials and find new candidates for things like battery electrolytes. But it’s not just materials design. This disruption has much broader applicability.
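
Property-guided sampling in diffusion models is often implemented with classifier-free guidance—a generic technique, and whether it matches MatterGen's exact mechanism is not stated in the talk. Writing $\epsilon_\theta$ for the learned denoiser, $c$ for the desired property (say, a target magnetic density), and $w$ for a guidance weight:

$$ \tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t) + w\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t)\big). $$

Larger $w$ pushes samples more strongly toward the conditioned property.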

It’s a very sad fact that in 2022, 1.3 million people died of tuberculosis. Now, you may find that surprising because there are antibiotics; there are drugs to treat tuberculosis. But the bacterium that causes TB is developing very strong drug resistance, and so the search is on for new and better treatments. So again, we can use modern deep learning techniques, and I’ll talk through a framework here called TamGen, for target-aware molecular generation, which allows us to search very specifically for new molecules that bind to a particular protein. So here’s how it works. We first of all train a language model, but it’s not trained on human language; it’s trained on the language of molecules. And, in particular, this uses a standard representation called SMILES, which is just a way of taking a molecule and expressing it as a one-dimensional sequence of tokens, a bit like a sequence of words in language. And now we train a transformer with self-attention to effectively predict the next token, and when it’s trained, it understands the language of SMILES strings—it understands the language of molecules—and it can generate new molecules.
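
A toy version of such a molecule language model—character-level SMILES tokens and a tiny causal transformer, a deliberate simplification and not the TamGen code—could look like this:

```python
# Toy sketch: a causal "language model over molecules" trained to predict
# the next SMILES token. Character-level tokens are a simplification; real
# systems use chemistry-aware tokenizers.
import torch
import torch.nn as nn

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # ethanol, benzene, acetic acid
vocab = sorted({ch for s in smiles for ch in s}) + ["<bos>", "<eos>"]
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(s):
    return torch.tensor([stoi["<bos>"]] + [stoi[c] for c in s] + [stoi["<eos>"]])

embed = nn.Embedding(len(vocab), 64)
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(64, len(vocab))

seq = encode(smiles[0]).unsqueeze(0)  # (1, T)
T = seq.size(1)
# Causal mask: each position may attend only to itself and earlier positions.
causal = torch.triu(torch.full((T - 1, T - 1), float("-inf")), diagonal=1)
hidden = encoder(embed(seq[:, :-1]), mask=causal)
loss = nn.functional.cross_entropy(
    head(hidden).flatten(0, 1),  # logits for every position
    seq[:, 1:].flatten(),        # next-token targets
)
```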

But we don’t just want to generate new molecules at random, of course. We want to generate molecules that are targeted to a particular protein. And so we use another transformer-based model to encode the properties of that protein. And, in particular, we’re looking for a region of the protein called a pocket, which is where the drug molecule binds; in the process, it alters the function of the protein, and that breaks the chain of the disease. And so we use some of those geometrical properties that I talked about earlier to encode the geometrical structure of the protein, taking account of those invariance and equivariance properties. And we learn a model that can map that into the representation of the SMILES string. We want to do one more thing, as well. What we want to do is to be able to refine molecules. We want to take molecules that we know bind but improve them—increase their binding efficiency. And so we need a way of encoding an existing molecule but also generating variability. And we use another standard deep learning technique called a variational autoencoder, which takes a representation of the starting molecule and again encodes it into that representation space.

And then finally we use a thing called cross-attention that combines the output of those two encoders into that SMILES language model. So once the system has been trained, we can now present it with a target protein—in this case, for TB. We can present it with a known molecule that binds to that target, and then it can generate candidates that we hope will have improved efficacy compared to the starting molecule. Now, we collaborate with a partner called GHDDI—the Global Health Drug Discovery Institute. They’ve synthesized these candidate molecules, and they found that this one in particular offers more than two orders of magnitude of improvement over a standard drug molecule. So it’s got a long way to go before we have a clinical drug. But nevertheless, this is an extraordinary achievement. This is the state of the art in terms of candidate drug molecules that bind to this particular protein. So I think very, very exciting. And, of course, we’re continuing to work with GHDDI to refine and optimize this, and we hope eventually to take it towards pre-clinical trials.
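
The cross-attention step can be sketched generically with PyTorch's built-in multi-head attention; the shapes and dimensions below are illustrative assumptions, not TamGen's:

```python
# Sketch of the cross-attention step described above: SMILES decoder states
# (queries) attend over the concatenated outputs of the protein-pocket
# encoder and the seed-molecule encoder.
import torch
import torch.nn as nn

d = 64
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

decoder_states = torch.randn(1, 12, d)  # partial SMILES sequence, 12 tokens
pocket_enc = torch.randn(1, 50, d)      # encoded protein pocket geometry
molecule_enc = torch.randn(1, 20, d)    # VAE encoding of the seed molecule

context = torch.cat([pocket_enc, molecule_enc], dim=1)  # (1, 70, d)
fused, attn_weights = cross_attn(
    query=decoder_states, key=context, value=context
)
print(fused.shape)  # (1, 12, d): each decoder position now sees both encoders
```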

So I’ve mentioned several concepts here: transformers, attention, variational autoencoders, diffusion models, and so on. And if you want to learn more about these techniques, I’m delighted to say that a new book has just been published a few weeks ago called Deep Learning: Foundations and Concepts, produced by Springer—a beautiful, very high-quality hardback copy. But it’s also available from BishopBook.com as a free online version. So I encourage you to take a look at that.

So finally, I hope I’ve given you a glimpse of how AI and deep learning are transforming the world of scientific discovery. I’ve highlighted two examples, one of them in materials design and one of them in drug discovery. This is just scratching the surface. The potential of this disruption has huge breadth of applicability. And so to hear more about this exciting field, in a few minutes, Bonnie [Kruft] will be moderating a panel discussion on transforming the natural sciences with AI. 

Thank you very much.

The post Keynote: The Revolution in Scientific Discovery appeared first on Microsoft Research.
