Research Forum | Episode 2

Research Forum Brief | March 2024

Keynote: The Revolution in Scientific Discovery



“In my view, the most important use case of AI will be scientific discovery. And the reason I believe this is that it’s our understanding of the natural world obtained through scientific discovery, together with its application in the form of technology, that has really transformed the human species.”

Chris Bishop, Technical Fellow and Director, Microsoft Research AI4Science

Transcript: Keynote

The revolution in scientific discovery 

Chris Bishop, Technical Fellow and Director, Microsoft Research AI4Science 

Chris Bishop shares the vision for how AI for science will model and predict natural phenomena, including the exciting real-world progress being made by the team.

Microsoft Research Forum, March 5, 2024 

CHRIS BISHOP: Good morning. A very warm welcome to the Microsoft Research Forum. My name is Chris, and I’m going to talk today about an extraordinary revolution that’s unfolding at the intersection of AI and deep learning with the natural sciences.  

In my view, the most important use case of AI will be scientific discovery. And the reason I believe this is that it’s our understanding of the natural world obtained through scientific discovery, together with its application in the form of technology, that has really transformed the human species. This transformation has very broad applicability, spanning vast ranges of length and time. Now, we’ve seen remarkable advances, of course, in AI in the last couple of years. And you may ask, can we just apply large language models to scientific discovery and be done? Well, the answer is no. But first, let me say that large language models do have two remarkable properties that are very useful. The first one is, of course, that they can generate and understand human language, so they provide a wonderful human interface to very sophisticated technologies. But the other property of large language models—and I think this came as a big surprise to many of us—is that they can function as effective reasoning engines. And, of course, that’s going to be very useful in scientific discovery. But large language models alone don’t address the full challenge of scientific discovery. And the reason is that there are some key differences in the natural sciences. And let me highlight some of these.

So the first one is that in scientific discovery, we need to do precise quantitative numerical calculations. We may need to calculate the properties of molecules or materials. And large language models are very poor at doing complex numerical calculations. They don’t produce accurate results. And, of course, they’re hugely inefficient from a computational point of view in doing such calculations. A second critical difference is that in the natural sciences, the ultimate truth—the gold standard—is experiment. It doesn’t matter how beautiful your theory is or how clever your code is. If it doesn’t agree with experiment, you have to go back and think again. So in scientific discovery, experiment needs to be embedded in the loop of the scientific discovery process.  

Another difference is that with large language models, we can exploit internet-scale data that, to a first approximation, is readily and freely available. In scientific discovery, however, the training data is often scarce. We may generate it computationally at great expense, or we gather it through sophisticated, complex laboratory experiments. But it tends to be scarce. It tends to be expensive. It tends to be limited. But there’s a final difference that, to some extent, offsets that scarcity of data, and it’s the fact that we have the known laws of physics. We’ve had more than three and a half centuries of scientific discovery that’s given us tremendous insight into the machinery of the universe. So let me say a little bit more about that, what I’ll call prior knowledge.

So very often, this prior knowledge is expressed in the form of differential equations. So think about Newton’s laws of motion or the law of gravity, going back to the 17th century; Maxwell’s equations of electrodynamics, in the 19th century; and then, of course, very importantly, at the beginning of the 20th century, the discovery of the equations of quantum physics. And here I show a simplified version of Schrödinger’s equation. And if you sprinkle in a few relativistic effects, then this really describes matter at the molecular level with exquisite precision. And it would, of course, be crazy not to use those centuries of scientific advance. But there’s a problem, which is that these equations, although they’re very simple to write down, are computationally very expensive to solve. In fact, an exact solution of Schrödinger’s equation is exponential in the number of electrons, so it’s prohibitive for any practical application. And even accurate approximations to Schrödinger’s equation are still computationally very expensive. Nevertheless, we can make efficient use of that: instead of viewing a solver for Schrödinger’s equation as a way of directly calculating the properties of materials or molecules—that’s expensive—we can use that simulation to generate synthetic training data and then use that training data to train deep learning models, which we’ll call emulators. And once they’re trained, those emulators can be several orders of magnitude faster than the original simulator. And I’ll show an example of that in a moment. But it’s not just these differential equations that constitute powerful prior knowledge.
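To make that simulate-then-emulate idea concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than the team’s actual setup: the “expensive simulator” is just a stand-in function, not a real quantum-chemistry code, and the descriptors, network size, and training settings are arbitrary choices.

```python
# Minimal sketch of simulate-then-emulate: an expensive reference calculation
# generates labelled training data once, and a small neural network is then
# trained to emulate it at a fraction of the cost per prediction.
import torch
import torch.nn as nn

def expensive_simulator(x: torch.Tensor) -> torch.Tensor:
    # Placeholder for a costly calculation (e.g. an approximate solution of
    # Schroedinger's equation); returns one scalar "energy" per input row.
    return torch.sin(3 * x).sum(dim=-1, keepdim=True) + 0.1 * (x ** 2).sum(dim=-1, keepdim=True)

# 1. Generate synthetic training data with the expensive simulator.
descriptors = torch.rand(2048, 8)            # hypothetical per-structure input features
labels = expensive_simulator(descriptors)    # e.g. energies

# 2. Train a lightweight emulator on that data.
emulator = nn.Sequential(
    nn.Linear(8, 64), nn.SiLU(),
    nn.Linear(64, 64), nn.SiLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(emulator.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(emulator(descriptors), labels)
    loss.backward()
    optimizer.step()

# 3. At inference time the trained emulator replaces the simulator for screening:
#    it is orders of magnitude cheaper per structure.
new_structures = torch.rand(100_000, 8)
with torch.no_grad():
    predicted_energies = emulator(new_structures)
```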

Let’s have a look at this molecule in isolation. Just a simple molecule. And it has various properties. Let’s say it has some energy. If we now imagine rotating the molecule, then in the computer, where the coordinates of all the atoms are stored as numbers, all of those numbers change, but the energy doesn’t change. And we call that an invariance property, and it’s a powerful, exact piece of prior knowledge. We want to make sure that’s baked into our machine learning models. And if that molecule happens to have a dipole moment, like a little bar magnet, then when the molecule rotates, that little magnet rotates with the molecule, and that’s called equivariance. And there’s a lot more besides. These are examples of symmetries, and symmetries play a very powerful role in the natural sciences. So the symmetry of spacetime gives rise to conservation of momentum and conservation of energy; gauge symmetry in the electromagnetic field gives rise to the conservation of charge. These hold exactly, with exquisite precision, and again, we want to exploit all of that prior knowledge.
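As a small illustration of these two properties, here is a hedged numpy sketch. The “energy” and “dipole” functions are toy stand-ins chosen only because they make the symmetry easy to verify: an energy that depends only on interatomic distances cannot change under rotation (invariance), while a dipole vector rotates along with the coordinates (equivariance).

```python
# Toy check of rotational invariance (energy) and equivariance (dipole).
import numpy as np

def toy_energy(positions: np.ndarray) -> float:
    # Depends only on pairwise distances, so rotation cannot change it.
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float(np.sum(np.triu(dists, k=1)))

def toy_dipole(positions: np.ndarray, charges: np.ndarray) -> np.ndarray:
    # A vector quantity: it rotates with the molecule.
    return charges @ positions

rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))   # 5 atoms in 3D
charges = rng.normal(size=5)

# A rotation about the z-axis.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
rotated = positions @ R.T

print(np.isclose(toy_energy(positions), toy_energy(rotated)))       # True: invariance
print(np.allclose(toy_dipole(positions, charges) @ R.T,
                  toy_dipole(rotated, charges)))                    # True: equivariance
```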

So how can we actually make use of that prior knowledge in practice? Well, it really comes down to a very fundamental theorem that’s right at the heart of machine learning. It has a strange title. It’s called the no-free-lunch theorem. But what it says is that you cannot learn purely from data. You can only learn from data in the presence of assumptions, or prior knowledge. And in the machine learning context, we call that inductive bias. And there’s a tradeoff between the data and the inductive bias. So if you’re in a situation where data is scarce, you can compensate for that by using powerful inductive bias. And so it leads to a different kind of tradeoff. If you think about large language models, I’ve already said that we have data available at a very large scale, and so those large language models use very lightweight inductive bias. They’re often based on transformers. The inductive biases that we have are deep hierarchical representation; perhaps there’s some data-dependent self-attention. But it’s very lightweight inductive bias. And many scientific models are in the other regime. We don’t have very much data, but we have these powerful inductive biases arising from three and a half centuries of scientific discovery.  

So let me give you an example of how we can use those inductive biases in practice. And this is some work done by our close collaborators and partners in the Microsoft Azure Quantum team. And the goal here is to find new electrolytes for lithium-ion batteries and, in particular, to try to replace some of that increasingly scarce lithium with cheap, widely available sodium. And so this really is a screening process. We start at the top with over 32 million computer-generated candidate materials, and then we go through a series of ever more expensive screening steps, including some human-guided screening towards the end, eventually to arrive at a single best candidate. Now, those steps involve things like density functional theory, which provides approximate solutions to Schrödinger’s equation but is computationally very expensive.

So we do what I talked about earlier: we use those solutions from density functional theory to train an emulator, and now the emulator can do the screening much faster. In fact, it’s more than three orders of magnitude faster at screening these materials. And anytime something gets three orders of magnitude faster, that really is a disruption. And so what this enabled us to do is to take a screening process that would have taken many years of compute by conventional methods and reduce it to just 80 hours of computation. And here you see the best candidate material from that screening process. This was synthesized by our partners at the Pacific Northwest National Laboratory. And here you can see some test batteries being fabricated. And then here are the batteries in a kind of test cell. And then just to prove that it really works, here’s a little alarm clock being powered by one of these new batteries, which use 70 percent less lithium than a standard lithium-ion battery. So that’s extremely exciting. But there’s much more that we can do. It’s really just the beginning. So as well as using AI to accelerate that screening process by three orders of magnitude, we can also use AI to transform the way we generate those candidate materials at the top of that funnel.
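The shape of that funnel can be sketched in a few lines of Python. This is only a schematic with invented numbers and stand-in scoring functions (neither is a real emulator or a real DFT code); the point is the pattern: a cheap learned surrogate filters the full pool, and the expensive reference method is reserved for the small short-list that survives.

```python
# Schematic two-stage screening funnel: cheap surrogate first, expensive check last.
import numpy as np

rng = np.random.default_rng(0)
candidates = rng.random((1_000_000, 8))   # hypothetical descriptor vectors for candidate materials
weights = rng.random(8)                   # fixed weights so both scorers agree up to noise

def emulator_score(x):
    # Stand-in for a trained emulator: cheap enough to run over the whole pool.
    return x @ weights

def expensive_reference_score(x):
    # Stand-in for the expensive reference calculation (e.g. density functional theory).
    return x @ weights + 0.01 * rng.standard_normal(len(x))

# Stage 1: cheap emulator screening over all ~1 million candidates.
coarse_scores = emulator_score(candidates)
shortlist = candidates[np.argsort(coarse_scores)[-1000:]]      # keep the top 1,000

# Stage 2: expensive verification only on the short-list.
refined_scores = expensive_reference_score(shortlist)
finalists = shortlist[np.argsort(refined_scores)[-10:]]        # hand these to human experts
print(finalists.shape)                                         # (10, 8)
```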

So this is some recent work called MatterGen. And the idea here is not simply to generate materials at random and then screen them but instead generate materials in a much more focused way, materials that have specific values of magnetic density, bandgap, and other desired properties. And we use a technique called diffusion models. You’re probably familiar at least with the output of diffusion models; they’re widely used to generate images and now video, as well. And here they are being used to generate—can we just play that video? Is that possible? This is a little video … here we go. So this, the first part of the video here, is just showing a typical generation of a random material. And now we see MatterGen generating materials that have specific desired properties. What this means is that we can take that combinatorially vast space of possible new materials and, by focusing our attention on a subspace of that overall space and then using accelerated AI, gain a further several orders of magnitude of acceleration in our ability to explore the space of materials and find new candidates for things like battery electrolytes. But it’s not just materials design. This disruption has much broader applicability.
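For readers who want to see the mechanics, here is a minimal, hedged sketch of what property-conditioned diffusion sampling looks like in PyTorch. It is not MatterGen’s architecture: the denoiser is a toy untrained MLP, the sample is a flat feature vector standing in for a crystal structure, and the conditioning vector (say, a target band gap and magnetic density) is simply concatenated to the input.

```python
# Minimal property-conditioned DDPM-style sampling loop (illustrative only).
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class ConditionalDenoiser(nn.Module):
    def __init__(self, dim=16, cond_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + cond_dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, 128), nn.SiLU(),
                                 nn.Linear(128, dim))
    def forward(self, x, t, cond):
        # The network sees the noisy sample, the timestep, and the target properties.
        t_feat = t.float().unsqueeze(-1) / T
        return self.net(torch.cat([x, cond, t_feat], dim=-1))   # predicted noise

@torch.no_grad()
def sample(model, cond, dim=16):
    x = torch.randn(cond.shape[0], dim)                 # start from pure noise
    for t in reversed(range(T)):
        t_batch = torch.full((cond.shape[0],), t)
        eps = model(x, t_batch, cond)                   # predict the noise at this step
        a, ab = alphas[t], alpha_bars[t]
        x = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Condition on hypothetical target properties, e.g. (band gap, magnetic density).
target_props = torch.tensor([[1.2, 0.3]])
model = ConditionalDenoiser()
generated = sample(model, target_props)   # structure of the loop, untrained weights
```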

It’s a very sad fact that in 2022, 1.3 million people died of tuberculosis. Now, you may find that surprising because there are antibiotics; there are drugs to treat tuberculosis. But the bacterium that causes TB is developing very strong drug resistance, and so the search is on for new and better treatments. So again, we can use modern deep learning techniques, and I’ll talk through a framework here called TamGen, for target-aware molecular generation, and this allows us to search very specifically for new molecules that bind to a particular protein. So here’s how it works. We first of all train a language model, but it’s not trained on human language; it’s trained on the language of molecules. And, in particular, this uses a standard representation called SMILES, which is just a way of taking a molecule and expressing it as a one-dimensional sequence of tokens, so a bit like a sequence of words in language. And now we train a transformer with self-attention to be able to effectively predict the next token, and when it’s trained, it now understands the language of SMILES strings—it understands the language of molecules—and it can generate new molecules.
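The general recipe of a SMILES language model can be sketched as follows. This is a hedged illustration, not TamGen’s actual tokenizer or architecture: the corpus is three toy molecules, tokenization is per character, and a small causally masked transformer is trained to predict the next token.

```python
# Character-level SMILES language model trained with next-token prediction.
import torch
import torch.nn as nn

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]                      # tiny toy corpus
vocab = sorted({ch for s in smiles for ch in s}) + ["<bos>", "<eos>"]
stoi = {tok: i for i, tok in enumerate(vocab)}

def encode(s):
    return [stoi["<bos>"]] + [stoi[c] for c in s] + [stoi["<eos>"]]

class SmilesLM(nn.Module):
    def __init__(self, vocab_size, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, vocab_size)
    def forward(self, tokens):
        L = tokens.shape[1]
        causal = nn.Transformer.generate_square_subsequent_mask(L)  # attend only to earlier tokens
        h = self.encoder(self.embed(tokens), mask=causal)
        return self.head(h)                                         # next-token logits

model = SmilesLM(len(vocab))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for epoch in range(200):
    for s in smiles:
        ids = torch.tensor(encode(s)).unsqueeze(0)
        logits = model(ids[:, :-1])                 # predict token t+1 from tokens <= t
        loss = loss_fn(logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```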

But we don’t just want to generate new molecules at random, of course. We want to generate molecules that are targeted to a particular protein. And so we use another transformer-based model to encode the properties of that protein. And, in particular, we’re looking for a region of the protein called a pocket, which is where the drug molecule binds and, in the process, alters the function of the protein, and that breaks the chain of the disease. And so we use some of those geometrical properties that I talked about earlier to encode the geometrical structure of the protein, taking account of those invariance and equivariance properties. And we learn a model that can map that into that representation of the SMILES string. We want to do one more thing as well. What we want to do is to be able to refine molecules. We want to take molecules that we know bind but improve them, increase their binding efficiency. And so we need a way of encoding an existing molecule but also generating variability. And we use another standard deep learning technique called a variational autoencoder, which takes a representation of the starting molecule and again encodes that into that representation space.
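Here is a minimal sketch of that variational-autoencoder ingredient. The molecule is represented by a toy fixed-length feature vector rather than a real molecular encoding, and all sizes are arbitrary; what matters is the pattern of encoding to a latent distribution and sampling from it via the reparameterization trick, so that nearby latent points yield controlled variants of the starting molecule.

```python
# Minimal variational autoencoder over toy "molecule" feature vectors.
import torch
import torch.nn as nn

class MoleculeVAE(nn.Module):
    def __init__(self, in_dim=32, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.SiLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.SiLU(), nn.Linear(64, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon_loss + kl

vae = MoleculeVAE()
x = torch.randn(16, 32)                      # toy stand-in for encoded seed molecules
recon, mu, logvar = vae(x)
loss = vae_loss(x, recon, mu, logvar)
loss.backward()
# At generation time, perturbing z around a seed molecule's mu yields nearby variants.
```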

And then finally we use a thing called cross-attention that combines the output of those two encoders into that SMILES language model. So once the system has been trained, we can now present it with a target protein, in this case, for TB. We can present it with a known molecule that binds to that target, and then it can generate candidates that we hope will have an improved efficacy compared to the starting molecule. Now, we collaborate with a partner called GHDDI—the Global Health Drug Discovery Institute. They’ve synthesized these candidate molecules, and they found that this one in particular is more than two orders of magnitude better than a standard drug molecule. So it’s got a long way to go before we have a clinical drug. But nevertheless, this is an extraordinary achievement. This is the state of the art in terms of candidate drug molecules that bind to this particular protein. So I think this is very, very exciting. And, of course, we’re continuing to work with GHDDI to refine and optimize this and hope eventually to take this towards pre-clinical trials.
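As a final sketch, here is what that cross-attention step looks like in PyTorch at the level of tensor shapes. The dimensions and encoder outputs are invented for illustration: the decoder states for the SMILES being generated act as queries, and the keys and values come from the concatenated outputs of the pocket encoder and the seed-molecule encoder, so each generated token can attend to both contexts.

```python
# Cross-attention fusing protein-pocket features and seed-molecule features
# into the states of a SMILES decoder (shapes are illustrative).
import torch
import torch.nn as nn

d_model = 64
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

batch = 2
pocket_repr = torch.randn(batch, 40, d_model)     # per-residue features from the pocket encoder
seed_mol_repr = torch.randn(batch, 12, d_model)   # latent features of the known binder (e.g. from the VAE)
decoder_states = torch.randn(batch, 20, d_model)  # hidden states of the SMILES decoder so far

context = torch.cat([pocket_repr, seed_mol_repr], dim=1)     # keys/values from both encoders
fused, attn_weights = cross_attn(query=decoder_states, key=context, value=context)

# `fused` would feed the decoder's next-token prediction head, conditioning
# generation on both the target pocket and the molecule being refined.
print(fused.shape)   # (2, 20, 64)
```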

So I’ve mentioned several concepts here: transformers, attention, variational autoencoders, diffusion models, and so on. And if you want to learn more about these techniques, I’m delighted to say that a new book has just been published a few weeks ago called Deep Learning: Foundations and Concepts, produced by Springer—a beautiful, very high-quality hardback copy. But it’s also available from BishopBook.com as a free online version. So I encourage you to take a look at that.

So finally, I hope I’ve given you a glimpse of how AI and deep learning are transforming the world of scientific discovery. I’ve highlighted two examples, one of them in materials design and one of them in drug discovery. This is just scratching the surface. This disruption has a huge breadth of applicability. And so to hear more about this exciting field, in a few minutes, Bonnie [Kruft] will be moderating a panel discussion on transforming the natural sciences with AI.

Thank you very much.