The AI Revolution in Medicine, Revisited - Microsoft Research
http://approjects.co.za/?big=en-us/research/podcast-series/the-ai-revolution-in-medicine-revisited/

Coauthor roundtable: Reflecting on healthcare economics, biomedical research, and medical education
http://approjects.co.za/?big=en-us/research/podcast/coauthor-roundtable-reflecting-on-healthcare-economics-biomedical-research-and-medical-education/
Thu, 21 Aug 2025

For the series finale, Peter Lee, Carey Goldberg, and Dr. Zak Kohane compare their predictions to insights from the series’ most recent guests, including experts on AI’s economic and societal impact, leaders in AI-driven medicine, and doctors in training.

Illustrated headshots of Carey Goldberg, Peter Lee, and Dr. Isaac Kohane.

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than half a year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee. 

In this series finale, Lee welcomes back coauthors Carey Goldberg and Dr. Zak Kohane to discuss how their predictions stack up against key takeaways from guests in the second half of the series: experts on AI’s economic and societal impact; technologists on the cutting edge; leaders in AI-driven medicine; next-generation physicians; and heads of healthcare organizations. Lee, Goldberg, and Kohane explore thinking innovatively about existing healthcare processes, including the structure of care teams and the role of specialties, to take advantage of AI opportunities, and consider what these new AI tools will need to be for clinicians and patients to feel empowered in giving and receiving the best healthcare. They close the episode with their hopes for the future of AI in health.

Transcript

[MUSIC] 

[BOOK PASSAGE] 

PETER LEE: “As a society—indeed, as a species—we have a choice to make. Do we constrain or even kill artificial intelligence out of fear of its risks and obvious ability to create new harms? Do we submit ourselves to AI and allow it to freely replace us, make us less useful and less needed? Or do we start, today, shaping our AI future together, with the aspiration to accomplish things that humans alone, and AI alone, can’t do but that humans+AI can? The choice is in our hands … .” 

[END OF BOOK PASSAGE]

[THEME MUSIC]

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee. 

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong? 

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 

[THEME MUSIC FADES] 

The book passage I read at the top is from the epilogue, and I think it’s a truly fitting closing sentiment for the conclusion of this podcast series—because it calls back to the very beginning.

As I’ve mentioned before, Carey, Zak, and I wrote The AI Revolution in Medicine as a guide to help answer these big questions, particularly as they pertain to medicine. You know, we wrote the book to empower people to make a choice about AI’s development and use. Well, have they? Have we?

Perhaps we’ll need more time to tell. But over the course of this podcast series, I’ve had the honor of speaking with folks from across the healthcare ecosystem. And my takeaway? They’re all committed to shaping AI into a tool that can improve the industry for practitioners and patients alike.

In this final episode, I’m thrilled to welcome back my coauthors, Carey Goldberg and Dr. Zak Kohane. We’ll examine the insights from the second half of the season. 

[TRANSITION MUSIC] 

Carey, Zak—it’s really great to have you here again! 

CAREY GOLDBERG: Hey, Peter! 

ZAK KOHANE: Hi, Peter. 

LEE: So this is the second roundtable. And just to recap, you know, we had several early episodes of the podcast where we talked to some doctors, some technology developers, some people who think about regulation and public policy, patient advocates, a venture capitalist who invests in, kind of, consumer and patient-facing medical ventures, and some bioethicists. 

And I think we had a great conversation there. I think, you know, it felt mostly validating. A lot of the things that we predicted might happen happened, and then we learned a lot of new things. But now we have five more episodes, and the mix of kinds of people that we talk to here is different than the original. 

And so I thought it would be great for us to have a conversation and recap what we think we heard from all of them. So let’s just start at the top. 

So in this first episode in the second half of this podcast series, we talked to economists Azeem Azhar and Ethan Mollick. And I thought those conversations were really interesting. Maybe there were, kind of, two things, two main topics. One was just the broader impact on the economy, on the cost of healthcare, on overall workforce issues. 

One of the things that I thought was really interesting was something that Ethan Mollick brought up. And maybe just to refresh our memories, let’s play this little clip from Ethan. 

LEE: So let me start with you, Zak. Does that make sense to you? Are you seeing something similar? 

KOHANE: I thought it was incredibly insightful because we discussed on our earlier podcast how a chief AI officer in one of the healthcare hospitals, in one of the healthcare systems, was highly regulating the use of AI, but yet in her own practice on her smartphone was using all these AI technologies. 

And so it’s insightful that on the one hand, she is increasing her personal productivity, … 

LEE: Right. 

KOHANE: … and perhaps she’s increasing the quality of her care. But it’s very hard for the healthcare system to actually realize any gains. It’s unlikely … let’s put it this way. It would be a defeat for her if they said, “Now you should see more patients.” 

LEE: Yes. [LAUGHS] 

KOHANE: Now, I’m not saying that won’t happen. It could happen. But, you know, gains of productivity are really at the individual level of the doctors. And that’s why they’re adopting it. That’s why the ambient dictation tools are so successful. But really turning it into things that matter in terms of productivity for healthcare, namely making sure that patients are getting healthy, requires that every piece of the puzzle works well together. You know, it’s well-tread ground to talk about how patients get very expensive procedures, like a cardiac transplant, and then go home, and they’re not put on blood thinners … 

LEE: Right. 

KOHANE: … and then they get a stroke. You know, the chain is as strong as the weakest link. And just having AI in one part of it is not going to do it. And so hospitals, I think, are doubly burdened by the fact that, (A) they tend to not like innovation because they are high-revenue, low-margin companies. But if they want it implemented effectively, they have to do it across the entire processes of healthcare, which are vast and not completely under their control. 

LEE: Yeah. Yep. You know, that was Sara Murray, who’s the chief health AI officer at UC San Francisco. 

And then, you know, Carey, remember, we were puzzled by Chris Longhurst’s finding in a controlled study that having an AI respond to patient emails didn’t seem to lead to any, I guess you would call it, productivity benefits. I remember we were both kind of puzzled by that. I wonder if that’s related to what Ethan is saying here. 

GOLDBERG: I mean, possibly, but I think we’ve seen since then that there have been multiple studies showing that in fact using AI can be extremely effective or helpful, even, for example, for diagnosis. 

And so I find just from the patient point of view, it kind of drives me crazy that you have individual physicians using AI because they know that it will improve the care that they’re offering. And yet you don’t have their institutions kind of stepping up and saying, “OK, these are the new norms.” 

By the way, Ethan Mollick is a national treasure, right. Like, he is the classic example of someone who just stepped up at this moment … 

LEE: Yeah. 

GOLDBERG: … when we saw this extraordinary technological advance. And he’s not only stepping up for himself. He’s spreading the word to the masses that this is what these things can do. 

And so it’s frustrating to see the institutions not stepping up and instead the individual doctors having to do it. 

KOHANE: But he made another very interesting point, which was that the reason that he could be so informative to not only the public but practitioners of AI is these things would emerge out of the shop, and they would not be aged too long, like a fine wine, before they were just released to the public. 

And so he was getting exposure to these models just weeks after some of the progenitors had first seen it. And therefore, because he’s actually a really creative person in terms of how he exercises models, he sees uses and problems very early on. But the point is institutions, think about how much they are disadvantaged. They’re not Ethan Mollick. They’re not the progenitors. So they’re even further behind. So it’s very hard. If you talk to most of the C-suite of hospitals, they’d be delighted to know as much about the impact as Ethan Mollick. 

LEE: Yeah. By the way, you know, I picked out this quote because within Microsoft, and I suspect every other software company, we’re seeing something very similar, where individual programmers are 20 to 30% more productive just in the number of lines of code they write per day or the number of pull requests per week. Any way you measure it, it’s very consistent. And yet by the time you get to, say, a 25-person software engineering team, the productivity of that whole team isn’t 25% more productive. 

Now, that is starting to change because we’re starting to figure out that, well, maybe we should reshape how the team operates. And there’s more of an orientation towards having, you know, smaller teams of full-stack developers. And then you start to see the gains. But if you just keep the team organized in the usual way, there seems to be a loss. So there’s something about what Ethan was saying that resonated very strongly with me. 

GOLDBERG: But I would argue that it’s not just productivity we’re talking about. There’s a moral imperative to improve the care. And if you have tools that will do that, you should be using them or trying harder to. 

LEE: Right. Yep. 

KOHANE: I think, yes, first of all, absolutely you would. Unfortunately, most of the short-term productivity measures will not measure improvements in the quality of care because it takes a long time to die even with bad care. 

And so that doesn’t show up right away. But I think what Peter just said actually came across in several of the podcasts, which is that it’s very tricky trying to shoehorn these things into making what we’re already doing more productive. 

GOLDBERG: Yeah. Existing structures. 

KOHANE: Yeah. And I know, Carey, that you’ve raised this issue many times. But it really calls into question, what should we be doing with our time with doctors? And they are a scarce resource. And what is the most efficient way to use them? 

You know, I remember we [The New England Journal of Medicine AI] published a paper by someone who was able to use AI to increase the throughput of their emergency room by actually more appropriately having the truly sick people in the sick queue, in the triage queue, for urgent care. 

And so I think we’re going to have to think that way more broadly, about how we don’t have to now look at every patient as an unknown with maybe a few pointers on diagnosis. We can have a fairly extensive profiling. 

And I know that colleagues in Clalit [Health Services] in Israel, for example, are using the overall trajectory of the patient and some considerations about utilities to actually figure out who to see next week. 

LEE: Yeah, you know, what you said brings up another maybe connection to one thing that we see also in software development. And it relates to also what we were discussing earlier: about the last thing a doctor wants is to have a tool that allows them to see even yet more patients per day. 

So in software development, there’s always this tension. Like, how many lines of code can you write per day? That’s one productivity measure. 

But sometimes we’re taught, well, don’t write more lines of code per day, but make sure that your code is well structured. Take the time to document it. Make sure it’s fully commented. Take the time to talk to your fellow software engineering team members to make sure that it’s well coordinated. And in the long run, even if you’re writing half the number of lines of code per day, the software process will be far more efficient.

And so I’ve wondered whether there’s a similar thing where doctors could see 20% fewer patients in a day, but if they take the time and also had AI help to coordinate, maybe a patient’s journey might be half as long. And therefore, the health system would be able to see twice as many patients in a year’s period or something like that. 

KOHANE: So I think you’ve “nerd sniped” me [LAUGHTER]—which is all too easy—but I think there’s a central issue here. And I think this is the stumbling block Ethan’s telling us about between the individual productivity and the larger productivity: the team’s productivity. 

And there is actually a good analogy in computer science, and that’s Brooks’s “mythical man-month,” … 

LEE: Yes, exactly. 

KOHANE: … where he shows how you can have more and more resources, but when the coordination starts failing, because you have so many individuals on the team, you start falling apart. And so even if the individual doctors get that much better, yeah, they take better care of patients, make fewer stupid mistakes. 

But in terms of giving the “I get you into the emergency room, and I get you out of a hospital as fast as possible, as safely as possible, as effectively as possible,” that’s teamwork. And we don’t do it. And we’re not really optimizing our tools for that. 

GOLDBERG: And just to throw in a little reality check, I’m not aware of any indication yet that AI is in any way shortening medical journeys or making physicians more efficient. Yet … 

LEE: Right. 

GOLDBERG: at least. Yeah. 

LEE: Yes. So I think, you know, with respect to our book, critiquing our book, you know, I think it’s fair to say we were fairly focused or maybe even fixated on the individual doctor or nurse or patient, and we didn’t really, at least I never had a time where I stepped back to think about the whole care coordination team or the whole health system. 

KOHANE: And I think that’s right. Why weren’t we thinking about it? First of all, it’s not what we’re taught in medical school. We’re not taught to talk about team communication excellence. And I think it’s absolutely essential. 

There was an early effort by [Terry] Winograd. He was trying to capture what the different kinds of actions related to pronouncements are that you could expect, and how AI could use that. And that was beginning to get at it. 

But I actually think this is the dark matter of human organizational technology that is not well understood. And our products don’t handle it well. You know, we can talk about all the groupware things that are out there. But they all don’t quite get to that thing. 

LEE: Right. 

KOHANE: And I can imagine an AI serving as a team leader, a really active team leader, a real quarterback of, let’s say, a care team. 

LEE: Well, in fact, you know, we have been trying to experiment with this. My colleague, Matt Lungren, who was also one of the interviewees early on, has been working with Stanford Medicine on a tumor board AI agent—something that would facilitate tumor board meetings. 

And the early experiences are pretty interesting. Whether it relates to efficiency or productivity I think remains to be seen, but it does seem pretty interesting. 

But let’s move on. 

GOLDBERG: Well, actually, Peter, … 

LEE: Oh, go ahead. 

GOLDBERG: if you’re willing to not quite move on yet … 

LEE: [LAUGHS] All right. 

GOLDBERG: … this kind of segues into one of, I think, the most provocative questions that arose in the course of these episodes and that I’d love to have you answer, which was, remember, it was a question at a gathering that you were at, and you were asked, “Well, you’re focusing a lot on potential AI effects on individual patient and physician experiences. But what about the revolution, right? What about, like, can you be more big-picture and envision how generative AI could actually, kind of, overturn or fix the broken system, right?” 

I’m sure you’ve thought about that a lot. Like, what’s your answer? 

LEE: You know, I think ultimately, it will have to. For it to really make a difference, I think that the normal processes, our normal concept of how healthcare is delivered—how new medical discoveries are made and brought into practice—I think those things are going to have to change a lot. 

You know, one of the things I think about a lot right at the moment is, you know, we tend to think about, let’s say, medical diagnosis as a problem-solving exercise. And I think, at least at the Kaiser Permanente School of Medicine, the instruction really treats it as a kind of detective thing based on a lot of knowledge about biology and biomedicine and human condition, and so on. 

But there’s another way to think about it, given AI, which is when you see a patient and you develop some data, maybe through a physical exam, labs, and so on, you can just simply ask, “You know, what did the 500 other people who are most similar to this patient experience? How were they diagnosed? How were they treated? What were their outcomes? What were their experiences?” 

And that’s really a fundamentally different paradigm. And it just seems like at least the technical means will be there. And by the way, that also then relates to [the questions]: “And what was most efficacious cost-wise? What was most efficient in terms of the total length of the patient journey? How does this relate to my quality scores so I can get more money from Medicare and Medicaid?” 

All of those things, I think, you know, we’re starting to confront. 

One of the other episodes that we’re going to talk about was my interview with two medical students. Actually, thinking of a Morgan Cheatham as just a medical student or medical resident [LAUGHTER] is a little strange. But he is. 

One of the things he talks about is the importance he placed in his medical training on adopting AI. So, Zak, I assume you see this also with some students at Harvard Medical School. And the other medical student we interviewed, Daniel Chen, seemed to indicate this, too, where it seems like it’s the students who are bringing AI into medical education ahead of the faculty. Does that resonate with you? 

KOHANE: It absolutely resonates with me. There are students I run into who, honestly, my first thought when I’m talking to them is, why am I teaching you [LAUGHTER], and why are you not starting a big AI medicine company now and really changing healthcare instead of going through the rest of the rigmarole? And I think broadly, higher education has a problem there, which is we have not embraced, again, going back to Ethan, a lot of the tools that can be used. And it’s because we don’t know necessarily the right way to teach them. And so far, the only lasting heuristic seems to be: use them and use them often. 

And so it’s an awkward thing, where the person who knows how to use the AI tools now in the first-year medical school can teach themselves better and faster than anybody else in their class who is just relying on the medical school curriculum. 

LEE: Now, the reason I brought up Morgan now after our discussion with Ethan Mollick is Morgan also talked about AI collapsing medical specialties. 

GOLDBERG: Yes. 

LEE: And so let’s hear this snippet from him. 

LEE: So on the specific question about specialties, Zak, do you have a point of view? And let me admit, first of all, for us, all three of us, we didn’t have any clue about this in our book. I don’t think. 

KOHANE: Not much. Not much of a clue. 

So I’m reminded of a New Yorker cartoon where you see a bunch of surgeons around the patient, and someone says, “Is that a spleen?” And it says, “I don’t know. I slept during the spleen lecture,” [LAUGHTER] and … or “I didn’t take the spleen course.” 

And yet when we measure things, we measure much more than we think we are doing. So for example, we [NEJM AI] just published a paper where echocardiograms were being done. And it turns out those ultrasound waves just happen to also permeate the liver. And you can actually diagnose along the way, with AI, all the liver disease, including treatable liver disease, that’s in those patients. 

But if you’re a cardiologist, “Liver? You know, I slept through the liver lecture.” [LAUGHTER] And so I do think that the natural, often guild- and dollar-driven silos in medicine are less obvious to AI, despite the fact that they do exist in departments and often in chapters. 

But Morgan’s absolutely right. I can tell you as an endocrinologist, if I have a child in the ICU, the endocrinologist, the nephrologist, and the neurosurgeon will argue about the right thing to do. 

And so in my mind, the truly revolutionary thing to do is to go back to 1994 with Pete Szolovits and the Guardian Angel Project. What I think you need is a process. And the process is the quarterback. And the quarterback has only one job: take care of the patient. 

And it should be thinking all the time about the patient. What’s the right thing? And can be as school-marmish or not about, “Zak, you’re eating this or that or exercise or sleep,” but also, “Hey, surgeons and endocrinologists, you’re talking about my host, Zak. This is the right way because this problem and this problem and our best evidence is this is the right way to get rid of the fluid. The other ways will kill him.”

And I think you need an authoritative quarterback that has the view of the others but then makes the calls. 

LEE: Is that quarterback going to be AI or human? 

KOHANE: Well, for the very lucky people, it’ll be a human augmented by AI, a super concierge. 

But I think we’re running out of doctors. And so realistically, it’s going to be an AI that will have to be certified in very different ways, along the lines Dave Blumenthal says: essentially, trial by fire. Like putting residents into clinics, we’re going to be putting AIs into clinics. 

But what’s worse, by the way, than the three doctors arguing about care in front of the patient is what happens so frequently: you then see them as an outpatient, and each one of them gives you a different set of decisions to make. Sometimes those actually interact pathologically, unhealthily, with each other. And only the very smart nurses or primary care physicians will actually notice that and call, quote, a “family meeting,” or bring everybody into the same room to align them. 

LEE: Yeah, I think this idea of a quarterback is really very, very topical right now because there’s so much intensity in the AI space around agents. And in fact, you know, the Microsoft AI team under Mustafa Suleyman, with Dominic King, Harsha Nori, and team, just recently posted a paper on something called sequential diagnosis, which is basically an AI quarterback that is supposed to smartly consult with other AI specialists. And interestingly, one of the AI agents is sort of the devil’s advocate that’s always criticizing and questioning things. 

GOLDBERG: That’s interesting. 

LEE: And at least on very, very hard, rare cases, it can develop some impressive results. There’s something to this that I think is emerging. 

GOLDBERG: And, Peter, Morgan said something that blew me away even more, which was: why do we even need specialists? The reason for a specialist is that there’s so much medical knowledge that no single physician can know all of it, and therefore we create specialists. But that limitation does not exist for AI. 

LEE: Yeah. Yeah. 

GOLDBERG: And so there he was kind of undermining this whole elaborate structure that has grown up because of human limitations that may not ultimately need to be there. 

LEE: Right. So now that gives me a good segue to get back to our economist and get to something that Azeem Azhar said. And so there’s a clip here from Azeem. 

LEE: And, you know, in the same conversation, he also talked about his own management of asthma and the fact that he’s been managing this for several decades and knows more than any other human being, no matter how well medically trained, could possibly know. And it’s also very highly personalized. And it’s not a big leap to imagine AI having that sort of lifelong understanding. 

KOHANE: So in fact, I want to give credit back to our book since you insulted us. [LAUGHTER] You challenged us. You doubted us. We do have at the end of the book an AI which is helping this woman manage her way through life. It’s quarterbacking all these different services for the woman. 

LEE: Yes. 

KOHANE: So there. 

LEE: Ah, you’re right. Yes. In fact, it’s very much, I think, along the lines of the vision that Azeem laid out in our conversation. 

GOLDBERG: Yeah. It also reminded me of the piece Zak wrote about his mother at one point when she was managing congestive heart failure and she needed to watch her weight very carefully to see her fluid status. And absolutely, there’s no … I see no reason whatsoever why that couldn’t be done with AI right now. Although back then, Zak, you were writing that it takes much more than an AI [LAUGHS] to manage such a thing, right? 

KOHANE: You need an AI that you can trust. Now, my mother was born in 1927, and she’d learned through the school of hard knocks that you can’t trust too many people, maybe even not your son, MD, PhD [LAUGHTER]. 

But what I’ve been surprised by is, for example, how many people are willing to trust AI, and actually see effective use of AI, as mental health counselors. 

GOLDBERG: Yeah. 

KOHANE: So it may in fact be that there’s a generational thing going on, and at least there’ll be some very large subset of patients who will be completely comfortable in ways that my mother would have never tolerated. 

LEE: Yeah. Now, I think we’re starting to veer into some of the core AI. 

And so I think maybe one of the most fun conversations I had was in the episode with both Sébastien Bubeck, my former colleague at Microsoft Research, and now he’s at OpenAI, and Bill Gates. And there was so much that was, I thought, interesting there. And there was one point, I think that sort of touches tangentially on what we were just conversing about, that Sébastien said. So let’s hear this snippet. 

LEE: So I thought Sébastien was saying something really profound, but I haven’t been able to quite decide or settle in my mind what it is. What do you make of what Seb just said? 

KOHANE: I think it’s context. I think that it requires an enormous amount of energy, brain energy, to actually correctly provide the context that you want this thing to work on. And it’s only going to really feel like we’re in a different playing field when it’s listening all the time, and it just steps right in. 

There is an advantage that, for example, a good programmer can have in prompting Cursor or any of these tools to do so. But it takes effort. And I think being in the conversation all the time so that you understand the context in the widest possible way is incredibly important. And I think that’s what Seb is getting at, which is if we spoon feed these machines, yes, 90%. 

But then, talking to a human being who then has to interact and gets distracted from whatever flow they’re in and maybe even makes them feel like an early bicycle rider who all of a sudden realizes, “I’m balancing on two wheels—oh no!” And they fall over. You know, there’s that interaction which is negatively synergistic. 

And so I do think it’s a very hard human-computer engineering problem. How do we make these two agents, human and computational, work in an ongoing way in the flow? I don’t think I’m seeing anything that’s particularly new. And the things that you’re beginning to hint about, Peter, in terms of agentic coordination, I think we’ll get to some of that.

LEE: Yeah. Carey, does this give you any pause? These are puzzling results. I mean, at least in this one test—and it’s just one test—it’s odd that doctors with AI do worse than the AI alone. 

GOLDBERG: Yes. I would want to understand more about the actual conditions of that study. 

From what Bill Gates said, I was most struck by the question of resource-poor environments. That even though this was absolutely one of the most promising, brightest perspectives that we highlighted in the book, we still don’t seem to be seeing a lot of use among the one half of humanity that lacks decent access to healthcare. 

I mean, there are access problems everywhere, including here in the United States. And it is one of the most potentially promising uses of AI. And I thought if anyone would know about it, he would with the work that the Gates Foundation does. 

LEE: You know, I think both you and Bill, I felt, are really simpatico. You know, Bill expressed genuine surprise that more isn’t happening yet. And it really echoed, in fact, maybe even using some of the exact same words that you’ve used. And so two years on, you’ve repeatedly said you expected to have seen more out in the field by now. And then I thought Bill was saying something in our conversation very similar. 

GOLDBERG: Yeah. 

LEE: You know, for me, I see it both ways. I see the world of medicine really moving fast in confronting the reality of AI in such a serious way. But at the same time, it’s also hard to escape the feeling that somehow, we should be seeing even more. 

So it’s an odd thing, a little bit paradoxical. 

GOLDBERG: Yeah. I think one thing that we hardly focused on at all in the book but that we are seeing is these companies rising up, stepping up to the challenge, like Abridge and OpenEvidence, and what Morgan describes as a new stack, right. 

So there is that on the flip side. 

LEE: Now, I want to get back to this thing that Seb was saying. And, you know, I had to bring up the issue of sycophancy, which we discussed at our last roundtable also. But it was particularly … at the time that Seb, Bill, and I had our conversation, OpenAI had just gone through having to retract a fresh update of GPT-4o because it had become too sycophantic. 

So I can’t escape the feeling that some of these human-computer interaction issues are related to this tension between you want AI to follow your directions and be faithful to you, but at the same time not agree with you so often that it becomes a fault. 

KOHANE: I think it’s asking the AI to enter into a fundamental human conundrum: there are extreme versions of doublethink, and there are everyday asks of doublethink, which are part of how to be an effective citizen. 

Even if you’re thinking, “Hmm, I’m thinking this; I’m just not going to say it because that would be rude or counterproductive.” Or some of the official doublethinks, where you’re actually told you must say this even if you think something else. And I think we’re giving these things a very tough mission: be nice to the user and be useful. 

And in education, the two are not always one and the same. Sometimes you have to give a little tough love to educate someone, and doing that well is an art, and it’s also very difficult. And so, you know, I’m willing to believe that the latest frontier models that have made the news in the last month are very high-performing, but they’re also all highlighting that tension … 

LEE: Yes. 

KOHANE: … that tension between behaving like a good citizen and being helpful. And this gets back to what are the fundamental values that we hope these things are following. 

It’s not, you know, “Are these things going to develop us into the paperclip factory?” It’s more of, “Which of our values are going to be elevated, and which one will be suppressed?” 

LEE: Well, since I criticized our book before, let me pat ourselves on the back this time because, I think, pervasive throughout our book, we were touching on some of these issues. 

In fact, we started the book, you know, with GPT-4 scolding me for wanting it to impersonate Zak. And there was the whole example of asking it to rewrite a poem in a certain way, and it kind of silently tried to slide by, without me knowing, without following through on the whole thing. 

And so that early version of GPT-4 was definitely not sycophantic at all. In fact, it was just as prone to call you an idiot if it thought you were wrong. [LAUGHTER] 

KOHANE: I had some very testy conversations around my endocrine diagnosis with it. [LAUGHTER] 

GOLDBERG: Yeah. Well then, Peter, I would ask you, I mean last time I asked you about, well, hallucinations, aren’t those solvable? And this time I would ask you, well, sycophancy, isn’t that kind of like a dial you can turn? Like, is that not solvable? 

LEE: You know, I think there are several interlocking problems. But even if we assume superintelligence, medicine is such an inexact science that there will always be situations that call for guesses, guesses that take into account other factors of a person’s life, other value judgments, exactly as Zak pointed out in our previous roundtable conversation. 

And so I think there’s always going to be an opening for either differences of opinion or agreeing with you too much. And there are dangers in both cases, and I think they’ll always be present. At least in something as inexact as medical science, I don’t know that it’ll ever be completely eliminated. 

KOHANE: And it’s interesting because I was trying to think what’s the right balance, but there are patients who want to be told this is what you do. Whereas there’s other patients who want to go through every detail of the reasoning. 

And it’s not a matter of education. It’s really a temperamental, personality issue. And so we’re going to have to, I think, develop personalities … 

LEE: Yeah. 

KOHANE: … that are most effective for those different kinds of individuals. And so I think that is going to be the real frontier. Having human values and behaving in ways that are recognizable and yet effective for certain groups of patients. 

LEE: Yeah. 

KOHANE: And lots of deep questions, including how paternalistic do we want to be? 

LEE: All right, so we’re getting into medical science and hallucination. So that gives me a great segue to the conversations in the episode on biomedical research. And one of the people that I interviewed was Noubar Afeyan from Moderna and Flagship Pioneering. So let’s listen to this snippet. 

LEE: [LAUGHS] So I think that really touches on just the fact that there’s so many unknowns and such lack of precision and exactness in our understanding of human biology and of medicine. Carey, what do you think? 

GOLDBERG: I mean, I just have this emotional reaction, which is that I love the idea of AI marching into biomedical science, everything from eventually getting to the virtual cell to, Zak, I think it was a colleague of yours who recently published about a new medication that had been sort of discovered by AI (opens in new tab), and it was actually testing out up to the phase II level or something, right?

KOHANE: Oh, this is Marinka’s work. 

GOLDBERG: Yeah, Marinka, Marinka Zitnik. And … yeah. So, I mean, I think it avoids a lot of the, sort of, dilemmas that are involved with safety and so on with AI coming into medicine. And it’s just the discovery process, which we all want to advance as quickly as possible. And it seems like it actually has a great deal of potential that’s already starting to be realized. 

LEE: Oh, absolutely. 

KOHANE: I love this topic. First of all, I thought Bill and Seb both had interesting things to say on that very topic, rationales I had not really considered for why, in fact, things might progress faster in the discovery space than in the clinical delivery space: because we don’t know in clinical medicine what we’re trying to maximize precisely, whereas for a drug effect, we do know what we’re trying to maximize. 

LEE: Well, in fact, I happened to save that snippet from Bill Gates saying that. So let’s cue that up. 

LEE: So, Zak, isn’t that Bill saying exactly what you’re saying? 

KOHANE: That is my point. I have to say that this is another great bet, that either we’re all going to be surprised or a large group of people will be surprised or disappointed. 

There are still a lot of people in the medicinal chemist, trialist space who are extremely skeptical that this is going to work. And we haven’t quite shown them yet that it is. Why have we not shown them? Because we haven’t gone all the way to a phase III study, which would show that the drug behaves as expected, is effective, and basically doesn’t hurt people. That turns out to require a lot of knowledge. I actually think we’re getting there, but I understand the skepticism. 

LEE: Carey, what are your thoughts? 

GOLDBERG: Yeah. I mean, there will be no way around going through full-on clinical trials for anything to ever reach the market. But at the same time, you know, it’s clearly very promising. And just to throw out something for the pure fun of it, Peter, I saw … one of my favorite tweets recently was somebody saying, you know, isn’t it funny how computer science is actually becoming a lot more like biology in that it’s just becoming empirical. 

It’s like you just throw stuff at the AI and see what it does. [LAUGHTER] And I was like, oh, yeah, that’s what Peter was doing when we wrote the book. I mean, he understood as many innards as anybody can. But at the same time, it was a totally empirical exercise in seeing what this thing would do when you threw things at it. 

LEE: Right. 

GOLDBERG: So it’s the new biology. 

LEE: Well, yeah. So I think we talked in our book about accelerating, you know, biomedical knowledge and medical science. And that actually seems to be happening. And I really had fun talking to Daphne Koller about some of the accomplishments that she’s made. And so here’s a little snippet from Daphne. 

LEE: So, Zak, when I was listening to that, I was reminded of one of the very first examples that you had where, you know, you had a very rare case of a patient, and you’re having to narrow down some pretty complex and very rare genetic conditions. This thing that Daphne says, that seems to be the logical conclusion that everyone who’s thinking hard about AI and biology is coming to. Does it seem more real now two years on? 

KOHANE: It absolutely seems more real. Here are some sad facts. If you are at a cancer center, you will get targeted therapies if you qualify for them. Outside cancer centers, you won’t. And it’s not that the therapies aren’t available. It’s just that you won’t have people thinking about it in that way. And especially if you have one of the rare and more aggressive cancers, being outside one of those cancer centers puts you at a significant disadvantage for survival for that reason. And so anything that provides just the “simple,” in quotes, dogged investigation of targeted therapies for patients is a home run. 

So my late graduate student Atul Butte died recently. At UCSF, he was both a professor and the leader of the Bakar Institute, and he was a Zuckerberg Chan Professor of Pediatrics. 

He was diagnosed with a rare tumor two years ago. His wife is a PhD biologist, and when he was first diagnosed, she sent me the diagnosis and the mutations. And I don’t know if you know this, Peter, but this was still when we were writing the book and people didn’t know about GPT-4. 

I put those mutations and the diagnosis into GPT-4. And I said, “I’d like to help treat my friend. What’s the right treatment?” And GPT-4, to paraphrase, said, “Before we start talking about treatment, are you sure this is the right diagnosis? Those mutations are not characteristic for that tumor.” And he had been misdiagnosed. And then they changed the diagnosis, the therapy, and some personnel. 

So I don’t have to hallucinate this. It’s already happened, and we’re going to need this. And so I think targeted therapy for cancers is the most obvious use. And if God forbid one of you has a family member who has cancer, it’s moral malpractice not to look at the genetics and run it past GPT-4 and say, “What are the available therapies?” 

LEE: Yeah. 

KOHANE: I really deeply believe that. 

LEE: Carey, I think one thing you’ve always said is that you’re surprised that we don’t hear more stories along these lines. And I think you threw a quote from Mustafa Suleyman back at me. Do you want to share that? 

GOLDBERG: Yes. Recently, I believe it was a Big Technology interview (opens in new tab), the reporter asked Mustafa Suleyman, “So you guys are seeing 50 million medical queries a day [to Copilot and Bing]. You know, how’s that going?” And I think I am a bit surprised that we’re not seeing more stories of all types. Both here’s how it helped me and also here was maybe, you know, a suggestion that was not optimal. 

LEE: Yeah. I do think in our book, we did predict both positive and negative outcomes of this. And it is odd. Atul was very open with his story. And of course, he is such … he was such a prominent leader in the world of medicine. 

But I think I share your surprise, Carey. I expected by now that a lot more public stories would be out. Maybe there is someone writing a book collecting these things, I don’t know. 

KOHANE: Maybe someone called Carey Goldberg should write that book. [LAUGHTER] 

GOLDBERG: Write a book, maybe. I mean, we have Patients Use AI (opens in new tab), which is a wonderful blog by Dave deBronkart, the patient advocate. 

But I wonder if it’s also something structural, like who would be or what would be the institution that would be gathering these stories? I don’t know. 

LEE: Right. 

KOHANE: And that’s the problem. You see, this goes back to the same problem that [Ethan] Mollick was talking about. Individual doctors are using them. The hospital as a whole is not doing that. So it’s not judging the quality, as part of its quality metrics, of how good the AI is performing and what new has happened. And the other audience, namely the patients, have no mechanism. There is no mechanism to go to Better Business Bureau and say, “They screwed up,” or “This was great.” 

LEE: So now I want to get a little more futuristic. And this gets into whether AI is really going to get almost to the ab initio understanding of human biology. And so Eric Topol, who is one of the guests, spoke to this a bit. So let’s hear this. 

LEE: You know, I have to say Eric’s optimism took me aback. Just speaking as a techie, I think I started off being optimistic: as soon as we can figure out molecular dynamics, biology can be solved. And then you start to learn more about biochemistry, about the human cell, and then you realize, oh, my God, this is just so vast and unknowable. And now you have Eric Topol saying, “Well, in less than 10 years.” 

KOHANE: So what’s delightful about this period is that those of us who are cautious were so incredibly wrong about AI two years ago. [LAUGHTER] That’s a true joy … I mean, absolute joy. It’s great to have your futurism made much more positive. 

But I think that we’re going from, you know, for example, AlphaFold has had tremendous impact. But remember, that was built on years of acquisition of crystallography data that was annotated. And of course, the annotation process becomes less relevant as you go down the pipe, but it started from that. 

LEE: Yes. 

KOHANE: And there are lots of parts of the cell. So when people talk about virtual cells—I don’t mean to get too technical—mostly they’re talking about perturbation of gene expression. They’re not talking about, “Oh, this is how the lysosome and the centrosome interact, and notice how the Golgi bodies bump into each other.” 

There’s a whole bunch of other levels of abstraction we know nothing about. This is a complex factory. And right now, we’re sort of at the level of loading code into memory. We’re not talking about how the rest of the robots in that cell work, and how the rest of those robots work turns out to be pretty important to functioning. 

So I’d love to be wrong again. And in 10 years: oh yeah, not only that, but our first in-human study will be you, Dr. Zak. We’re going to put the drug in you because we fully simulated you. That’d be great. 

LEE: Yes. 

KOHANE: And, by the way, just to give people their due, there probably is a lot of animal research that could be done in silico, and for various political reasons we’re now seeing that happen. That’s a good thing. But I think that sometimes it takes a lot of hubris to get us where we need to get. My horizon is just not the same as his. 

LEE: So I guess I have to take this time to brag. Just recently, our AI for Science team published in Science a biological emulator that does pretty long-timespan, very precise, and very efficient molecular dynamics, biomolecular dynamics, emulation. We call it emulation because it’s not simulating every single time step but giving you the final conformations. 

KOHANE: That’s an amazing result. 

LEE: Yeah. 

KOHANE: But … that is an amazing result. And you’re doing it in some very important interactions. But there’s so much more to do. 

LEE: I know, and it’s single molecules; it’s not even two molecules. There’s so much more to go for here. But on the other hand, Eric is right, you know, 42 experts writing for Cell, you know, that’s not a small matter. 

KOHANE: So I think sometimes you really need to drink your own hallucinogens to actually succeed. Because remember, when the Human Genome Project (opens in new tab) was launched, we didn’t know how to sequence at scale. 

We said maybe we would get there. And then in order to get the right funding and excitement and, I think, focus, we predicted that by the early 2000s we’d be transforming medicine. That has not happened yet. Things have happened, but at a much slower pace. And we’re 25 years out. In fact, we’re 35 years out from the launch. 

But again, things are getting faster and faster. Maybe the singularity is going to make a whole bunch of things easier. And GPT-6 will just say, “Zak, you are such a pessimist. Let me show you how it’s done.” 

GOLDBERG: Yeah. 

It really is pessimism versus optimism. I mean, biology is such a bitch, right. [LAUGHTER] Can we actually get there? 

At the same time, everyone was surprised and blown away by, you know, the quantum leap of GPT-4. Who knows whether, when enough data gets in there, we might not have a similar leap. 

LEE: Yeah. All right. 

So let’s get back to healthcare delivery. Besides Morgan Cheatham, we talked to [a] more junior medical student who’s at the Kaiser Permanente School of Medicine, Daniel Chen. And, you know, I asked him about this question of patients who come in armed [LAUGHS] with a lot of their own information. Let’s hear what he said about this. 

LEE: So, Zak, as far as I can tell, Daniel and Morgan are figuring this out on their own as medical students. I don’t think this is part of the curriculum. Does it need to be? 

KOHANE: It’s missing the bigger point. The incentives and economic forces are such that even if you were Daniel and it’s 2030 and things have not changed in terms of incentives, you still have to see this many patients in an hour. 

And sitting down, going over that with a patient … let’s say some might need more. In fact, I think computer scientists are enriched for this sort of neurotic “explain to me why this works,” when often the answer is, “I have no idea; empirically it does.” 

And patients in some sense deserve that conversation, and we’re taught about joint decision making, but in practice, there’s a lot of skills that are deployed to actually deflect so that you can get through the appointment and see enough patients per hour. 

And that’s why I think another central task for AI is how to engage with patients, to actually explain to them why their doctor is doing what he’s doing, and perhaps to surface the one or two questions that you should be asking the doctor in order to reassure you that they’re doing the right thing.

LEE: Yeah. 

KOHANE: I just … right now, we are going to have less doctor time, not more doctor time. 

And so I’ve always been struck by the divide between medicine that we’re taught as it should be practiced as a gentle person’s vocation or sport as opposed to assembly line, heads down “you’ve got to see those patients by the end of the day” because, otherwise, you haven’t seen all the patients at the end of the day. 

LEE: Yeah. Carey, I’ve been dying to ask you this, and I have not asked you this before. When you go see a doctor, are you coming in armed with ChatGPT information? 

GOLDBERG: I haven’t needed to yet, but I certainly would. And also, my reaction to the medical student description was that I think we need to distinguish between the last 20 years, when patients would come in armed with Google, and what they’re coming in with now. Because, at least in the experiences that I’ve witnessed, it is miles better to have gone back and forth with GPT-4 than to dredge what you can from Google. And so I think we should make that distinction. 

And also, the other thing that most interested me was this question for medical students of whether they should not use AI for a while so that they can learn … 

LEE: Yes. 

GOLDBERG: … how to think, and similarly maybe don’t use the automated scribes for a while so they can learn how to do a note. And at what point should they then start being able to use AI? I suspect it’s fairly early on; in fact, they’re going to be using it so consistently that there’s not that much they need to learn before they start using the tools. 

LEE: These two students were incredibly impressive. And so I have wondered, you know, if we got a skewed view of things. I mean, Morgan is, of course, a very, very impressive person. And Daniel was handpicked by the dean of the medical school to be a subject of this interview. 

KOHANE: You know, we filter our students. By and large, I mean, there are exceptions, but students in medical school are so starry-eyed. And they really … they got into medical school—I mean, some of them may have faked it—but a lot of them because they really wanted to do good. 

LEE: Right. 

KOHANE: And they really wanted to help. And so this is very consistent with them. And it’s only when they’re in the machine, past medical school, that they realize, oh my God, this is a very, very different story. 

And I can tell you, because I teach a course in computationally enabled medicine, so I get a lot of these nerd medical students, and I’m telling them, “You’re going to experience this. And you’re going to say, ‘I’m not going to be able to change medicine until I get enough cred 10, 15 years from now, whereas I could start my own company and immediately change medicine.’” 

And increasingly I’m getting calls in like residency and saying, “Zak, help me. How do I get out of this?” 

GOLDBERG: Wow. 

KOHANE: And so I think there’s a real disillusionment between what we’re asking of people coming to medical school—we’re looking for a phenotype—and then we’re disappointing them massively. Not everywhere, but massively. 

And for me, it’s very sad because they’re among our best and brightest, and then, because of economics and expectations and the nature of the beast, they’re not getting to enjoy the most precious part of being a doctor, which is that real human connection. And longitudinality, you know, the connection between the same doctor and patient visit after visit, is more and more of a luxury. 

LEE: Well, maybe this gets us to the last episode, you know, where I talked to a former state director of public health, Umair Shah, and to Gianrico Farrugia, who’s the CEO of Mayo Clinic. And if there’s one theme that I took away from those conversations, it’s that we’re not thinking broadly enough or big enough. 

And so here’s a little quote from an exchange with Umair Shah, who was the former head of public health in the State of Washington and, prior to that, in Harris County, Texas. We had a conversation about what techies tend to focus on when they’re thinking about AI and medicine. 

LEE: I have definitely been guilty. I think Umair, of course, was speaking as a frustrated former public health official, just thinking about all the other things that are important to maintaining a healthy population. 

Is there some lesson that we should take away? I think our book also focused a lot on things like diagnosis. 

KOHANE: Yeah. Well, first of all, I think we just have to have humility. And I think it’s a really important ingredient. I found myself staring at the increase in lifespan in human beings over the last two centuries and looking for bumps that were attributable. 

I’m in medical school. I’ve already made this major commitment. What are the bumps that are attributable to medicine? And there was one small bump that was due to vaccines. Another small bump that was due to antibiotics. And the rest of it is nutrition and sanitation. 

And so I think doctors can be incredibly valuable, but not all the time. And we’re spending now one-sixth of our GDP on it. The majority of it is not effectively prolonging life. And so the humility has to be the right medicine at the right time. 

But that runs against a bunch of business models. It runs against the primacy of doctors in healthcare. It was one thing when there were no textbooks, when there was no PubMed. You know, the doctor was the repository of probably all the knowledge that we had. But I think your guests were right. We have to think more broadly, in the public health way. How do we make knowledge pervasive, like sanitation? 

GOLDBERG: Although I would add that since what we’re talking about is AI, it’s harder to see how it helps if what you’re talking about is public health. I mean, it was certainly very important to have good data during the pandemic, for example. 

But most of the ways to improve public health, like getting people to stop smoking, eat better, sleep better, and exercise more, are not things that AI can help with that much. Whereas diagnosis and improving treatment are places that it could tackle. 

And in fact, Peter, I wanted to put you—oh, wait, Zak’s going to say something—but, Peter, I wanted to put you on the spot. 

LEE: Yeah. 

GOLDBERG: I mean, if you had a medical issue now, and you went to a physician, would you be OK with them not using generative AI? 

LEE: I think if it’s a complex or a mysterious case, I would want them to use generative AI. I would want that second opinion on things. And I would personally be using it. If for no other reason than just to understand what the chart is saying. 

I don’t see, you know, how or why one wouldn’t do that now. 

KOHANE: It’s such a cheap second opinion, and people are making mistakes. And even if there are mistakes on the part of the AI, if there’s a collision, a discrepancy, that’s worth having a discussion about. And again, this is something that we used to do more of when we had more time with patients; we’d have clinic conferences. 

LEE: Yeah. 

KOHANE: And we don’t have that now. So I do think that there is a role for AI. But I think again, it’s much more of a continual presence, being part of a continued conversation rather than an oracle. 

And I think that’s when you’ll start seeing, when the AI is truly a colleague, and saying, “You know, Zak, that’s the second time you made that mistake. You know, that’s not obesity. That’s the effect of your drugs that you’re giving her. You better back off of it.” And that’s what we need to see happen. 

LEE: Well, and for the business of healthcare, that also relates directly to quality scores, which translates into money for healthcare providers. 

So the last person that we interviewed was Gianrico Farrugia. And, you know, I was sort of wondering, I was expecting to get a story from a CEO saying, “Oh, my God, this has been so disruptive, incredibly important, meaningful, but wow, what a headache.” 

At least Gianrico didn’t expose any of that. Here’s one of the snippets to give you a sense. 

LEE: So I tried pretty hard in that interview to get Gianrico to admit that there was a period of headache and disruption here. And he never, ever gave me that. And so I take him at his word. 

Zak, maybe I should ask you, what about Harvard and the whole Harvard medical ecosystem? 

KOHANE: I would be surprised if there are system-wide measurable gains in health quality right now from AI. And I do have to say that Mayo is one of the most marvelous organizations in terms of team behavior. So if there’s someone who’s gotten the team part of it right, they’ve come the closest, which relates to our prior conversation. They have the quarterback idea … 

LEE: Yes. 

KOHANE: … pretty well down compared to others. 

Nonetheless, I take him at his word that it hasn’t disrupted them. But also, I have yet to see the evidence that there’s been a quantum leap in quality or efficacy. And I do believe that a quantum leap in efficacy is possible in the right system. 

So if they haven’t been disrupted, I would venture that they’ve absorbed it, but they haven’t used it to its fullest potential. And the way I could be proven wrong is if next year the metrics show that over the last year they’ve had, you know, decreased readmissions, decreased complications, decreased errors, and all that. And if so, God bless them. And we should all be more like Mayo. 

LEE: So I thought a little bit about two other quotes from the interviews that might send us off with a more inspirational view of the future. There’s one from Bill Gates and one from Gianrico Farrugia. So what I’d like to do is play both of those, and then maybe we can have our last comments. 

And now Gianrico. 

All right. I think these are both kind of calls to be more assertive about this and more forward leaning. I think two years into the GPT-4 era, those are pretty significant and pretty optimistic calls to action. So maybe just to give you both one last word. What would be one hope that you would have for the world of healthcare and medicine two years from now? 

KOHANE: I would hope for businesses that, regardless of who actually owns them at some holding-company level, are truly patient-focused companies, companies where the whole AI is about improving your care, where it’s only trying to maximize your care and doesn’t care about resource limitations. 

And as I was listening to Bill, the problem with what he was saying about saving dollars for governments is that we have some very expensive things that work. And if the AI says, “This is the best thing,” it’s going to break your bank. And instead, because of resource limitations, we play human-based fancy footwork to get out of it. 

That’s a hard game to play, and I leave it to the politicians and the public health officials who have to do those trades of utilities. 

In my roles as doctor and patient, I’d like to see very informed, authoritative agents acting only on our behalf, so that when we go and seek to have our maladies addressed, the only issue is, what’s the best and right thing for me now? And I think that is technically realizable. And even in our weird system, there are business plans that will work, that can achieve that. That’s my hope for two years from now. 

LEE: Yeah, fantastic. Carey. 

GOLDBERG: Yeah. I second that so enthusiastically. And I think, you know, we have this very glass half full/glass half empty phenomenon two years after the book came out. 

And it’s certainly very nice to see, you know, new approaches to administrative complexity and to prior authorization and all kinds of ways to make physicians’ lives easier. But really what we all care about is our own health, and we would like to optimize the use of this truly glorious technological achievement to live longer and better lives. And I think what Zak just described is the most logical way to do that. 

[TRANSITION MUSIC] 

LEE: Yeah, I think for me, two years from now, I would like to see all of this digital data that’s been so painful, such a burden on every doctor and nurse to record, actually amount to something meaningful in the care of patients. And I think it’s possible. 

KOHANE: Amen. 

GOLDBERG: Yeah. 

LEE: All right, so it’s been quite a journey. We were joking before that we’re still on speaking terms after having written a book. [LAUGHS] 

And then, I think listeners might enjoy knowing that we debated amongst ourselves what to do about a second edition, which seemed too painful to me, so I suggested the podcast, which seemed too painful to the two of you. [LAUGHTER] In the end, I don’t know which would have been easier, writing a book or doing this podcast series, but I do think that we learned a lot. 

Now, the last bit of business here. To avoid having the three of us write the book again and produce this podcast all on our own, I leaned on the production team at Microsoft Research and the Microsoft Research Podcast. And I thought it would be good to give an explicit acknowledgment to all the people who’ve contributed to this. 

So it’s a long list of names. I’m going to read through them all. And then I suggest that we all give them a round of applause. [LAUGHTER] So here we go. 

There’s Neeltje Berger, Tetiana Bukhinska, David Celis Garcia, Matt Corwine, Jeremy Crawford, Kristina Dodge, Chris Duryee, Ben Ericson, Kate Forster, Katy Halliday, Alyssa Hughes, Jake Knapp, Weishung Liu, Matt McGinley, Jeremy Mashburn, Amanda Melfi, Wil Morrill, Joe Plummer, Brenda Potts, Lindsay Shanahan, Sarah Sobolewski, David Sullivan, Stephen Sullivan, Amber Tingle, Caitlyn Treanor, Craig Tuschhoff, Sarah Wang, and Katie Zoller. 

Really a great team effort, and they made it super easy for us. 

GOLDBERG: Thank you. Thank you. Thank you. 

KOHANE: Thank you. Thank you.

GOLDBERG: Thank you. 

[THEME MUSIC] 

LEE: A big thank you again to all of our guests for the work they do and the time and expertise they shared with us. 

And, last but not least, to our listeners, thank you for joining us. We hope you enjoyed it and learned as much as we did. If you want to go back and catch up on any episodes you may have missed or to listen to any again, you can visit aka.ms/AIrevolutionPodcast (opens in new tab).

Until next time.

[MUSIC FADES] 


]]>
Reimagining healthcare delivery and public health with AI http://approjects.co.za/?big=en-us/research/podcast/reimagining-healthcare-delivery-and-public-health-with-ai/ Thu, 07 Aug 2025 16:00:00 +0000 http://approjects.co.za/?big=en-us/research/podcast/reimagining-healthcare-delivery-and-public-health-with-ai/ Former Washington State Secretary of Health Dr. Umair Shah and Mayo Clinic CEO Dr. Gianrico Farrugia explore how healthcare leaders are approaching AI when it comes to public health, care delivery, the healthcare-research connection, and the patient experience.

The post Reimagining healthcare delivery and public health with AI appeared first on Microsoft Research.

]]>
Illustrated headshots of Peter Lee, Umair Shah, Gianrico Farrugia


In this episode, healthcare leaders Dr. Umair Shah (opens in new tab) and Dr. Gianrico Farrugia (opens in new tab) join Lee to discuss AI’s impact on the business of public health and healthcare delivery, the healthcare-research connection, and the patient experience. Shah, a healthcare strategic consultant and former state secretary of health, explores the role of public health in the larger ecosystem and why it might not get the attention it needs or deserves and how AI could be leveraged to assist in data analysis, to help better engage with people on matters of public health, and to help narrow gaps between care delivery and public health responses during health emergencies. Farrugia, president and CEO of Mayo Clinic, traces AI’s path from predictive to generative and discusses how that progress has helped usher in a new healthcare architecture for Mayo Clinic and its partners, one powered by the goal of longer, healthier lives for patients, and how AI is also changing Mayo Clinic’s research and the education it provides, including the offering of masters and PhDs in AI and other emerging technologies. 

Transcript 

[MUSIC] 

[BOOK PASSAGE] 

PETER LEE: “In US healthcare, quality ratings are increasingly used to tie the improvement in patient health outcomes to the reimbursement rates that healthcare providers can receive. The ability of GPT-4 to understand these systems and give concrete advice … has a chance to make it easier for providers to achieve success in both dimensions.” 

[END OF BOOK PASSAGE]

[THEME MUSIC]

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES] 

The book passage I read at the top is from Chapter 7, “The Ultimate Paperwork Shredder.”

Public health officials and healthcare system leaders influence the well-being and health of people at the population level. They help shape people’s perceptions and responses to public health emergencies, as well as to chronic disease. They help determine the type, quality, and availability of treatment. All this is critical for maintaining good public health, as well as aligning better health and financial outcomes. That, of course, is the main goal of the concept of value-based care. AI can definitely have significant ramifications for achieving this. 

Joining us today to talk about how leaders in public health and healthcare systems are thinking about and acting on this new generation of AI is Dr. Umair Shah and Dr. Gianrico Farrugia. 

Dr. Umair Shah is a nationally recognized health leader and innovator. He led one of America’s top-rated pandemic responses as Washington State’s secretary of health, a position he held from 2020 to 2025. Umair previously directed Harris County Public Health in Texas, overseeing large-scale emergency response for the nation’s third-largest county, while building an emergency-care career spanning 20-plus years. He now advises organizations on health innovation and strategy as founder and principal of Rickshaw Health. 

Dr. Gianrico Farrugia is the president and CEO of Mayo Clinic, the world’s top-ranked hospital for seven consecutive years, and a pioneer in technology-forward, platform-based healthcare. Under his leadership, Mayo has built and deployed the Mayo Clinic Platform. The platform enables Mayo and its partners to gain practical insights from a comprehensive repository of longitudinal de-identified clinical data spanning four continents. Gianrico is also a Mayo Clinic physician and professor and an author. 

Umair and Gianrico are CEO-level leaders representing some of the best of the worlds of public health, healthcare delivery, medical research, and medical education. 

[TRANSITION MUSIC] 

Here is my interview with Dr. Umair Shah: 

LEE: Umair, it’s really great to have you here. 

UMAIR SHAH: Peter, it’s my pleasure. I’ve been looking forward to this conversation, and I hope you are well today. 

LEE: [LAUGHS] I am doing extremely well.

So, you know, what I’d like to do in these conversations is first just to start, a little bit about you.

SHAH: Sure. 

LEE: You served actually during a really tumultuous time as the secretary of health in the State of Washington. But you recently stepped away from that and you started your own firm, Rickshaw Health. So can we start there? What’s that all about? 

SHAH: Yeah, no, absolutely. First of all, you know, I would say that the transition from Texas to Washington could not have been more geopolitically different, [LAUGHTER] as you can imagine.

LEE: Sure. 

SHAH: You know, if you like the red-blue paradigms, you couldn’t be more, you know, red and you couldn’t be more blue, I think. 

LEE: Yes. 

SHAH: But what happened is, back in November this past year, as I saw some of the playout of continuation of this red-blue dynamic, I made the decision to step down. And Jan. 15, I stepped down, as you mentioned, and I spent some time really thinking about what I wanted to do next and was looking at a number of opportunities. 

And then a moment in time, there were some things happening in our—my wife and our family’s personal lives that sort of made me think that I wanted to focus a little bit more on family. And I felt the universe was saying, “Stay still.” [LAUGHTER] 

And I launched Rickshaw Health (opens in new tab) around the notion that, as you know, Peter, rickshaws are oftentimes known across the globe as these modes of transport that reliably get you through ever-changing streets and traffic patterns and all sorts of ecosystems that are evolving at all times. And they get you to the other side, and they get you there with a sense of exhilaration. Like when I took my boys to Karachi, and they jumped in a rickshaw in the, you know, open air [LAUGHTER] and they felt this incredible excitement. 

And so Rickshaw Health was speaking to the three wheels of a rickshaw that symbolize the three children that we have and the real notion of how do we bring balance and agility and performance to the forefront and then move in an ever—just like streets—ever-changing healthcare environment that is constantly evolving, and we too must evolve with it. And that’s what Rickshaw Health is all about, is taking clients to that next level of trying to navigate, especially at this time, a very, very different landscape than even several months ago. So, excited about it. 

LEE: Yeah, absolutely. You know, you made this transition from Texas to the State of Washington. And for people who listen to this podcast and don’t know, the particular part of Texas where you were—Harris County—is really big, very, very important in that state. That’s just not, you know, the normal county in Texas. 

SHAH: Yeah. [LAUGHS] 

LEE: It’s actually … it’s actually known as quite a forward-looking place, technologically. 

SHAH: That’s right. 

LEE: So what was, you know, the transition like, then, going from, you know, possibly the most, sort of, maybe advanced county in the State of Texas, a large place, to the State of Washington? 

SHAH: Yeah, you know, Harris County is the third-largest county in the US. So it had close to five million. And now it’s probably … it’s exceeded the five million people, and a very diverse, very forward-looking, as you mentioned, technologically very, very much looking at what’s the next horizon, and home to Texas Medical Center [TMC] as well, which is … 

LEE: Right. 

SHAH: … the largest medical center. Of course, it had to be Texas. So it can’t be the largest in the state or the country [LAUGHTER]—the largest in the world, right. 

And TMC also had a number of different initiatives related to startups and venture capital and VC. And so they had launched something called TMCX. And that was a real opportunity—and I know you’re familiar with it—an opportunity to really look at how do you incubate all sorts of different innovations and bringing private sector, public sector as well as healthcare delivery alongside these startups to really look at the landscape. 

And so when I left Houston and came to Washington, I realized that obviously, I was in the backyard … I mean, you know, you all at Microsoft Research and the work that you’re all doing is part of an ecosystem of advanced innovation that’s occurring in the Pacific Northwest that, you know, when we see all the players that are here, all the, you know, ones that do so many different things, but they’re doing them with an eye towards technology, advancements, and adoptions, it’s been quite amazing. 

When I made that transition, it was really about, you know, the vaccines and what was happening with, you know, with COVID and fighting the—you know, remember, this was the state that had the first case in the continental United States, had the first outbreak, and the first [lab-confirmed] death. And fast-forward a few years later, we had the fifth-lowest death rate in the US. And that was because we all came together to do so much.

LEE: Yeah, well maybe that gets us into a question that I ask a lot of our guests, which is, you know, and maybe let’s, since we’re on your time as the secretary of health in Washington State, [start] with that job. I ask, how would you explain to your mother what you do every day? 

SHAH: [LAUGHS] I laugh because that’s been such a fascinating conversation in public health because we have oftentimes been—it’s been really hard to describe what that is. 

LEE: Yeah. 

SHAH: And, you know, there are so many metaphors and, you know, analogies that we’ve used. I’ve always wondered why we do not have more television shows or sitcoms or dramas that are about the public health workforce or the work that we do in the field, because you have, you know, all sorts of healthcare delivery ones, right. 

LEE: Yup. 

SHAH: As a practicing physician for 20 years, I realized that people knew what doctors did; they knew what nurses did, right. They intimately touch the healthcare system. 

LEE: Yes. 

SHAH: They understood, you know, that an ambulance picks you up at your home or somewhere else, transports you … gets you to the emergency department. The emergency department, they do some things to you or within the four walls of that ER, and then you’re either admitted, sent home, and several days, weeks, whatever later, you get home if you’re admitted, and you start your, you know, post-hospital stay at home or your rehab or what have you. And that all is known to people. 

But when you ask your mother, your grandmother, or your, you know, your uncle, or your brother, your neighbor, your coworker about what is public health, they have a very quizzical look on their face of what that is. 

LEE: Right. 

SHAH: And so what I’ve … 

LEE: You know, just one thing I’ve learned is: it’s not just all the people you mentioned. Even healthcare professionals sometimes have that quizzical look. 

SHAH: Yeah, good point. That’s right. Good point. And a lot of it is because we don’t get exposed to it or trained in it. You know, we think about public health when we’re in our training. And, you know, I’m sure you had a very similar piece of this is that, you know, you see it as, oh, that’s the health department that takes care of, you know, STDs, or it takes care, you know, it does the immunizations, or, you know, maybe they do some water quality, or maybe they do mosquitoes [mosquito control], and things like that. But the reality is, we do all of those things and more. 

So my metaphor has been that we are the offensive line of a football team, and the healthcare delivery is the quarterback. So everybody focuses on, you know, from a few years back, everybody knows Tom Brady, right. 

LEE: Yeah. [LAUGHS] 

SHAH: He won the Super Bowls, everybody knows what … but if you asked people who was number 75 on the offensive line of the New England Patriots … 

LEE: Right. 

SHAH: … or name your favorite football team. And the answer would be: you would not be able to likely answer that question. You would know Tom Brady, the quarterback, and that’s healthcare delivery, the ER doc or the hospitalist or the nurse or the, you know, the medical assistant, or the people that are doing all the work in the field that are the ones that are more visible, but the invisible workforce of the offensive line, that’s who we don’t know. And yet these are the people that are blocking and sweating and doing all things to complement the work and make sure the quarterback is successful. 

And here’s where the metaphor breaks down, that when Tom Brady wins the Super Bowl, we continue to invest in the offensive line because we recognize the value of it and we want the quarterback to be successful the next season. But in public health or in society, we do the exact opposite. 

When tuberculosis rates come down, we say, well, you know what? We’ve solved the problem; we don’t need it anymore. 

LEE: Right. 

SHAH: Or you have another, you know, environmental issue that’s no longer there, you say, “We don’t need it anymore.” And we disinvest from public health or that offensive line. And then you start to see those rates go back up. 

And so my answer to Mom and Grandma and Dad and Grandpa is we are critical to your health because we touch you every single day. And so please invest in us. 

LEE: Yeah. And, you know, I think I’m going to want to get a little deeper on that in just a few minutes here, because, I think especially during the pandemic, that issue of not understanding the importance of that offensive lineman actually really came to the forefront. 

And so I’d like to get into that. But the, kind of, second, kind of, standard thing I’ve been probing with people is still just focusing on you and your background is what touchpoints or experiences you’ve had with AI in the past. 

And not everyone has. Like, it maybe isn’t too surprising that doctors and healthcare developers, tech developers, have lots of contact with AI, but would the top dog, you know, at a public health agency ever have had significant contact with AI? What about you? 

SHAH: You know, it’s interesting. Several years ago, I was in the audience with the [then] FEMA director, [Rich Serino], who just did such an incredible job. And I remember he made this comment at that time. And, Peter, this may have been like … I don’t know—I’m dating myself—10, 15, maybe even 20 years ago, and he said, “Everybody in the audience, there’s this, you know, app called Twitter.” And, you know, “How many people in the audience have ever sent a tweet or know about this?” And I don’t know, maybe—it was a public health audience—maybe about 15% of the people raised their hands. 

He said, “I challenge you to right now, pick up your phone, download the app, and go ahead and send a tweet right now.” 

And I remember I sent my first tweet at that time. And it was so thought provoking for me that he was saying you need to be engaged in social media, but the other 85% of the audience had not even done that or had … 

LEE: Right. 

SHAH: … even understood the importance of social media at that time. Or maybe they understood, but they had restrictions on how to utilize, right. 

So that has stayed with me because that’s very much about this revolution of AI that I know that public health and population health practitioners like myself who have been in the trenches and understand the importance of it, they really believe in the importance or think they know the importance. 

But NACCHO, the National Association of County and City Health Officials, had done a survey of local health agencies. And about two-thirds, if not three-quarters, of local health agencies reported that they had an AI capacity that was low or lower than ideal. 

LEE: Hmm. Yeah. Yeah. 

SHAH: And that is very much where I come from. When I was in public sector and at the state health agency, our transformation was very much about how do we advance the work, and how do we utilize this in a population health standpoint? 

And I was fortunate to have a chief of innovation at Washington State Department of Health, Les Becker, who understood the value of AI. And as you know, we did also hold an AI science convening that … 

LEE: Yep. Yeah. 

SHAH: … your team was there with University of Washington. And that was really an opportunity for us to say that AI is here. It’s not tomorrow. It’s not next year. It’s not the future. It’s already here. We need to embrace it. 

But here’s the problem, Peter, far too few people in our field understand just how to embrace it. 

LEE: Right. 

SHAH: So I have become a markedly more champion of AI. One, since I read your book. So I think there’s that. So thank you for writing it. But two, since I really recognize that when I became a solo or a primary-few practitioner in my own realm, I needed to force-amplify the work that I was doing. 

And when I look back, and I continue to stay in touch with my colleagues in the field of public health, what they’re also struggling with is that you have an epidemiologist who’s got a mound of information—data, statistics, etc.—that they are going through, and they’re doing everything in their power to get that processed and analyzed. 

LEE: Yep. Yep. 

SHAH: AI can take 80% of that and do it. And that epidemiologist can now turn to more of an overseer and a gatekeeper and to really recognize the patterns … 

LEE: Yep. 

SHAH: … and let AI be able to do the, you know, grunt work. And similarly, as you know, measles—with the outbreaks that we’ve seen, especially in Texas but elsewhere—you’ve got an opportunity where our communications people who are saying, “Look, we’re about to have, or we know we’re about to announce that there’s a measles outbreak in, you know, in our community or our state or what have you—our region.” 

And they can have AI go through different press briefings and/or press releases and say, “Give me the state of the art on how I should communicate this message to the community.” 

LEE: Hmm. 

SHAH: And bam! You can do that. And now you can oversee that work, as well. And then the third example is that we are always looking at how do we find ways to have a deeper connection with those who come to our, you know, our websites or come to our engagement tools—with bots and things like that. AI can really accelerate that work, as well. So there’s so many use cases that AI has for population health or public health. 

LEE: Yeah. 

SHAH: But I think the challenge is that we just don’t have enough adoption because they’re … one, we’ve had funding cuts, but two is that there is this real hesitation on, what is it that we can do? And I argue—the last thing I’ll say about this, Peter—is that I argue that AI is happening right now. The discussions, the technology advancements, the work, the policy work, all that’s happening right now. If public health practitioners are not at the table, if they’re not part of the, … 

LEE: Right. 

SHAH: … “What does this look like? How does it work in our field?” … guess what? It’s going to be done to us and for us rather than with us. And if we do not get with that and get to the table, then unfortunately it may not be exactly what we want it to be at the end of the day. 

LEE: I find it really interesting that you are using the terms “public health” and “population health” … 

SHAH: Yeah. 

LEE: … pretty much interchangeably here. And I think that that’s something that I think touches on an assumption that was both implicit and explicit in the book that we wrote, which is: we were making some predictions that our ability to extract insights and knowledge from population health data would be enhanced through the use of AI. And I think that it looks to me like that has been more challenging and has come along more slowly over the past two years. But what is your view? 

SHAH: Yeah, I think part of, and I think you and I have had this conversation, you know, in bits and pieces. I think one of the real challenges is that when even tech companies, and you can name all of them, when they look at what they’re doing in the AI space, they gravitate towards healthcare delivery. 

LEE: Yes. 

SHAH: Right? That’s, it’s … 

LEE: And in fact, it’s not even delivery. I think techies—I did this, too—tend to gravitate specifically to diagnosis. 

SHAH: Yes, that’s right. That’s right. You know, I think that’s a really good point. And, you know, when you look at sepsis or you look at pneumonia or try to figure out ways that, you know, radiologists or x-rays or CT scans can be read, it’s, I mean, there are so many use cases that are within the healthcare sector. And I think that gets back to this inequity that we have when we look at population health or, you know, this broad, um, swath of land that is, oftentimes, left behind or unexplored, and you have healthcare delivery. Now, healthcare delivery we know gets 95 cents or 96 cents of every dollar. So it makes sense why, right. But we also know that, at the end of the day, we’re looking at value-based outcomes, and you cannot be successful in the healthcare delivery system unless we are truly looking at prevention and what’s happening in the community and the population. 

LEE: Right. 

SHAH: And that’s why I use it interchangeably, but I know that “public health” has got a very specific term, and “population health” is a different set of ways of looking at the world. The reason that people try to shy away from pop health in essence is that you could talk about population health as being my population of patients in a clinic. It could be my health systems population. It could be an insurance company saying, these are the lives covered, right. So it becomes, what is population? When we think of public health, we think of the entirety of the population, right. In the State of Washington, eight million people. Harris County, five million people. Or in the US, 300—whatever the number of millions of people that—we think of the entire population. And what is it that actually impacts the health and well-being of that population is really what that’s about. 

Yet here’s the challenge. When we then talk to those of our partners and our colleagues in the tech field, there are two things happening. One is, there’s a motivation because of the amount of dollars that are in [the] healthcare sector. And number two is, because it’s more familiar, right. 

LEE: Right. 

SHAH: And so there are very few practitioners similar to me that are out there, that are in the pop health who kind of know healthcare delivery because they’ve also seen patients, but they’re also—they worked at that federal, state, local level, community level—they’ve, you know, they’ve done you know various different kinds of environments. 

And they say, “Look, I’ve got a perspective to really help a tech company or somebody see the rest of it,” but you have to have both partners coming together to see that. And I think that’s one of the real challenges that we have. 

LEE: Yeah. 

And so now I’m going to want to go into specific problems, … 

SHAH: Yeah. Sure. 

LEE: … and maybe COVID is a good thing to focus on—the breadth of problems that had to get solved in pandemic response and where the gaps between healthcare delivery and public health were really exposed. 

And so the first problem that I remember really keenly that just seemed so vexing was understanding where the PPE was, the personal protective equipment … 

SHAH: Hmm. Yeah. 

LEE: and where it needed to be. 

SHAH: Yes. 

LEE: And so that turned out … you would think just getting masks and gowns and gloves to the right places at the right times or even understanding where they are so that, you know … and being able to predict, you know, what hospitals, what clinics are most likely to get a big influx of patients during the height of the pandemic would be something that would be straightforward to solve, but that turned out to be an extremely difficult problem. 

But how did it look from where you were sitting? Because you were sitting at the helm having to deal with these problems. 

SHAH: Yeah, we were constantly chasing data and information. And oftentimes, you know, because a lot of these data systems in the public health sector have been underinvested in over the decades, then, you know, you had our biggest emergency crisis of our time, and a lot of public health agencies were either getting, you know, thrown a whole host of resources or had to create things on the fly. 

And whether that was at Harris County or in the State of Washington, I will tell you that what I saw was that, you know, a lot of agencies across the country were still using fax machines, you know, to get data that were coming in. 

And I remember actually—it’s kind of a funny story—there was a fax machine that was highlighted down in our agency in Texas. And we actually had this fax machine, had mounds of, you know, data … sorry, papers that were next to … faxes that were coming in and all these things. 

And you would have, you know, Mr. Peter Lee listed as a patient. And then the next, you know, transmission would have Pete Lee. And then the next transmission would have Peter Lee, but instead of L-E-E, it was L-E-A-H or something, or L-I or something, right. And it was just … 

LEE: Right. 

SHAH: … or you had a date of birth missing, or you had, you know, an address that was off. And what we realized is that over time, a lot of the data that were coming in were just incomplete data, and being able to chase that was really hard. 

And so, you know, I think AI has that potential to really organize it, and to stratify it, and to especially get you to a point of at least cleaning it up. So I don’t think it’s just that AI … AI doesn’t just save time; it saves lives. Truly used … 

LEE: Hmm. Yeah. 

SHAH: … that’s, I think, where we’re talking here. 

And so when you have PPE and things of that nature, as you talked about, here in the State of Washington or what we were trying to do to get vaccines out or everything we’re doing to try to get communication messages to the public. And we did a fantastic job of that, although not ideal. 

I mean, there are so many things that I could point to that we could have done better—all of us in the field of public health and healthcare delivery alike. 

I will tell you that the one thing that stays with me is that if we had those tools then, and we had them in place then, and we had invested in them at that time in advance of, I think there was a real opportunity for us to be able to move ahead and even be better at how we affected the health outcomes of the very populations that we were trying to get to. 

And I think it’s [that] AI allows us to shift from reactive to proactive systems, catching health issues before they escalate and allowing us to really communicate with empathy at scale.

LEE: Right. 

SHAH: And when we can do those things, whether it’s opioids or whether it’s, you know, something that’s happening related to an infectious disease, or, you know, even this, the new agenda with Make America Healthy Again—which by the way, as you know, we had a Be Well, WABe Well, Washington … 

LEE: Right. Yes. 

SHAH: … very much that was about, you know, looking at, you know, physical health and nutritional health and emotional well-being and social connectedness—that there is a real opportunity for us to address the very drivers of ill health. And when we can do that, and AI can help us accelerate that, I think we truly have the ability to drive down costs and increase the value that’s returned to all of us. 

LEE: What is your assessment of public health agencies’ readiness to use technology like AI? Because if there’s one thing AI is good at, it’s predicting things. Are they [public health agencies] in a better position to predict things now? 

SHAH: You know, I think it’s a tale of two cities. 

I think on the one hand, we’re better because we have the tools. On the other hand, we’ve lost the capacity to be able to utilize those tools. So, you know, it’s a plus and a minus. 

Many, many years ago, there was the buzzword of what we called syndromic surveillance. And, Peter, you know this term well. 

It was like you would have, you know, a whole host of accumulation of data points in, let’s say, a hospital setting or an emergency department … 

LEE: Yup. Yup. 

SHAH: … where, you know, you’d have runny nose, you’d have cough, you’d have a fever, and you would take that, what was happening and people presenting to the emergency department, with what was happening in the area pharmacies where people were going to get Kleenexes and tissues … 

LEE: Yep. 

SHAH: … and buying over-the-counter, you know, medication, and things of that nature, Tylenol, etc. 

And you would say … you would put those two things together, and you would come up with a quote-unquote “syndrome.” And an alert on that syndrome allows us to say something uh-oh is going on in the community. And we got many, many advancements related to wastewater surveillance over the last several years, as you know … 

LEE: Yep. Yep. Well, also, wasn’t patient number one in the United States discovered because of the Seattle Flu Study, or at least that sort of syndromic surveillance? 

SHAH: That’s right. 

LEE: They weren’t even looking for COVID. They were just taking, you know, snot samples from people. 

SHAH: That’s right. That’s right. That’s right. 

And so that’s the kind of thing that, you know, we underappreciate. It’s that you have to have a smart, intelligent, agile practitioner, right. 

So if I think about down in Dallas when Ebola hit, the gentleman who was, you know, the index case for Ebola was sent out of the emergency department and came back several days later. 

And it was the nurse who picked up on it this time because the practitioner, the provider, the healthcare provider, the doc missed it. And I wouldn’t want to say that in a negative way. It was just, like, not obvious. You aren’t thinking of Ebola in the middle of Texas. And it was the nurse who picked up: there’s something wrong here. 

And what AI has the ability to do is to pick up those symptoms … 

LEE: Yeah. 

SHAH: … or those patterns and be able to recognize the importance of those and be able to then alert the practitioner. So we call it artificial intelligence, but it almost becomes artificial wisdom. 

LEE: Hmm. Yeah, interesting. So that actually reminds me of my next question, which is another thing that I watched you and public health officials do is try to play “what if” games. 

So, for example, I think one decision you were involved in had to do with, you know, what would be the impact if we put a ban on large gatherings like concerts or movie theaters or imposed an 8 PM curfew on restaurants, and you were trying to play “what if” games. Like, what would be the impact on the spread of the pandemic there? 

So now, again, today with AI, would that aspect of what you did play out differently than it did during the pandemic? 

SHAH: As you know, COVID was the most studied condition on the planet at one point. And it was, you know, things that usually we would learn over years or months, we were learning in weeks or days or hours. 

And I remember in Houston, I would say something in the morning, and I would always try to give the caveat, “This is the best information we know right now,” because it kept changing, whether it was around masks or whether it was around, you know, the way the virus was operating, whether it was around … 

I remember even … I was just watching something recently where I was asked to comment about whether spiders could transmit COVID-19. You know, just questions that were just evolving, evolving, evolving. And the information was evolving. By morning, you would say something. By evening, it would change. 

And why I say that is that it would have been great in the pandemic if we could have said, if you could give us all the information that’s happening across the globe, synthesize that information, and be able to help us forecast the right decisions that we should be making and help us model that information so we could decide: if you did a curfew, or if you did, you know, a mask, or if you could, you know, change something else related to policy—what are the impacts of it? 

LEE: Yeah. 

SHAH: What we found constantly in public health was that we were weighing decisions with incomplete data, incomplete information. 

So great now that everybody can armchair quarterback looking back three, five years ago and say, “I would have done it this way,” or “I would have done it that way.” Gosh, I would have as well. But guess what—we didn’t have that information at that time. And so you had to make the best decisions you could with incomplete data. 

But what AI has the potential to do is to help complete the incomplete data. Now, it’s not going to get 100%. 

LEE: Right. 

SHAH: And I think, Peter, you know, the one thing we’ve got to be really mindful [of] is phantom information, information where it sort of makes things up, or may somehow give you incomplete information, or skew it a certain way. 

This is why we can’t take the person out of it yet. 

LEE: Right. 

SHAH: Now, maybe one day we can. 

I’m not one of those Pollyanna-ish types who say people will never be replaced. I actually believe that those people who are skilled with AI and the tools will eventually have a competitive advantage over those who are not. Just like a physician who knows how to use their smartphone or knows how to use a word processor or knows how to do a PowerPoint presentation is going to replace the ones that use scantrons … 

LEE: Yeah. Yeah. 

SHAH: … or the ones that write it on pieces of paper—that eventually it makes it more efficient and effective, but we’re not there yet. But I think that the potential is absolutely there. 

LEE: So I have one more question. And you can, kind of, tell I’m trying to expand people’s understanding of just the incredible breadth of what goes on in public health, you know, all of these sorts of different issues. 

And again, just sticking to COVID, but this is a much broader issue. Another thing you had to cope with was a significant rise of misinformation … 

SHAH: Yes. 

LEE: … and maybe going along with that, very, very significant inequities in outcomes in the COVID response. And when you think about AI there, I think you can argue it both ways, that it both exacerbates the problem but also gives you new tools to mitigate the problems. 

What is your view? 

SHAH: I think you … I don’t even have to say it … I think you hit on it: it really is two sides of one coin. 

On the one hand, it has the power of really advancing and allowing us to move forward in a way that incredibly accelerates and accentuates. But on the other hand, take the case of inequities: if you have inequitable information or data that’s already out in the literature, or already out in the, you know, media, or what have you, about a certain population or people or certain kinds of ideas or thoughts, etc., then AI will tend to accumulate that. You’re going to take that information, thinking that’s the best out there, but it may have missed out on information, and now you go with it. And that’s a potential problem. 

And I think it’s the same thing with misinformation: when we have people that are able to classify or misclassify information, I think it really becomes hard because it can accelerate the inequities of trust or inequities of trusted sources of information. It can also close the gap. 

So I think, you know, it’s really up to us and this responsible AI to really think about how we can go about doing this in a way that’s going to allow us to further the advancements but also be careful of those, you know, those kind of places where we’re going to step into that are not going to be well received or successful. 

You know, the one thing that’s really fascinating about this whole conversation is that this is why we’ve got to be at the table, Peter. 

LEE: Yep. Yep. 

SHAH: Because if we’re not at the table, or if the tech companies that are out there doing this work aren’t even seeing the field of practitioners that are actually wrestling with the same problems but just cannot actually get to the solutions, we’re just going to continue to accentuate the problems. 

And that’s why I’m a firm proponent of: we’ve got to be at the table. 

And so we’ve even seen, and this is going to be a little controversial, governmental spaces where, you know, policymakers have said, “Look, we are not going to let you do certain things,” or they say to public health practitioners or even healthcare delivery practitioners in certain spaces, “You cannot even play with this. You cannot have it on your phones. You can’t do any … ” 

You know, what I really believe it does is take an almost head-in-the-sand type of approach rather than saying, “What is it that we can do to help improve AI and make it work for all of us?” What we’re doing is we’re essentially saying, “We’re going to let the tech companies and all the other developers come up with the solutions, but it’s not going to be informed by the people in the field.” And that’s dangerous. We have to do both. We have to be working together. 

LEE: Umair, that’s really so well said, and I think a great way to wrap things up. I’ve certainly learned a lot from this conversation. So thank you again. 

SHAH: It’s been a pleasure to be with you this morning. Thank you so much for the time. And I’m looking forward to further conversations. 

[TRANSITION MUSIC] 

LEE: I live in the State of Washington, and because of that, I’ve been able to watch Umair in action as our state’s former secretary of health. And some of that action was pretty intense, to say the least, because his tenure as secretary of health spanned the period of the COVID pandemic. 

Now, as a dyed-in-the-wool techie, I have to admit that at the beginning, I don’t think I really understood the scope and importance of the field of public health. But as the conversation with Umair showed, it’s really important and it is arguably both an underfunded and underappreciated part of our healthcare system. 

Now, public health is also very much an area that’s ripe for advancement and transformation through AI. As Umair explained in our discussion, at the core of public health is the idea of population health: extracting new health insights from signals in population-scale data. And already we’re starting to see AI making a difference. 

Now here’s my interview with Dr. Gianrico Farrugia. 

LEE: Gianrico, it’s really great to have you here today. 

GIANRICO FARRUGIA: Peter, thanks for having me. Thanks for making me part of your podcast. 

LEE: You know, what I’d like to do in these conversations is, you know, we’ll definitely want to talk about the overall healthcare system, the state of healthcare, and what AI could or might do to help or even hurt all of that. But I always like to start with a sharper focus just on you specifically. And my first question always is, you know, I think people imagine what a hospital or a health system president and CEO does, but not really. And so how would you explain to your mother what you do every day? 

FARRUGIA: So, Peter, my mother’s 88 years old. She lives in Malta, and she’s visiting at the moment, … 

LEE: Oh, wow. 

FARRUGIA: … which is kind of nice, really. 

LEE: Wow, that is amazing. 

FARRUGIA: I’m proud that she’s still proud of me. So she does ask. I’ll tell her the scope of Mayo Clinic. We serve patients across the globe. We have about 83,000 staff members that work with us, and we’re very proud of the work we do in research, education, and the practice. 

Mayo Clinic is built to serve people with serious disease. So what I tell my mother is that here we are. We’re a healthcare organization that knows what it needs to do: keep patients as the North Star. The needs of the patient come first. We have 83,000 people who want to do that, several thousand physicians and scientists. My job is to look slightly ahead and then share what I’m seeing and then, sort of, smooth the way for others to make sure Mayo remains true to its mission but also true to the fact that at the moment, we are in a category of one. We need to remain there not just from an ego standpoint, but really from a “do good to the world” standpoint. 

At that point, invariably my mother will tell me that I’m working too hard. [LAUGHTER] And then of course, I change the subject, and I ask her what she cooked today because my mother, who’s 88, cooks for the whole family in Malta, and there are usually four generations eating around the table. So I tell her what she does for the family is what I do for the Mayo family. 

LEE: Wow, that’s a great way to put it. And it sounds like you actually have a good chance to have some good genes if she’s still that active at age 88. 

FARRUGIA: I think I chose a little more stressful job that may limit [that]. I will tell you very briefly that one of the AI algorithms we have estimates biological age from an electrocardiogram. My biological age jumped by 3.7 years when I became CEO. 

LEE: [LAUGHS] Oh no. 

FARRUGIA: I’m hoping it will reverse on the other side. 

LEE: To stick with you just for one more moment here, second question I ask is about your origin story with respect to AI. And typically, for most people, there is AI before ChatGPT and generative AI and then after the generative AI revolution. So can you share a little bit about this? Because it must be the case that you’ve been thinking about this a long time since you’ve really led Mayo Clinic to be so tech forward in this way. 

FARRUGIA: Well, I’ve been, as you said, a physician for way too long. I got my MD degree in ’87. So that sort of dates me. But it also means that, like you, I saw a lot of the promise for AI that never seemed to pan out, for decades and decades and decades. 

LEE: Yeah. 

FARRUGIA: Around 10 years ago, Mayo could sense that there was something different, that something was changing, that we actually—at that time, predictive AI—could make a big difference. And I think that’s the moment where I and others jumped in and said Mayo Clinic needs to be involved. 

And then about six and a half years ago, when I became CEO, it was clear that there was the right confluence of data, knowledge, and tech expertise, that we could deal with what was increasingly bothering me, which is that we knew what was coming from a technology standpoint and we knew the current healthcare system could not deliver on what patients need and want within that current system. And so the answer is, how could a place like Mayo Clinic with our reputation not jump in and say there has to be a better way of doing things? I’ve always said that it is impossible for me to believe that every single government employee is incompetent, that every physician is greedy. Something’s wrong here. 

LEE: Yeah. 

FARRUGIA: And that wrong was the architecture was wrong. And we knew that we could incorporate AI and make it better. So for me, that journey was one of wait, wait, wait. 10 years ago, begin to jump in. Six years ago, really jump in with our platform. And then, of course, in November 2022, things changed again. 

LEE: Yeah. When did this idea of a data platform, what you now call the Mayo Clinic Platform—by the way, I refer to this as MCP, … 

FARRUGIA: Yeah, I know. [LAUGHS] 

LEE: [LAUGHS] … which I always smirk a little bit because, of course, for those of us in computer science research, the AI research, MCP has also become quite a hot topic because of the model context protocol version of this. But for Mayo’s MCP, when did that become a serious, defined initiative? 

FARRUGIA: So around the end of 2018, beginning of 2019. At that point, we knew that we were going to do something differently. We came up with a strategic plan, as I took on the job, that we needed to cure more patients. There’s just not enough cures in the world. There’s too much suffering. And that we had all these chronic diseases that people have accepted are chronic, but really the only reason a disease is chronic is that you haven’t cured it. 

And physicians have been afraid to talk about cure because, of course, eventually everybody passes away. 

LEE: Yeah. 

FARRUGIA: But I really pushed hard to say, no, it’s OK to talk about cure. It’s OK to aspire to cure. The second was connect—connecting people with data to create new knowledge. And that’s where it became clear that data were not currently in a format that were particularly useful. By the way, you’ll hear me talk about data in the singular and the plural. I’m old school. I talk about data as plural, but I know that most younger people now use data singular. [LAUGHTER] And I apologize if I go back and forth. 

And then the third was transform. Let’s use Mayo’s resources to transform healthcare for ourselves and for others. And that’s the concept of, if we are able to use data in a different way, let’s create a different architecture. And that architecture had to be very closely linked to using artificial intelligence in order to create better outcomes for patients. So patients can live not only longer lives but healthier lives. And that’s the genesis of MCP, Mayo Clinic Platform, so I’ll timestamp that as end of 2018, beginning of 2019. 

LEE: So I’m really wanting to delve in in this episode, in this conversation, you know, [into the] mindset of a health system or hospital CEO. And so you’re obviously thinking about, I guess, machine learning and predictive analytics and so on. What were the, kind of, like … in 2018, what were the outcomes that you were dreaming about from this? So if you had this thing, you know, what were the things that you were hoping to be able to show or, kind of, produce as results? 

FARRUGIA: So first of all, I think all of us who work at Mayo Clinic, and this tends to be a bit sugary, but it’s true, strongly feel that we have a responsibility to leave the place better than when we started. And so the Mayo brothers, when they started, did two really important things. The first was that they created the first integrated healthcare system. And the second, they created the first unified record. And that record was, of course, paper at that point. 

Part of that is to say, OK, what does it look like now versus how can we improve what we have if … it’d be blasphemy to say, let’s think of ourselves as the Mayo brothers, but let’s think of ourselves as reasonably smart people at Mayo Clinic, really lucky to be surrounded by very smart people with resources. What will we do? And so we said let’s not aim for the low-hanging fruit. Let’s aim to get at whatever you want to call it, the intractable knot, the hardest problem, and that is clinical care. Let’s improve clinical care. Yes, we can deal with burnout. Yes, we can deal with administrative burden. But let’s not focus on that. Let’s really create an architecture that allows us to tackle better clinical outcomes. 

And by starting there, then everything flows from that. That it’s not really worth doing unless at the end of the day, people are experiencing better health. 

LEE: And so I know you ended up hiring a very good colleague and friend of mine, John Halamka. I thought he was a very interesting choice because he is, of course, in terms of technology, quite deep and very expert, but he’s, I think, first and foremost, a doctor. And so I assume you must have had to decide what type of person you would bring in and what kinds of people you would bring in to try to create such a thing. What was your thinking around the choice of someone like John? 

FARRUGIA: It was one of the harder decisions. First of all, [I’m] a physician myself. We tend to want to maintain some control. And so now I am the CEO, [LAUGHTER] and I have to give this baby to somebody else. That’s very hard. Second is Mayo Clinic is really good because it is flat, and we run a lot by committee. But it also means that, therefore, you have to work really hard at change, and you cannot change by fiat. You have to change by convincing people. 

So I just … I’ve always made the point that the right change agent is a servant leader because that’s how change becomes embedded. But it also means you’ve got to have that personality, the Mayo personality. And it became clear when we interviewed [that] there were some people that were really hardcore tech; others that were passionate about social issues. But John really fit that of being, as you said, deep in IT but also himself very aligned with the Mayo Clinic values. It’s as if he was a Mayo Clinic physician even though he wasn’t. 

And that came together, and I felt, we felt, that as we were hiring, that we could do it. And then we did something interesting. We paired John with a … we created the role of a chief medical officer for the platform, which was a longstanding Mayo Clinic physician. And so we brought them together so we could get the past and the present and the future working together. 

LEE: So I’m going to ask you about what has come out of this. But before that, let’s get back to this origin story. So now, all of that is being set up starting around 2018. But then, you know, in 2022, there is generative AI. Now you were already experimenting with transformers, starting with BERT out of Google there. So maybe that’s a couple of years earlier. But still, there has to come a point where things are feeling very disrupted. 

FARRUGIA: Yeah, so, you know, it really wasn’t. It, to me, was a relief because it gave this … we were feeling pretty good about what we’re doing. We were feeling a little impatient, but, in true Mayo fashion, were willing to, sort of, do everything, take its time, take it to the right committees, get the right approvals, and get it done. 

And so when generative AI came, for us, it’s like, I wouldn’t say we told you so, but it’s like, ah, there you go. Here’s another tool. This is what we’ve been talking about. Now we can do it even better. Now we can move even faster. Now we can do more for our patients. It truly never was disruptive. It truly immediately became enabling, which is strange, right, … 

LEE: Yeah. 

FARRUGIA: … because something as disruptive as that instantly became enabling at Mayo Clinic. And I’ll take … as I think about it with you and take a moment to think and reflect on it, I think there were a couple of decisions we made earlier on that really helped us. We made the decision against the advice of any consulting firm to completely decentralize AI at Mayo Clinic six years ago. And we told our clinical department, you need to own this. You need to hire basic scientists in AI. We’ll help you by creating the infrastructure. We’ll help you by doing all the rest. We’ll have the compute. We’ll have the partners. You need to do this on your own. You need to treat this the same way as if a new radiological technique happened or a new surgical technique happened. 

And so there was a lot of expertise already present in a very diffused way that we were then able to layer generative AI onto. And we found a real willingness to embrace it. In fact, I would argue initially a bit too willing because, as you know, we hadn’t quite figured out what’s legitimate use and what’s not. We all learned together. 

LEE: Right. Yep. Yep. 

FARRUGIA: But it was mostly energy, which is really interesting. It was mostly energy.

LEE: Wow. And, you know, it’s an amazing thing to hear because one common theme that we hear is that the initial reaction is oftentimes one of skepticism. In fact, I’ve been very open that even I initially had some skepticism. Was that not present in your mind or on your team’s mind at all at the beginning? 

FARRUGIA: So you’re asking a physician if they are skeptical about something. [LAUGHTER] Yeah. I wonder what the answer to that is. Absolutely. The first hallucination, the first wrong reference. Can you imagine if you write the grant and the wrong reference comes back? As you know, … 

LEE: Right. 

FARRUGIA: … earlier on when some references were being made up. So massive amounts of skepticism. But the energy was such there that the people [who] were skeptical were also at the same time saying, “Let’s do a RAG [retrieval augmented generation] to clean up those references. Let’s create …” We were experimenting with discharge summaries, but let’s use AI to police AI, and let’s see what’s going on. So there was more massive skepticism, but the energy was pushing that skepticism into a positive versus into a negative frame. Now, I say that summarizing in hindsight. 

LEE: Yeah. 

FARRUGIA: Day to day, much more complicated than that. But overall, if you just … and remember, I had been at the World Economic Forum many years ago and had said, healthcare needs to run towards AI. 

LEE: Yes. 

FARRUGIA: If healthcare was perfect, we would wait. Healthcare is not perfect by any means; therefore, let’s run and embrace AI. And, sort of, that mentality was part of who we were because at the same time, we were also saying the other thing, that we need to be the ones to lead validation. We need to be the ones that set the rules. We need to be participating in the creation of CHAI [Coalition for Health AI]. We need to be participating in the [National] Academy of Medicine. 

So people did feel that Mayo was being fairly responsible about it, but that urge, the needs of the patient come first, was the driver that kept people wanting to say, “Not ready yet, but let’s make it ready.” And we now have 320 algorithms in the practice, and they run, and we constantly are looking and seeing what else we can do to improve. But as you well know, things evolve and change. And we’re also looking and seeing which ones work and which ones don’t and which ones we have to work together on to make better. 

LEE: Yeah, you know, of course Mayo has such a, you know, such a reputation and is so influential, but in the world of healthcare broadly, let’s just focus on the United States to start. How common is this experience? You know, so if you are at a meeting with fellow CEOs of hospitals and health systems, what is the attitude and what is the, kind of … how common is the approach to all of this? 

FARRUGIA: I think it’s more common now, but going back a few years, I think it’s fair to say that it was scary for people to know how it’s going to change things. Healthcare runs on very narrow margins. It’s very expensive. So your expenses and your revenue are both massive, and they are very close to each other. So anything that changes that balance is really scary. 

Because it’s not like you have the opportunity to erode into a margin or get it right the second time. So I think that is what drove a lot of the initial hesitancy. It was, one, lack of knowledge and, two, understanding that you didn’t have a lot of room to make a mistake. 

LEE: On the economics of this, when you are embarking on what I suspect is a very expensive initiative like Mayo Clinic Platform, how on earth do you justify that early on? 

FARRUGIA: So again, I’m trying hard to try and remember how things were versus how I think about them now. [LAUGHTER] It goes back to our history. Mayo has always invested in what it thinks is the right thing that is coming. And that’s how we’ve stayed where we are. So the investment really was having an open discussion: is this worth it for our patients? And once that discussion was over, then the board was saying, go, go, go

Now we are lucky in that we have the size that we’re able to hire and absorb. We’re lucky in that the people [who] came before us have been financially astute, and one of our values is stewardship. And we’re lucky that we had a lot of patients at Mayo Clinic who were able to listen, be inspired by, and be willing to help support. And so that gave us the ability to build what we’re doing not only into the long-range plan but actually into the yearly plan. And so we built it into the yearly plan. We set up a center for digital health. We set up the platform. And then we set up the budgets to be able to do that. And the budgets came from assets we’ve had, assets that we would get as the year came by, and then from philanthropy. 

We also had a really powerful calling card. And that’s one advantage I had, and I’ve been very open about using it when speaking to other CEOs. Right at that very beginning, really in 2019, our cardiologists, both the researchers and the clinicians, had come together and had used electrocardiograms to create an AI algorithm. 

The first one was for diagnosing left ventricular dysfunction from an electrocardiogram, which is very cheap, very easy to do. That’s a measure of how hard the left part of the heart contracts; if it doesn’t contract well, you get heart failure. And they were able to show that that algorithm was already making nurses better than a physician without the algorithm. And after that, they went on to show that you could do it from a single strip, really with an area under the curve for that single strip on a watch that was as good as mammograms or pap smears. And so we already had that proof. 

LEE: Yeah. 

FARRUGIA: That quickly then came into Mayo. We put it into the practice so that any patient now can benefit from it. And now there are, I think, 14 algorithms just from that same one. 

LEE: Yeah. 

FARRUGIA: So we had a proof of concept thanks to those really far-seeing cardiologists that enabled things to happen a little faster and also, as I talked to other CEOs, enabled me to say, “This actually works. This is the path forward.” I have recently been vocal about also saying, we are at a point now where I believe that for some medical conditions, it is not right to not use AI to help treat them. 

LEE: Wow, that’s so interesting. So I think I want to get into another topic here, which is when you think about the use of AI and data, what are some of the results that maybe are top of mind for you or you think are particularly important? And if you don’t mind, I’d like to see if we can think about this not only in terms of results in terms of patient outcomes but in your other activities, core activities, like research, in the education mission, and then even in the broader impacts on the healthcare system. But maybe we start with on patient outcomes. 

FARRUGIA: Yeah, they’re all linked, right. 

LEE: Yes. 

FARRUGIA: They’re part of the same ecosystem. We think of ourselves as three shields—research, education, and the practice—and that one goes into the other. So, as I said, we have about 320 AI algorithms from the practice. Some run on every patient; some run on some patients. And we have good evidence for what they do. So some specific examples, and then I’ll get into the transformer part of this. 

We have a program called CEDAR [Clinical Detection and Response (tool)], and like most other people, I like acronyms for things. [LAUGHTER] But what it is, is that in our hospitals, with patient consent, we monitor vitals. We monitor in the patient room—not in the ICU [intensive care unit], in the patient room. We monitor all sorts of things. But there’s a camera in the room, and we have a team of intensivists—nurses and physicians—who do not have any patient responsibilities but are just monitoring the algorithms, and when the algorithms are predicting decompensation, they’re able to get into the room. And what we’ve shown, for example, with that algorithm is that we’ve decreased length of stay in the hospital, decreased transfers into the intensive care units, and interestingly, decreased mortality and morbidity, which is not easy to show. I talked about the electrocardiogram as a good example. Of course, everybody knows about the radiology things. 
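A toy sketch of the kind of decompensation alerting Farrugia describes with CEDAR: score each vitals reading, and flag the first one that crosses a threshold so a monitoring team can intervene early. The vitals bands, point values, and threshold are illustrative inventions, not the actual Mayo algorithm or any validated clinical score:

```python
def early_warning_score(heart_rate, resp_rate, spo2, systolic_bp):
    """Toy early-warning score: each vital outside a healthy band adds points.
    Thresholds are illustrative only, not a clinical tool."""
    score = 0
    if heart_rate > 110 or heart_rate < 50:
        score += 2
    if resp_rate > 24 or resp_rate < 10:
        score += 2
    if spo2 < 92:          # oxygen saturation, percent
        score += 3
    if systolic_bp < 90:   # mmHg
        score += 3
    return score

def needs_review(vitals_stream, alert_threshold=4):
    """Return the index of the first reading whose score crosses the threshold,
    so the monitoring team can step in before full decompensation, else None."""
    for i, vitals in enumerate(vitals_stream):
        if early_warning_score(**vitals) >= alert_threshold:
            return i
    return None

# Hypothetical readings: stable, then drifting, then clearly deteriorating.
stream = [
    {"heart_rate": 82, "resp_rate": 16, "spo2": 97, "systolic_bp": 118},
    {"heart_rate": 104, "resp_rate": 22, "spo2": 94, "systolic_bp": 104},
    {"heart_rate": 118, "resp_rate": 27, "spo2": 90, "systolic_bp": 96},
]
print(needs_review(stream))  # 2 -- the third reading trips the alert
```

Real systems like the one described replace the hand-set bands with learned models over many more signals, but the operational loop is the same: continuous scoring, and a human team dispatched when the score crosses a line.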

We’ve taken part of this and said, if we can do this in the hospital, why can’t we do it in patients’ homes? So we’ve been very active in looking after patients that come to the ED, the emergency room, and would normally be admitted, and we say, no, here are the things we can give you. Go home if you want to, and we will safely look after you at your home. And recently, looking at the last two years of data, we’ve been able to show that we’re also successfully able to give intravenous chemotherapy in patients’ homes because we can monitor; we can do all the things that we can do. 

Now, with generative AI, that gave us many other opportunities. One biggest opportunity for me has always been digital pathology. When we see how pathology’s currently run with a glass slide, not much has changed … 

LEE: Yeah. 

FARRUGIA: … in many, many, many years, right. 

LEE: Yeah. 

FARRUGIA: And so really we have made a massive push to digitize pathology, not just for us but for others. But talking about ourselves, we started by saying it has to be very cheap to digitize. So we worked with partners to create a company called Pramana that allows us to digitize slides relatively cheaply using AI algorithms that can take away the dirt, the fingerprints. And so we end up with 21 million of our slides digitized, and that gives you a massive opportunity. We worked with another company called Aignostics to create what we call Atlas, which is an LLM that allows us to then build upon it. 

LEE: Yeah. 

FARRUGIA: And, I think about 120 years ago, we invented frozen sections at Mayo Clinic. What that means is that while the patient’s still on the table, you can take a piece of tissue, look at it, and tell the surgeon whether the margins of what you’re trying to resect are clear or not. But as a result of that, because you have to hurry, you get no information as a surgeon about whether it is an invasive cancer, a noninvasive cancer, or something else. So we’ve just found a way to digitize our frozen section practice and will completely go across the enterprise with AI-enabled digitized frozen sections, which then enables us to do it for anybody across the globe if we need to. 

And then in the genomic space, we’re working to create a true exomic transformer that is short range. And we originally started doing it to see if we can test it against the fact that 40% of people with rheumatoid arthritis don’t respond to the first-line therapy, … 

LEE: Right. 

FARRUGIA: … but you have to wait six months to find out. And we found that we can actually do that. But it has much greater uses, of course. 

And then we’re working with you—I don’t know how much you want to get into this, Peter, or [if] you want to talk about it yourself—on MAIRA-2, which is really exciting: taking a simple problem—can you create a transformer that is able to detect whether lines on the chest, whether the breathing tube, are in the right place?—and then doing it in a way that can be used for many, many other things. 

And then, Peter, because you asked about education and research, … 

LEE: Yeah. 

FARRUGIA: … imagine what this does now to the education system, right. And so we’ve got to train our physicians differently. We now have an AI curriculum for all our medical students. We offer masters and PhDs in AI. We think it’s essential for the people who want to be able to truly become experts, the same way I became an expert in my area of research. 

And then from a research standpoint, think about all the registries that exist in people’s labs, all the spatial genomics, all the epigenomics, all the omics that exist. If you are able to coalesce them into one big atlas, as we call it, imagine how that could really spur research at a scale that we haven’t thought of before. And so that is our aim at the moment. 

From a research standpoint, we are working with Vijay Shah, who’s our dean of research, to say, let’s make the effort of making sure all the data are available to use, to enable us to take advantage of AI. And that is not easy because, of course, people have collected the data. They tend to want to embrace it. 

LEE: Yep. 

FARRUGIA: So there have to be the right incentives, the right privacy, and the right ways of doing it. And we think we’re on the way there, and we’re already seeing some advantages from doing it this way. 

LEE: So we’re running short on time. And so I always like to end with one or two more provocative questions. And, you know, it’s tempting to ask you the provocative question of whether you think AI will ever replace human doctors, but I don’t want to go there with you. In fact, as I thought about our discussion, I was reflecting. We were at a conference together once, and I was on stage in a fireside chat. And then, you know, after the fireside chat, there were audience questions, and I don’t remember any of the questions from the audience except yours. 

And just to remind you, you know, I think when I was on stage, we were talking about a lot of practical uses of AI to, let’s say, reduce administrative burdens and so on in healthcare. But you got up and you, I won’t say you scolded me, but you more or less said, is it the right idea to use AI to optimize today’s somewhat broken healthcare system, or should we be thinking more boldly about, you know, a more fundamental transformation? 

And so what I thought I would try to close with here is to hear what was really behind that question. You know, what were you trying to get me to think about when you asked that question? 

FARRUGIA: So first of all, darn your great memory [LAUGHTER]. Belated apologies … I probably should have … 

LEE: It was by far the best and most sophisticated and, I think, thought-provoking question of all of the ones that came out of the audience. 

FARRUGIA: What I was trying to get to, actually trying to clarify it in my own head and then in the heads of others, is that we do not need to take a linear path to get to where we want to get to. And we seemed to be on a linear path, which is, let’s try and reduce administrative burden. Let’s try and truly be a companion to a physician or other provider. Let’s make their problems better, make them feel better about providing healthcare. And then in the next step, we keep going until we get to, now we can call it agentic AI, whatever we want to talk about. And my view was, no, let’s start with that aim, the last aim, and do the others, because the others will come automatically if you’re working on that harder problem. 

Because one, to get to that harder problem, you’ll find all the other solutions. I was just trying to push that here’s this wonderful tool that’s been given to us. Let’s take advantage of it as quickly as we can. I think we had gotten a little too sensitized to the need to say the right things. “Careful, be very careful” versus saying, “Massive opportunity. Do it right, and healthcare will be much better. Go for it.” 

LEE: Well, I think I understand better now where the vision, insight, and, frankly, courage come from to take on something as ambitious and transformational as the Mayo Clinic Platform, and really all of your leadership in your tenure as the president and CEO of Mayo Clinic. I think I understand it much better now. 

Gianrico, it’s just always such a privilege to interact with you and now to have a chance to work with you more closely. So thank you for everything that you do and thank you for joining us today. 

FARRUGIA: Thank you for making it so easy, and thanks for giving us this opportunity to do good for the world. 

[TRANSITION MUSIC] 

LEE: Gianrico leads what is arguably the crown jewel of the world’s healthcare systems, and so I feel it’s such a privilege to be able to talk and sometimes even brainstorm with him. 

Our conversation, I think, exposed just how tech forward Gianrico is as he charts the strategies for healthcare delivery well into the future. And as I’ve interacted with many others, what I’ve learned is that this is a common trait among major health system CEOs. Roughly speaking, like we’ve seen in previous episodes where doctors and med students are polymath clinician-technologists, the same thing is true of health system CEOs and other leaders. 

AI in the mind of a health system CEO today is not only a technology that can transform diagnosis and treatment, but it’s also something that can have a huge impact on the business of healthcare delivery, the connection of healthcare to medical research, and the journeys that patients go through as they seek better health. 

These two conversations show that virtually all leaders in health and medicine are confronting head-on the opportunities, challenges, and the reality of AI, and they see a future that is potentially very different from what we have today. 

[THEME MUSIC] 

I’d like to thank Umair and Gianrico again for their time and insights. And to our listeners, thank you for joining us. We hope you’ll tune in to our final episode of the series. My coauthors, Carey and Zak, will be back to examine the takeaways from our most recent conversations. 

Until next time. 

[MUSIC FADES] 

The post Reimagining healthcare delivery and public health with AI appeared first on Microsoft Research.

]]>
Navigating medical education in the era of generative AI http://approjects.co.za/?big=en-us/research/podcast/navigating-medical-education-in-the-era-of-generative-ai/ Thu, 24 Jul 2025 20:06:32 +0000 http://approjects.co.za/?big=en-us/research/?p=1145558 Next-generation physicians Morgan Cheatham and Daniel Chen discuss how generative AI is transforming medical education, exploring how students and attending physicians integrate new tools while navigating questions on trust, training, and responsibility.

The post Navigating medical education in the era of generative AI appeared first on Microsoft Research.

]]>
AI Revolution | Illustrated headshots of Daniel Chen, Peter Lee, and Dr. Morgan Cheatham

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.   

In this episode, Dr. Morgan Cheatham (opens in new tab) and Daniel Chen (opens in new tab), two rising physicians and experts in both medicine and technology, join Lee to explore how generative AI is reshaping medical education. Cheatham, a partner and head of healthcare and life sciences at Breyer Capital and a resident physician at Boston Children’s Hospital, discusses how AI is changing how clinicians acquire and apply medical knowledge at the point of care, emphasizing the need for training and curriculum changes to help ensure AI is used responsibly and that clinicians are equipped to maximize its potential. Chen, a medical student at the Kaiser Permanente Bernard J. Tyson School of Medicine, shares how he and his peers use AI tools as study aids, clinical tutors, and second opinions and reflects on the risks of overreliance and the importance of preserving critical thinking.


Learn more:

Perspectives on the Current and Future State of Artificial Intelligence in Medical Genetics (opens in new tab) (Cheatham)
Publication | May 2025

Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models (opens in new tab) (Cheatham) 
Publication | February 2023 

The AI Revolution in Medicine: GPT-4 and Beyond 
Book | Peter Lee, Carey Goldberg, Isaac Kohane | April 2023 

Transcript

[MUSIC]     

[BOOK PASSAGE] 

PETER LEE: “Medicine often uses the training approach when trying to assess multipurpose talent. To ensure students can safely and effectively take care of patients, we have them jump through quite a few hoops, … [and] they need good evaluations once they reach the clinic, passing grades on more exams like the USMLE [United States Medical Licensing Examination]. … [But] GPT-4 gets more than 90 percent of questions on licensing exams correct. … Does that provide any level of comfort in using GPT-4 in medicine?” 

[END OF BOOK PASSAGE]     

[THEME MUSIC]     

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.     

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?      

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES]  

The book passage I read at the top is from Chapter 4, “Trust but Verify.” In it, we explore how AI systems like GPT-4 should be evaluated for performance, safety, and reliability and compare this to how humans are both trained and assessed for readiness to deliver healthcare. 

In previous conversations with guests, we’ve spoken a lot about AI in the clinic as well as in labs and companies developing AI-driven tools. We’ve also talked about AI in the hands of patients and consumers. But there has been some discussion also about AI’s role in medical training. And, as a founding board member of a new medical school at Kaiser Permanente, I definitely have my own thoughts about this. But today, I’m excited to welcome two guests who represent the next generation of medical professionals for their insights, Morgan Cheatham and Daniel Chen. 

Morgan Cheatham is a graduate of Brown University’s Warren Alpert Medical School with clinical training in genetics at Harvard and is a clinical fellow at Boston Children’s Hospital. While Morgan is a bona fide doctor in training, he’s also amazingly an influential health technology strategist. He was recently named partner and head of healthcare and life sciences at Breyer Capital and has led investments in several healthcare AI companies that have eclipsed multibillion-dollar valuations.  

Daniel Chen is finishing his second year as a medical student at the Kaiser Permanente Bernard J. Tyson School of Medicine. He holds a neuroscience degree from the University of Washington and was a research assistant in the Raskind Lab at the UW School of Medicine, working with imaging and genetic data analyses for biomedical research. Prior to med school, Daniel pursued experiences that cultivated his interest in the application of AI in medical practice and education.  

Daniel and Morgan exemplify the real-world future of healthcare, a student entering his third year of medical school and a fresh medical-school graduate who is starting a residency while at the same time continuing his work on investing in healthcare startups. 

[TRANSITION MUSIC] 

Here is my interview with Morgan Cheatham: 

LEE: Morgan, thanks for joining. Really, really looking forward to this chat. 

MORGAN CHEATHAM: Peter, it’s a privilege to be here with you. Thank you. 

LEE: So are there any other human beings who are partners at big-league venture firms, residents at, you know, a Harvard-affiliated medical center, author, editor for a leading medical journal? I mean, who are your … who’s your cohort? Who are your peers? 

CHEATHAM: I love this question. There are so many people who I consider peers that I look up to who have paved this path. And I think what is distinct about each of them is they have this physician-plus orientation. They are multilingual in terms of knowing the language of medicine but having learned other disciplines. And we share a common friend, Dr. Zak Kohane, who was among the first to really show how you can meld two worlds as a physician and make significant contributions to the intersections thereof.  

I also deeply, in the world of business, respect physicians like Dr. Krishna Yeshwant at Google Ventures, who simultaneously pursued residency training and built what is now, you know, a large and enduring venture firm.  

So there are plenty of people out there who’ve carved their own path and become these multilingual beings, and I aspire to be one. 

LEE: So, you know, one thing I’ve been trying to explore with people are their origins with respect to the technology of AI. And there’s two eras for that. There’s AI before ChatGPT and before, you know, generative AI really became a big thing, and then afterwards.  

So let’s start first before ChatGPT. You know, what was your contact? What was, you know, your knowledge of AI and machine learning? 

CHEATHAM: Sure, so my experiences in computer science date back to high school. I went to Thomas Jefferson, which is a high school in Alexandria, Virginia, that prides itself on requiring students to take computer science in their first year of high school as kind of a required torturous experience. [LAUGHTER] And I remember that fondly. Our final project was Brick Breaker. It was actually, I joke, all hard coded. So there was nothing intelligent about the Brick Breaker that we built. But that was my first exposure.  

I was a classics nerd, and I was really interested in biology and chemistry as a pre-medical student. So I really wouldn’t intersect with this field again until I was shadowing at Inova Hospital, which was a local hospital near me. And it was interesting because, at the time—I was shadowing in the anesthesia department—they were actually implementing their first instance of Epic. 

LEE: Mmm. Wow. 

CHEATHAM: And I remember that experience fondly because the entire hospital was going from analog—they were going from paper-based charts—to this new digital system. And I didn’t quite know in that moment what it would mean for the field or for my career, but I knew it was a big deal because a lot of people had a lot of emotion around what was going on, and it was in that experience that I kind of decided to attach myself to the intersection of computation and medicine. So when I got to undergrad, I was a pre-medical student. I was very passionate about studying the sacred physician-patient relationship and everything that had to go on in that exam room to provide excellent care.  

But there were a few formative experiences: one, working at a physician-founded startup that was using what at the time we called big data, if you remember, … 

LEE: Yup. 

CHEATHAM: … to match the right patient to the right provider at the right time. And it was in that moment that I realized that as a physician, I could utilize technology to scale that sacred one-to-one patient-provider interaction in nonlinear ways. So that was, kind of, the first experience where I saw deployed systems that were using patient data and clinical information in an intelligent format. 

LEE: Yeah. And so you’re a pre-medical student, but you have this computer science understanding. You have an intuition, I guess is the right way to say it, that the clinical data becoming digital is going to be important. So then what happens from there to your path to medical school? 

CHEATHAM: Yeah, so I had a few formative research experiences in my undergraduate years. You know, nothing that ever amounted to a significant publication, but I was toying around with SVMs [support vector machines] for sepsis and working with the MIMIC [Medical Information Mart for Intensive Care] database early days and really just trying to understand what it meant that medical data was becoming digitized.  

And at the same time, again, I was rather unsatisfied doing that purely in an academic context. And I so early craved seeing how this would roll out in the wild, roll out in a clinical setting that I would soon occupy. And that was really what drove me to work at this company called Kyruus [Health] and understand how these systems, you know, scaled. Obviously, that’s something with AI that we’re now grappling with in a real way because it looks much different.  

LEE: Right. Yep. 

CHEATHAM: So the other experience I had, which is less relevant to AI, but I did do a summer in banking. And I mention this because what I learned in the experience was … it was a master class in business. And I learned that there was another scaling factor that I should appreciate as we think about medicine, and that was capital and business formation. And that was also something that could scale nonlinearly.  

So when you married that with technology, it was, kind of, a natural segue for me before going to med school to think about venture capital and partnering with founders who were going to be building these technologies for the long term. And so that’s how I landed on the venture side. 

LEE: And then how long of a break before you started your medical studies? 

CHEATHAM: It was about four years. Originally, it was going to be a two-year deferral, and the pandemic happened. Our space became quite active in terms of new companies and investment. So it was about four years before I went back. 

LEE: I see. And so you’re in medical school. ChatGPT happened while you were in medical school, is that right? 

CHEATHAM: That’s right. That’s right. Right before I was studying for Step 1. So the funny story, Peter, that I like to share with folks is … 

LEE: Yeah. 

CHEATHAM: … I was just embarking on designing my Step 1 study plan with my mentor. And I went to NeurIPS [Conference] for the first time. And that was in 2022, when, of course, ChatGPT was released.  

And for the remainder of that fall period, you know, I should have been studying for these shelf exams and, you know, getting ready …  

LEE: Yeah. 

CHEATHAM: … for this large board exam. And I was fortunate to partner, actually, with one of our portfolio company CEOs who is a physician—he is an MD/PhD—to work on the first paper that showed that ChatGPT could pass the US Medical Licensing Exam (opens in new tab).  

LEE: Yes. 

CHEATHAM: And that was a riveting experience for a number of reasons. I joke with folks that it was both the best paper I was ever, you know, a part of and proud to be a coauthor of, but also the worst for a lot of reasons that we could talk about.  

It was the best in terms of canonical metrics like citations, but the worst in terms of, wow, did we spend six months as a field thinking this was the right benchmark … [LAUGHTER] 

LEE: Right. 

CHEATHAM: … for how to assess the performance of these models. And I’m so encouraged … 

LEE: You shouldn’t feel bad that way because, you know, at that time, I was secretly, you know, assessing what we now know of as GPT-4 in that period. And what was the first thing I tried to do? Step 1 medical exam.  

By the way, just for our listeners who don’t understand about medical education—in the US, there’s a three-part exam that extends over a couple of years of medical school. Step 1, Step 2, Step 3. And Step 1 and Step 2 in particular are multiple-choice exams. 

And they are very high stakes when you’re in medical school. And you really have to have a command of quite a lot of clinical knowledge to pass these. And it’s funny to hear you say what you were just sharing there because it was also the first impulse I had with GPT-4. And in retrospect, I feel silly about that. 

CHEATHAM: I think many of us do, but I’ve been encouraged over the last two years, to your point, that we really have moved our discourse beyond these exams to thinking about more robust systems for the evaluation of performance, which becomes even more interesting as you and I have spoken about these multi-agent frameworks that we are now, you know, compelled to explore further. 

LEE: Yeah. Well, and even though I know you’re a little sheepish about it now, I think in the show notes, we’ll link to that paper because it really was one of the seminal moments when we think about AI, AI in medicine.

And so you’re seeing this new technology, and it’s happening at a moment when you yourself have to confront taking the Step 1 exam. So how did that feel? 

CHEATHAM: It was humbling. It was shocking. What I had worked two years for, poring over textbooks and, you know, flashcards and all of the things we do as medical students, to see a system emerge out of thin air that was arguably going to perform far better than I ever would, no matter how much …  

LEE: Yeah. 

CHEATHAM: … I was going to study for that exam, it set me back. It forced me to interrogate what my role in medicine would be. And it dramatically changed the specialties that I considered for myself long term.  

And I hope we talk about, you know, how I stumbled upon genetics and why I’m so excited about that field and its evolution in this computational landscape. I had to do a lot of soul searching to relinquish what I thought it meant to be a physician and how I would adapt in this new environment. 

LEE: You know, one of the things that we wrote in our book, I think it was in a chapter that I contributed, I was imagining that students studying for Step 1 would be able to do it more actively.  

Or you could even do sort of a pseudo-Step 3 exam by having a conversation. You provide the presentation of a patient and then have an encounter, you know, where ChatGPT is the patient, and then you pretend to be the doctor. And then in the example that we published, then you say, “End of encounter.” And then you ask ChatGPT for an assessment of things.  

So, you know, maybe it all came too late for Step 1 for you because you were already very focused and, you know, had your own kind of study framework. But did you have an influence or use of this kind of technology for Step 2 and Step 3? 

CHEATHAM: So even for Step 1, I would say, it [ChatGPT], you know, dropped in November. I took it [Step 1 exam] in the spring, so I was able to use it to study. But the lesson I learned in that moment, Peter, was really about the importance of trust with AI and clinicians or clinicians in training, because we all have the same resources that we use for these board exams, right. UWorld is this question bank. It’s been around forever. If you’re not using UWorld, like, good luck. And so why would you deviate off of a well-trodden path to study for this really important exam?  

And so I kind of adjunctively used GPT alongside UWorld to come up with more personalized explanations for concepts that I wasn’t understanding, and I found that it was pretty good and it was certainly helpful for me.  

Fortunately, I was, you know, able to pass, but I was very intentional about dogfooding AI when I was a medical student, and part of that was because I had been a venture capitalist, and I’d made investments in companies whose products I could actually use.  

And so, you know, Abridge is a company in the scribing space that you and I have talked about.  

LEE: Yeah. 

CHEATHAM: I was so fortunate in the early days of their product to not just be a user but to get to bring their product across the hospital. I could bring the product to the emergency department one week, to neurology another week, to the PICU [pediatric intensive care unit], you know, the next week, and assess the relative performance of, you know, how it handled really complex genetics cases … 

LEE: Yeah. 

CHEATHAM: … versus these very challenging social situations that you often find yourself navigating in primary care. So not only was I emotional about this technology, but I was a voracious adopter in the moment. 

LEE: Yeah, right. And you had a financial interest then on top of that, right? 

CHEATHAM: I was not paid by Abridge to use the product, but, you know, I joke that the team was probably sick of me. [LAUGHTER] 

LEE: No, no, but you were working for a venture firm that was invested in these, right? So all of these things are wrapped up together. You know, you’re having to get licensed as a doctor while doing all of this.  

So I want to get into that investment and new venture stuff there, but let’s stick just for a few more minutes on medical education. So I mentioned, you know, what we wrote in the book, and I remember writing the example, you know, of an encounter. Is that at all realistic? Is anything like that … that was pure speculation on our part. What’s really happening?  

And then after we talk about what’s really happening, what do you think should happen in medical education given the reality of generative AI? 

CHEATHAM: I’ve been pleasantly surprised talking with my colleagues about AI in clinical settings, how curious people are and how curious they’ve been over the last two years. I think, oftentimes, we say, oh, you know, this technology really is stratified by age and the younger clinicians are using it more and the older physicians are ignoring it. And, you know, maybe that’s true in some regards, but I’ve seen, you know, many, you know, senior attendings pulling up Perplexity, GPT, more recently OpenEvidence (opens in new tab), which has been a really essential tool for me personally at the point of care, to come up with the best decisions for our patients.  

The general skepticism arises when people reflect on their own experience in training and they think, “Well, I had to learn how to do it this way.”  

LEE: Yeah. 

CHEATHAM: “And therefore, you using an AI scribe to document this encounter doesn’t feel right to me because I didn’t get to do that.” And I did face some of those critiques or criticisms, where you need to learn how to do it the old-school way first and then you can use an AI scribe.  

And I haven’t yet seen—maybe even taking a step back—I haven’t seen a lot of integration of AI into the core medical curriculum, period.  

LEE: Yeah. 

CHEATHAM: And, as you know, if you want to add something to medical school curriculum, you can get in a long line of people who also want to do that. 

LEE: Yes. Yeah.

CHEATHAM: But it is urgent that our medical schools do create formalized required trainings for this technology because people are already using it.  

LEE: Yes. 

CHEATHAM: I think what we will need to determine is how much of the old way do people need to learn in order to earn the right to use AI at the point of care and how much of that old understanding, that prior experience, is essential to be able to assess the performance of these tools and whether or not they are having the intended outcome.  

I kind of joke it’s like learning cursive, right? 

LEE: Yes. 

CHEATHAM: I’m old enough to have had to learn cursive. I don’t think people really have to learn it these days. When do I use it? Well, when I’m signing something. I don’t even really sign checks anymore, but … 

LEE: Well … the example I’ve used, which you’ve heard, is, I’m sure you were still taught the technique of manual palpation, even though … 

CHEATHAM: Of course. 

LEE: … you have access to technologies like ultrasound. And in fact, you would use ultrasound in many cases.  

And so I need to pin you down. What is your opinion on these things? Do you need to be trained in the old ways? 

CHEATHAM: When it comes to understanding the architecture of the medical note, I believe it is important for clinicians in training to know how that information is generated, how it’s characterized, and how to go from a very broad-reaching conversation to a distilled clinical document that serves many functions. 

Does that mean that you should be forced to practice without an AI scribe for the entirety of your medical education? No. And I think that as you are learning the architecture of that document, you should be handed an AI scribe and you should be empowered to have visits with patients both in an analog setting—where you are transcribing and generating that note—and soon thereafter, I’m talking in a matter of weeks, working with an AI scribe. That’s my personal belief.  

LEE: Yeah, yeah. So you’re going to … well, first off, congratulations on landing a residency at Boston Children’s [Hospital].  

CHEATHAM: Thank you, Peter. 

LEE: I understand there were only two people selected for this and super competitive. You know, with that perspective, you know, how do you see your future in medicine, just given everything that’s happening with AI right now?  

And are there things that you would urge, let’s say, the dean of the Brown Medical School to consider or to change? Or maybe not the dean of Brown but the head of the LCME [Liaison Committee on Medical Education], the accrediting body for US medical schools. What in your mind needs to change? 

CHEATHAM: Sure. I’ll answer your first question first and then talk about the future.  

LEE: Yeah. 

CHEATHAM: For me personally, I fell into the field of genomics. And so my training program will cover both pediatrics as well as clinical genetics and genomics.  

And I alluded to this earlier, but one of the reasons I’m so excited to join the field is because I really feel like the field of genetics is focused on a very underserved patient population, though not in how we typically think of underserved. I’m talking about underserved as in patients who don’t always have answers, patients for whom the current guidelines don’t offer information or comfort or support. 

Those are patients that are extremely underserved. And I think in this moment of AI, there’s a unique opportunity to utilize the computational systems that we now have access to, to provide these answers more precisely, more quickly.  

And so I’m excited to marry those two fields. And genetics has long been a field that has adopted technology. We just think about the foundational technology of genomic sequencing and variant interpretation. And so it’s a kind of natural evolution of the field, I believe, to integrate AI and specifically generative AI. 

If I were speaking directly to the LCME, I mean, I would just have to encourage the organization, as well as medical societies who partner with attending physicians across specialties, to lean in here.

When I think about prior revolutions in technology and medicine, physicians were not always at the helm. We have a unique opportunity now, and you talk about companies like Abridge in the AI space, companies like Viz.ai, Cleerly—I mean, I could go on: Iterative Health … I could list 20 organizations that are bringing AI to the point of care that are founded by physicians.

This is our moment to have a seat at the table and to shape not only the discourse but the deployment. And the unique lens, of course, that a physician brings is that of prioritizing the patient, and with AI and this time around, we have to do that.

LEE: So LCME for our listeners is, I think it stands for the Liaison Committee on Medical Education. It’s basically the accrediting body for US medical schools, and it’s very high stakes. It’s very, very rigorous, which I think is a good thing, but it’s also a bit of a straitjacket.  

So if you are on the LCME, are there specific new curricular elements that you would demand that LCME, you know, add to its accreditation standards? 

CHEATHAM: We need to unbundle the different components of the medical appointment and think about the different functions of a human clinician to answer that question.  

There are a couple of areas that are top of mind for me, the first being medical search. There are large organizations and healthcare incumbents that have been around for many decades, companies like UpToDate or even, you know, the guidelines that are housed by our medical societies, that need to respond to the information demands of clinicians at the point of care in a new way with AI.  

And so I would love to see our medical institutions teaching more students how to use AI for medical search problems at the point of care. How to not only, you know, from a prompt perspective, ask questions about patients in a high-efficacy way, but also to interpret the outputs of these systems to inform downstream clinical decision-making.  

People are already adopting, as you know, GPT, OpenEvidence, Perplexity, all of these tools to make these decisions now.  

And so—again, it’s a moral imperative of the LCME—by not having curriculum and support for clinicians doing that, we run the risk of folks not utilizing these tools properly or to their greatest potential. 

LEE: Yeah, then, but zooming forward then, what about board certification? 

CHEATHAM: Board certification today is already transitioning to an open-book format for many specialties, is my understanding. And I’ve been talking to some of my fellow geneticists; you know, that’s a pretty challenging board exam in clinical genetics or biochemical genetics. They are using OpenEvidence during those open-book exams.  

So what I would like to see us do is move from a system of rote memorization and regurgitation of fact to an assessment framework that is adaptive, is responsive, and assesses for your ability to use the tools that we now have at our disposal to make sound clinical decisions. 

LEE: Yeah. We’ve heard from Sara Murray, you know, that when she’s doing her rounds, she consults routinely with ChatGPT. And that was something we also predicted, especially Carey Goldberg in our book, you know, wrote this fictional account.  

Is that the primary real-world use of AI? Not only by clinicians, but also by medical students … are medical students, you know, engaged with ChatGPT or, you know, similar? 

CHEATHAM: Absolutely. I’ve listed some of the tools. I think, in general, Peter, there is this new clinician stack that is emerging of these tools that people are trying, and I think the cycles of iteration are quick, right. Some folks are using Claude [Claude.ai] one week, and they’re trying Perplexity, or they’re trying OpenEvidence, they’re trying GPT for a different task.  

There’s this kind of moment in medicine that every clinician experiences where you’re on rounds, and there’s that very senior attending. And you’ve just presented a patient to them, and you think you did an elegant job, and you’ve summarized all the information, and you really feel good about your differential, and they ask you, like, the one question you didn’t think to address. [LAUGHTER] 

And I’ll tell you, some of the funniest moments I’ve had using AI in the hospital involve that. Let me take a step back: that process of an attending physician interrogating a medical student is called “pimping,” for lack of a better phrase.  

And some of the funniest use cases I’ve had for AI in that setting is actually using OpenEvidence or GPT as defense against pimping. [LAUGHTER] So quickly while they’re asking me the question, I put it in, and I’m actually able to answer it right away. So it’s been effective for that. But I would say, you know, [in] the halls of most of the hospitals where I’ve trained, I’m seeing this technology in the wild.

LEE: So you’re very tech-forward. But that off-label use of AI: when we wrote our book, we weren’t sure that at least top health systems would tolerate this. Do you have an opinion about this? Should these things be better regulated or controlled by the CIOs of Boston Children’s? 

CHEATHAM: I’m a big believer that transparency encourages good behaviors. 

And so the first time I actually tried to use ChatGPT in a clinical setting, it was at a hospital in Rhode Island. I will not name which hospital. But the site was actually blocked. I wasn’t able to access it from a desktop. The hospital’s first response to this technology was, let’s make sure none of our clinicians can access it. It has so much potential for medicine. The irony of that today.  

And it’s since, you know, become unblocked. But I was able to use it on my phone. So, to your point, if there’s a will, there’s a way. And we will utilize this technology if we are seeing perceived value. 

LEE: So, yeah, no, absolutely. So now, you know, in some discussions, one superpower that seems to be common across people who are really leading the charge here is they seem to be very good readers and students.  

And I understand you also as a voracious reader. In fact, you’re even on an editorial team for a major medical journal. To what extent does that help?  

And then from your vantage point at New England Journal of Medicine AI—and I’ll have a chance to ask Zak Kohane as the editor in chief the same question—you know, what’s your assessment as you reflect over the last two years for the submitted manuscripts? Are you overall surprised at what you’re seeing? Disappointed? Any kind of notable hits or misses, just in the steady stream of research papers that are getting submitted to that leading journal? 

CHEATHAM: I would say overall, the field is becoming more expansive in the kinds of questions that people are asking.  

Again, when we started, it was this very myopic approach of: “Can we pass these medical licensing exams? Can we benchmark this technology to how we benchmark our human clinicians?” I think that’s a trap. Some folks call this the Turing Trap, right, of let’s just compare everything to what a human is capable of.  

Instead of interrogating, as you all talk about in the book, what are the unique attributes of this new substrate for computation and what new behaviors emerge from it, whether that’s from a workflow perspective in the back office, or—as I’m personally more passionate and as we’re seeing more people focus on in the literature—what are the diagnostic capabilities, right. 

I love Eric Topol’s framework for “machine eyes,” right, as this notion of like, yes, we as humans have eyes, and we have looked at medical images for many, many decades, but these machines can take a different approach to a retinal image, right.  

It’s not just what you can diagnose in terms of an ophthalmological disease but maybe a neurologic disease or, you know, maybe liver disease, right. 

So I think the literature is, in general, moving to this place of expansion, and I’m excited by that. 

LEE: Yeah, I kind of have referred to that as triangulation. You know, one of the things I think a trap that specialists in medicine can fall into, like a cardiologist will see everything in terms of the cardiac system. And … whereas a nephrologist will see things in a certain lens.  

And one of the things that you oftentimes see in the responses from a large language model is that more expansive view. At the same time, you know, I wonder … we have medical specialties for good reason. And, you know, at times I do wonder, you know, if there can be confusion that builds up.  

CHEATHAM: This is an under-discussed area of AI—AI collapses medical specialties onto themselves, right. 

You have the canonical example of the cardiologist, you know, arguing that, you know, we should diurese and maybe the nephrologist arguing that we should, you know, protect the kidneys. And how do two disciplines disagree on what is right for the patient when in theory, there is an objective best answer given that patient’s clinical status? 

My understanding is that the emergence of medical specialties was a function of the cognitive overload of medicine in general and how difficult it was to keep all of the specifics of a given specialty in the mind. Of course, general practitioners are tasked with doing this at some level, but they’re also tasked with knowing when they’ve reached their limit and when they need to refer to a specialist.  

So I’m interested in this question of whether medical specialties themselves need to evolve.  

And if we look back in the history of medical technology, there are many times where a new technology forced a medical specialty to evolve, whether it was certain diagnostic tools that have been introduced or, as we’re seeing now with GLP-1s, the entire cardiometabolic field … 

LEE: Right. 

CHEATHAM: … is having to really reimagine itself with these new tools. So I think AI will look very similar, and we should not hold on to this notion of classical medical specialties simply out of convention.  

LEE: Right. All right. So now you’re starting your residency. You’re, you know, basically leading a charge in health and life sciences for a leading venture firm. I’d like you to predict what the world of healthcare is going to look like, you know, two years from now, five years from now, 10 years from now. And to frame that, to make it a little more specific, you know, what do you think will be possible that you, as a doctor and an investor, will be able to do two years from now, five years from now, 10 years from now that you can’t do today?  

CHEATHAM: Two years from now, I’m optimistic we’ll have greater adoption of AI by clinicians, both for back-office use cases. So whether that’s the scribe and the generation of the note for billing purposes, but also now thinking about that for patient-facing applications.  

We’re already doing this with drafting of notes. I think we’ll see greater proliferation of those more obvious use cases over the next two years. And hopefully we’re seeing that across hospital systems, not just large well-funded academics, but really reaching our community hospitals, our rural hospitals, our under-resourced settings.  

I think hopefully we’ll see greater conversion. Right now, we have this challenge of “pilotitis,” right. A lot of people are trying things, but the data shows that only one in three pilots is really converting to production use. So hopefully we’ll kind of move things forward that are working and pare back on those that are not. 

We will not solve the problem of payment models in the next two years. That is a prediction I have.  

Over the next five years, I suspect that, with the help of regulators, we will identify better payment mechanisms to support the adoption of AI because it cannot and will not sustain itself simply by asking health systems and hospitals to pay for it. That is not a scalable solution.  

LEE: Yes. Right. Yep. In fact, I think there have to be new revenue-positive incentives if providers are asked to do more in the adoption of technology. 

CHEATHAM: Absolutely. But as we appreciate, some of the most promising applications of AI have nothing to do with revenue. It might simply be providing a diagnosis to somebody, you know, for whom that might drive additional intervention, but may also not.  

And we have to be OK with that because that’s the right thing to do. It’s our moral imperative as clinicians to implement this where it provides value to the patient.

Over the next 10 years, what I—again, being a techno-optimist—am hopeful we start to see is a dissolving of the barrier that exists between care delivery and biomedical discovery.  

This is the vision of the learning health system that was written over 10 years ago, and we have not realized it in practice. I’m a big proponent of ensuring that every single patient that enters our healthcare system not only receives the best care, but that we learn from the experiences of that individual to help the next. 

And in our current system, that is not how it works. But, with AI, that now becomes possible. 

LEE: Well, I think connecting healthcare experiences to medical discovery—I think that that is really such a great vision for the future. And I do agree [that] AI really gives us real hope that we can make it true.  

Morgan, I think we could talk for a few hours more. It’s just incredible what you’re up to nowadays. Thank you so much for this conversation. I’ve learned a lot talking to you. 

CHEATHAM: Peter, thank you so much for your time. I will be clutching my signed copy of The AI Revolution in Medicine for many years to come.  

[TRANSITION MUSIC] 

LEE: Morgan obviously is not an ordinary med school graduate. In previous episodes, one of the things we’ve seen is that people on the leading edge of real-world AI in medicine oftentimes are both practicing doctors as well as technology developers. Morgan is another example of this type of polymath, being both a med student and a venture capitalist. 

One thing that struck me about Morgan is he’s just incredibly hands-on. He goes out, finds leading-edge tools and technologies, and often these things, even though they’re experimental, he takes them into his education and into his clinical experiences. I think this highlights a potentially important point for medical schools, and that is, it might be incredibly important to provide the support—and, let’s be serious, the permission—to students to access and use new tools and technologies. Indeed, the insight for me when I interact with Morgan is that in these early days of AI in medicine, there is no substitute for hands-on experimentation, and that is likely best done while in medical school.

Here’s my interview with Daniel Chen: 

LEE: Daniel, it’s great to have you here. 

DANIEL CHEN: Yeah, it’s a pleasure being here. 

LEE: Well, you know, I normally get started in these conversations by asking, you know, how do you explain to your mother what you do all day? And the reason that that’s a good question is a lot of the people we have on this podcast have fancy titles and unusual jobs, but I’m guessing that your mother would have already a preconceived notion of what a medical student does. So I’d like to twist the question around a little bit for you and ask, what does your mother not realize about how you spend your days at school?  

Or does she get it all right? [LAUGHS] 

CHEN: Oh, no, she is very observant. I’ll say that off the bat. But I think something that she might not realize is the amount of effort spent, kind of, outside the classroom or outside the hospital. You know, she’s always, like, saying you have such long days in the hospital. You’re there so early in the morning.  

But what she doesn’t realize is that maybe when I come back from the hospital, it’s not just like, oh, I’m done for the day. Let’s wind down, go to bed. But it’s more like, OK, I have some more practice questions I need to get through; I didn’t get through my studying. Let me, like, wrap up this research project I’m working on, get that back to the PI [principal investigator]. It’s never ending to a certain extent. Those are some things she doesn’t realize. 

LEE: Yeah, I think, you know, all the time studying, I think, is something that people expect of second-year medical students. And even nowadays at the top medical schools like this one, being involved in research is also expected.  

I think one thing that is a little unusual is that you are actually in clinic, as well, as a second-year student. How has that gone for you? 

CHEN: Yeah, I mean, it’s definitely interesting. I would say I spend my time, especially this year, it’s kind of three things. There’s the preclinical stuff I’m doing. So that’s your classic, you know, you’re learning from the books, though I don’t feel like many of us do have textbooks anymore. [LAUGHTER] 

There’s the clinical aspect, which you mentioned, which is we have an interesting model, longitudinal integrated clerkships. We can talk about that. And the last component is the research aspect, right. The extracurriculars.  

But I think starting out as a second year and doing your rotations, probably early on in, kind of, the clinical medical education, has been really interesting, especially with our format, because typically med students have two years to read up on all the material and, like, get some foundational knowledge. With us, it’s a bit more, we have maybe one year under our belt before we’re thrown into like, OK, go talk to this patient; they have ankle pain, right. But we might have not even started talking about ankle pain in class, right. Well, where do I begin?  

So I think starting out, it’s kind of, like, you know, the classic drinking from a fire hydrant. But you also, kind of, have that embarrassment of you’re talking to the patient like, I have no clue what’s happening [LAUGHTER] or you might have … my differentials all over the place, right.  

But I think the beauty of the longitudinal aspect is that now that we’re, like, in our last trimester, everything’s kind of coming together. Like, OK, I can start to see, you know, here’s what you’re telling me. Here’s what the physical exam findings are. I’m starting to form a differential. Like, OK, I think these are the top three things. 

But in addition to that, I think these are the next steps you should take so we can really focus and hone in on what exact diagnosis this might be. 

LEE: All right. So, of course, what we’re trying to get into is about AI.  

And, you know, the funny thing about AI and the book that Carey, Zak, and I wrote is we actually didn’t think too much about medical education, although we did have some parts of our book where we, well, first off, we made the guess that medical students would find AI to be useful. And we even had some examples, you know, where, you know, you would have a vignette of a mythical patient, and you would ask the AI to pretend to be that patient.  

And then you would have an interaction and have to have an encounter. And so I want to delve into whether any of that is happening. How real it is. But before we do that, let’s get into first off, your own personal contact with AI. So let me start with a very simple question. Do you ever use generative AI systems like ChatGPT or similar? 

CHEN: All the time, if not every day. 

LEE: [LAUGHS] Every day, OK. And when did that start? 

CHEN: I think when it first launched with GPT-3.5, I was, you know, curious. All my friends work in tech. You know, they’re either software engineers, PMs. They’re like, “Hey, Daniel, take a look at this,” and at first, I thought it was just more of a glorified search engine. You know, I was actually looking back.  

My first question to ChatGPT was, what was the weather going to be like the next week, you know? Something very, like, something easily you could have looked up on Google or your phone app, right.  

I was like, oh, this is pretty cool. But then, kind of, fast-forwarding to, I think, the first instance I was using it in med school. I think the first, like, thing that really helped me was actually a coding problem. It was for a research project. I was trying to use SQL.  

Obviously, I’ve never taken a SQL class in my life. So I asked Chat like, “Hey, can you write me this code to maybe morph two columns together,” right? Something that might have taken me hours to maybe Google on YouTube or like try to read some documentation which just goes over my head.

But ChatGPT was able to, you know, not only produce the code, but, like, walk me through like, OK, you’re going to launch SQL. You’re going to click on this menu, [LAUGHTER] put the code in here, make sure your file names are correct. And it worked.  

So it’s been a very powerful tool in that way in terms of, like, giving me expertise in something that maybe I traditionally had no training in. 

LEE: And so while you’re doing this, I assume you had fellow students, friends, and others. And so what were you observing about their contact with AI? I assume you weren’t alone in this. 

CHEN: Yeah, yeah, I think, … I’m not too sure in terms of what they were doing when it first came out, but I think if we were talking about present day, um, a lot of it’s kind of really spot on to what you guys talked about in the book.  

Um, I think the idea around this personal tutor, personal mentor, is something that we’re seeing a lot. Even if we’re having in-class discussions, the lecturer might be saying something, right. And then I might be, or I might see a friend, in ChatGPT or some other model looking up a question.  

And you guys talked about, you know, how it can, like, explain a concept at different levels, right. But honestly, sometimes if there’s a complex topic, I ask ChatGPT, like, can you explain this to me as if I was a 6-year-old?  

LEE: Yeah. [LAUGHS]  

CHEN: Breaking down complex topics. Yeah. So I think it’s something that we see in the pre-clinical space, in lecture, but also even in the clinical space, there’s a lot of teaching, as well. 

Sometimes if my preceptor is busy with patients, but I had maybe a question, I would maybe converse with ChatGPT, like, “Hey, what are your thoughts about this?” Or, like, a common one is, like, medical doctors love to use abbreviations, … 

LEE: Yes.  

CHEN: … and these abbreviations are sometimes only very niche and unique to their specialty, right. [LAUGHTER] 

And I was reading this note from a urogynecologist. [In] the entire first sentence, I think there were, like, 10 abbreviations. Obviously, I compiled a list and asked ChatGPT, like, “Hey, in the context of urogynecology, can you define what these could possibly mean,” right? Instead of hopelessly searching on Google or, maybe embarrassingly, asking the preceptor. So in these instances, it’s played a huge role. 

LEE: Yeah. And when you’re doing things like that, it can make mistakes. And so what are your views of the reliability of generative AI, at least in the form of ChatGPT? 

CHEN: Yeah, I think in the context of medicine, right, we fear a lot about the hallucinations that these models might have. And it’s something I’m always checking for. When I talk with peers about this, we find it most helpful when the model gives us a source linking it back. I think the gold standard nowadays in medicine is using something called UpToDate that’s written by clinicians, for clinicians. 

But sometimes searching on UpToDate can take a lot of time as well because it’s a lot of information to, like, sort through. But nowadays a lot of us are using something called OpenEvidence, which is also an LLM. But they always back their answers with citations to, like, published literature, right.  

So I think it’s about being conscious of the downfalls of these models and also having the critical understanding to, like, analyze the actual literature. I think double-checking is just something that we’ve been also getting really good at. 

LEE: How would you assess student attitudes—med student attitudes—about AI? Is it … the way you’re coming across is it’s just a natural part of life. But do people have firm opinions, you know, pro or con, when it comes to AI, and especially AI in medicine? 

CHEN: I think it’s pretty split right now. I think there’s the half, kind of, like us, where we’re very optimistic—cautiously optimistic about, you know, the potential of this, right. It’s able to, you know, give us that extra information, of being that extra tutor, right. It’s also able to give us information very quickly, as well.   

But I think the flip side, which a lot of students hesitate about, and which I agree with, is this loss of the ability to critically think. Something that you can easily do is, you know, give these models, like, relevant information about the patient history and be like, “Give me a 10-item differential,” right.

LEE: Yes.  

CHEN: And I think it’s very easy as a student to, you know, [say], “This is difficult. Let me just use what the model says, and we’ll go with that,” right. 

So I think being able to separate that, you know, medical school is a time where, you know, you’re learning to become a good doctor. And part of that requires the ability to be observant and critically think. Having these models simultaneously might hinder the ability to do that.  

LEE: Yeah. 

CHEN: So I think, you know, the next step is, like, these models can be great—a great tool, absolutely wonderful. But how do you make sure that it’s not hindering these abilities to critically think? 

LEE: Right. And so when you’re doing your LIC [longitudinal integrated clerkship] work, these longitudinal experiences, and you’re in clinic, are you pulling the phone out of your pocket and consulting with AI? 

CHEN: Definitely. And I think my own policy for this, to kind of counter this, is that the night before when I’m looking over the patient list, the clinic [schedule] of who’s coming, I’m always giving it my best effort first.  

Like, OK, the chief complaint is maybe just a runny nose for a kid in a pediatric clinic. What could this possibly be? Right? At this point, we’ve seen a lot. Like, OK, it could be URI [upper respiratory infection], it could be viral, it could be bacterial, you know, and then I go through the—you know, I try to do my due diligence of, like, going through the history and everything like that, right. 

But sometimes if it’s a more complex case, something maybe a presentation I’ve never seen before, I’ll still kind of do my best coming up with maybe a differential that might not be amazing. But then I’ll ask, you know, ChatGPT like, OK, in addition to these ideas, what do you think?  

LEE: Yeah. 

CHEN: Am I missing something? You know, and usually, it gives a pretty good response. 

LEE: You know, that particular idea is something that I think Carey, Zak, and I thought would be happening a lot more today than we’re observing. And it’s the idea of a second set of eyes on your work. And somehow, at least our observation is that that isn’t happening quite as much by today as we thought it might.  

And it just seems like one of the really safest and most effective use cases. When you go and you’re looking at yourself and other fellow medical students, other second-year students, what do you see when it comes to the “second set of eyes” idea? 

CHEN: I think, like, a lot of students are definitely consulting ChatGPT in that regard because, you know, even in the very beginning, we’re taught to be, like, never miss these red flags, right. So these red flags are always on our differential, but sometimes, it can be difficult to figure out where to place them on that, right.  

So I think in addition to, you know, coming up with these differentials, something I’ve been finding a lot of value [in] is just chatting with these tools to get their rationale behind their thinking, you know.  

Something I find really helpful—I think this is also a part of the, kind of, art of medicine—is figuring out what to order, right, what labs to order.  

LEE: Right.

CHEN: Obviously, you have your order sets that automate some of the things, like in the ED [emergency department], or, like, there are some gold standard imaging things you should do for certain presentations. 

But then you chat to, like, 10 different physicians on maybe the next steps after that, and they give you 10 different answers.  

LEE: Yes.  

CHEN: But there’s never … I never understand exactly why. It’s always like, I’ve just been doing this for all my training, or that’s how I was taught.  

So asking ChatGPT, like, “Why would you do this next?” Or, like, “Is this a good idea?” And seeing the pros and cons has also been really helpful in my learning. 

LEE: Yeah, wow, that’s super interesting. So now, you know, I’d like to get into the education you’re receiving. And, you know, I think it’s fair to say Kaiser Permanente is very progressive in really trying to be very cutting-edge in how the whole curriculum is set up.  

And for the listeners who don’t know this, I’m actually on the board of directors of the school and have been since the founding of the school. And I think one of the reasons why I was invited to be on the board is the school really wanted to think ahead and be cutting edge when it comes to technology.  

So from where I’ve sat, I’ve never been completely satisfied with the amount of tech that has made it into the curriculum. But at the same time, I’ve also made myself feel better about that just understanding that it’s sort of unstoppable, that students are so tech-forward already.  

But I wanted to delve into a little bit here into what your honest opinions are and your fellow students’ opinions are about whether you feel like you’re getting adequate training and background formally as part of your medical education when it comes to things like artificial intelligence or other technologies.  

What do you think? Are you … would you wish the curriculum would change? 

CHEN: Yeah, I think that’s a great question.  

I think from a tech perspective, the school is very good about implementing, you know, opportunities for us to learn. Like, for example, learning how to use Epic, right, or at Kaiser Permanente, what we call HealthConnect, right. These electronic health records. My understanding is, a lot of schools maybe don’t teach that.  

That’s something where we get training sessions maybe once or twice a year, like, “Hey, here’s how to make a shortcut in the environment,” right.  

So I think from that perspective, the school is really proactive in providing those opportunities, and they make it very easy to find resources for that, too. I think it … 

LEE: Yeah, I think you’re pretty much guaranteed to be an Epic black belt by the time you [LAUGHS] finish your degree.  

CHEN: Yes, yes.  

But then I think in terms of the aspects of artificial intelligence, I think the school’s taken a more cautiously optimistic viewpoint. They’re just kind of looking around right now.  

Formally in the curriculum, there hasn’t been anything around this topic. I believe the fourth-year students last year got a student-led lecture around this topic.  

But talking to other peers at other institutions, it looks like it’s something that’s very slowly being built into the curriculum, and it seems like a lot of it is actually student-led, you know.  

You know, my friend at Feinberg [School of Medicine] was like we just got a session before clerkship about best practices on how to use these tools.  

I have another friend at Pitt talking about how they’re leading efforts of maybe incorporating some sort of LLM into their in-house curriculum where students can, instead of clicking around the website trying to find the exact slide, they can just ask this tool, like, “OK. We had class this day. They talked about this … but can you provide more information?” and it can pull from that.  

So I think a lot of this, a lot of it is student-driven. Which I think is really exciting because it raises the question, I think, you know, current physicians may not be very well equipped with these tools as well, right?  

So maybe they don’t have a good idea of what exactly is the next steps or what does the curriculum look like. So I think the future in terms of this AI curriculum is really student-led, as well. 

LEE: Yeah, yeah, it’s really interesting.  

I think one of the reasons I think also that that happens is [that] it’s not just necessarily the curriculum that lags but the accreditation standards. You know, accreditation is really important for medical schools because you want to make sure that anyone who holds an MD, you know, is a bona fide doctor, and so accreditation standards are pretty strictly monitored in most countries, including the United States.  

And I think accreditation standards are also—my observation—slow to understand how to adopt or integrate AI. And it’s not meant as a criticism. It’s a big unknown. No one knows exactly what to do or how to do it. And so it’s really interesting to see that, as far as I can tell, I’ve observed the same thing that you just have seen, that most of the innovation in this area about how AI should be integrated into medical education is coming from the students themselves.  

It seems, I think, I’d like to think it’s a healthy development. [LAUGHS]

CHEN: Something tells me maybe the students are a bit better at using these tools, as well.  

You know, I talk to my preceptors because KP [Kaiser Permanente] also has their own version … 

LEE: Preceptor, maybe we should explain what that is. 

CHEN: Yeah, sorry. So a preceptor is an attending physician, fully licensed, finished residency, and they are essentially your kind of teacher in the clinical environment.  

So KP has their own version of some ambient documentation device, as well. And something I always like to ask, you know, like, “Hey, what are your thoughts on these tools,” right?  

And it’s always so polarizing, as well, even among the same specialty. Like, if you ask psychiatrists, which I think is a great use case of these tools, right. My preceptor hates it. Another preceptor next door loves it. [LAUGHTER] 

So I think a lot of it’s, like, it’s still, like, a lot of unknowns, like you were mentioning. 

LEE: Right. Well, in fact, I’m glad you brought that up because one thing that we’ve been hearing from previous guests a lot when it comes to AI in clinic is about ambient listening by AI, for example, to help set up a clinical note or even write a clinical note.  

And another big use case that we heard a lot about that seems to be pretty popular is the use of generative AI to respond to patient messages.  

So let’s start with the clinical note thing. First off, do you have opinions about that technology? 

CHEN: I think it’s definitely good.  

I think especially where, you know, if you’re in the family medicine environment or pediatric environment where you’re spending so much time with patients, a note like that is great, right. 

I think coming from a strictly medical student standpoint, I think it’s—honestly, it’d be great to have—but I think there’s a lot of learning when you write the note, you know. There’s a lot of, you know, all of my preceptors talk about, like, when I read your note, you should present it in a way where I can see your thoughts and then once I get to the assessment and plan, it’s kind of funneling down towards a single diagnosis or a handful of diagnoses. And that’s, I think, a skill that requires you to practice over time, right.  

So a part of me thinks, like, if I had this tool where [it] can just automatically give me a note as a first year, then it takes away from that learning experience, you know. 

Even during our first year throughout school, we frequently get feedback from professors and doctors about these notes. And it’s a lot of feedback. [LAUGHTER] It’s like, “I don’t think you should have written that,” “That should be in this section,” … you know, like a medical note or a SOAP note [Subjective, Objective, Assessment, and Plan], where, you know, the subjective is, like, what the patient tells you. Objective is what the physical findings are, and then your assessment of what’s happening, and then your plan. Like, it’s very particular, and then I think medicine is so structured in a way, that’s kind of, like, how everyone does it, right.

So kind of going back to the question, I think it’s a great tool, but I don’t think it’s appropriate for a medical student. 

LEE: Yeah, it’s so interesting to hear you say that. I was … one of our previous guests is the head of R&D at Epic, Seth Hain. He said, “You know, Peter, doctors do a lot of their thinking when they write the note.” 

And, of course, Epic is providing ambient, you know, clinical notetaking automation. But he was urging caution because, you know, you’re saying, well, this is where you’re learning a lot. But actually, it’s also a point where, as a doctor, you’re thinking about the patient. And we do probably have to be careful with how we automate parts of that.  

All right. So you’re gearing up for Step 1 of the USMLE [United States Medical Licensing Examination]. That’ll be a big multiple-choice exam. Then Step 2 is similar: very, very focused on advanced clinical knowledge. And then Step 3, you know, is a little more interactive.  

And so one question that people have had about AI is, you know, how do we regulate the use of AI in medicine? And one of the famous papers that came out of both academia and industry was the concept that you might be able to treat AI like a person and have it go through the same licensing. And this is something that Carey, Zak, and I contemplated in our book.  

In the end, at the time we wrote the book, I personally rejected the idea, but I think it’s still alive. And so I’ve wondered if you have any … you know, first off, are you opinionated at all about, what should the requirements be for the allowable use of AI in the kind of work that you’re going to be doing? 

CHEN: Yeah, I think it’s a tough question because, like, where do you draw that line, right? If you apply the human standards of it’s passing exams, then yes, in theory, it could be maybe a medical doctor, as well, right? It’s more empathetic than medical doctors, right? So where do you draw that line?  

I think, you know, part of me thinks maybe it is that human aspect that patients like to connect with, right. And maybe this really is just, like, these tools are just aids in helping, you know, maybe offload some cognitive load, right.  

But I think the other part of me is thinking about the next generation who are growing up with this technology, right. They’re interacting with applications all day. Maybe they’re on their iPads. They’re talking to chatbots. They’re using ChatGPT. This is, kind of, the environment they grew up with. Does that mean they also have increased, like, trust in these tools that maybe our generation or the generations above us don’t have? Would they value that human connection less?  

You know, I think those are some troubling thoughts that, you know, yes, at end of the day, maybe I’m not as smart as these tools, but I can still provide that human comfort. But if, at the end of the day, the future generation doesn’t really care about that or they perfectly trust these tools because that’s all they’ve kind of known, then where do human doctors stand?  

I think part of that is, there would be certain specialties where maybe the human connection is more important. The longitudinal aspect of building that trust, I think is important. Family medicine is a great example. I think hematology oncology with cancer treatment.  

Obviously, I think no one’s going to be thrilled to hear a cancer diagnosis, but something tells me that, between seeing that on a screen versus maybe a physician prompting you and telling you about it, maybe in those aspects, you know, the human nature, the human touch plays an important role there, too. 

LEE: Yeah, you know, I think it strikes me that it’s going to be your generation that really is going to set the pattern probably for the next 50 years about how this goes. And it’s just so interesting because I think a lot will depend on your reactions to things.  

So, for example, you know, one thing that is already starting to happen are patients who are coming in armed, you know, with a differential [LAUGHS], you know, that they’ve developed themselves with the help of ChatGPT. So let me … you must have thought about these things. So, in fact, has it happened in your clinical work already? 

CHEN: Yeah, I’ve seen people come into the ED during my ED shift, like emergency department, and they’ll be like, “Oh, I have neck pain and here are all the things that, you know, Chat told me, ChatGPT told me. What do you think … do I need? I want this lab ordered, that lab ordered.”  

LEE: Right. 

CHEN: And I think my initial reaction is, “Great. Maybe we should do that.” But I think the other reaction is understanding that not everyone has the clinical background of understanding what’s most important, what do we need to absolutely rule out, right?   

So, I think in some regards, I would think that maybe ChatGPT errs on the side of caution, … 

LEE: Yes.  

CHEN: … giving maybe patients more extreme examples of what this could be just to make sure that it, in a way, is not missing any red flags as well, right.  

LEE: Right. Yeah.  

CHEN: But I think a lot of this is … what we’ve been learning is it’s all about shared decision making with the patient, right. Being able to acknowledge like, “Yeah, [in] that list, most of the stuff is very plausible, but maybe you didn’t think about this one symptom you have.”  

So I think part of it, maybe it’s a sidebar here, is the idea of prompting, right. You know, they’ve always talked about all these, you know, prompt engineers, you know, how well can you, like, give it context to answer your question? 

LEE: Yeah. 

CHEN: So I think being able to give these models the correct information and the relevant information, and the key word is relevant, because relevance is, I guess, where your clinical expertise comes in. Like, what do you give the model, what do you not give? So I think that, between a medical provider versus maybe your patients, is ultimately the difference. 

LEE: Let me press on that a little bit more because you brought up the issue of trust, and trust is so essential for patients to feel good about their medical care.  

And I can imagine you’re a medical student seeing a patient for the first time. So you don’t have a trust relationship with that patient. And the patient comes in maybe trusting ChatGPT more than you. 

CHEN: Very valid. No. I mean, I get that a lot, surprisingly, you know. [LAUGHTER] Sometimes [they’re] like, “Oh, I don’t want to see the medical student,” because we always give the patient an option, right. Like, it’s their time, whether it’s a clinic visit.  

But yeah, those patients, I think it’s perfectly reasonable. If I heard a second-year medical student was going to be part of my care team, taking that history, I’d be maybe a little bit concerned, too. Like, are they asking all the right questions? Are they relaying that information back to their attending physician correctly?  

So I think a lot of it is, at least from a medical student perspective, is framing it so the patient understands that this is a learning opportunity for the students. And something I do a lot is tell them like, “Hey, like, you know, at the end of the day, there is someone double-checking all my work.”  

LEE: Yeah. 

CHEN: But for those that come in with a list, I sometimes sit down with them, and we’ll have a discussion, honestly.  

I’ll be like, “I don’t think you have meningitis because you’re not having a fever. Some of the physical exam maneuvers we did were also negative. So I don’t think you have anything to worry about that,” you know.  

So I think it’s having that very candid conversation with the patient that helps build that initial trust. Telling them like, “Hey … ” 

LEE: It’s impressive to hear how even-keeled you are about this. You know, you’re being very humble saying, well, you know, as a second-year medical student, of course, someone might not, you know, have complete trust. But I think that we will be entering into a world where no doctor, no matter how experienced or how skilled, is going to be immune from this issue. 

So we’re starting to run toward the end of our time together. And I like to end with one or two more provocative questions.  

And so let me start with this one. Undoubtedly, I mean, you’re close enough to tech and digital stuff, digital health, that you’re undoubtedly familiar with famous predictions, you know, by Turing and Nobel laureates that someday certain medical specialties, most notably radiology, would be completely supplanted by machines. And more recently, there have been predictions by others, like, you know, Elon Musk, that maybe even some types of surgery would be replaced by machines.  

What do you think? Do you have an opinion? 

CHEN: I think replace is a strong term, right. To say that doctors are completely obsolete, I think, is unlikely.  

If anything, I think there might be a shift maybe in what it means to be a doctor, right. Undoubtedly, maybe the demand for radiologists is going to go down because maybe more of the simple things can truly be automated, right. And you just have a supervising radiologist whose output is maybe 10 times that of a single radiologist, right.  

So I definitely see a future where the demand for certain specialties might go down.  

And I think when I talk about a shift in what it means to be a physician, maybe it’s not so much diagnostic anymore, right, if these models get so good at, like, just taking in large amounts of information. Maybe it pivots to being really good at understanding the limitations of these models and knowing when to intervene; maybe that’s what it means to be kind of the next generation of physicians.  

I think in terms of surgery, yeah, I think it’s a concern, but maybe not in the next 50 years. Like those da Vinci robots are great. I think out of Mayo Clinic, they were demoing some videos of these robots leveraging computer vision to, like, close port holes, like laparoscopic scars. And that’s something I do in the OR [operating room], right. And we’re at the same level at this point. [LAUGHTER] So at that point, maybe.  

But I think robotics still has to address the understanding of like, what if something goes wrong, right? Who’s responsible? And I don’t see a future where a robot is able to react to these, you know, dangerous situations when maybe something goes wrong. You still have to have a surgeon on board to, kind of, take over. So in that regard, that’s kind of where I see maybe the future going. 

LEE: So last question. You know, when you are thinking about the division of time, one of the themes that we’ve seen in the previous guests is more and more doctors are doing more technology work, like writing code and so on. And more and more technologists are thinking deeply and getting educated in clinical and preclinical work.  

So for you, let’s look ahead 10 years. What do you see your division of labor to be? Or, you know, how would you … what would you tell your mom then about how you spend a typical day? 

CHEN: Yeah, I mean, I think for me, technology is something I definitely want to be involved in in my line of work, whether it’s, you know, AI work, whether it’s improving quality of healthcare through technology.  

My perfect division would be maybe still being able to see patients but also balancing some maybe more of these higher-level kind of larger projects. But I think having that division would be something nice. 

LEE: Yeah, well, I think you would be great just from the little bit I know about you. And, Daniel, it’s been really great chatting with you. I wish you the best of luck, you know, with your upcoming exams and getting through year two of your medical studies. And perhaps someday I’ll be your patient. 

[TRANSITION MUSIC]  

CHEN: Thank you so much. 

LEE: You know, one of the lucky things about my job is that I pretty regularly get to talk to students at all levels, spanning high school to graduate school. And when I get to talk especially to med students, I’m always impressed with their intelligence, just how serious they are, and their high energy levels. Daniel is absolutely a perfect example of all that.  

Now, it comes across as trite to say that the older generation is less adept at technology adoption than younger people. But actually, there probably is a lot of truth to that. And in the conversation with Daniel, I think he was actually being pretty diplomatic but also clear that he and his fellow med students don’t necessarily expect the professors in their med school to understand AI as well as they do. 

There’s no doubt in my mind that medical education will have to evolve a lot to help prepare doctors and nurses for an AI future. But where will this evolution come from?  

As I reflect on my conversations with Morgan and Daniel, I start to think that it’s most likely to come from the students themselves. And when you meet people like Morgan and Daniel, it’s impossible to not be incredibly optimistic about the next generation of clinicians. 

[THEME MUSIC] 

Another big thank-you to Morgan and Daniel for taking time to share their experiences with us. And to our listeners, thank you for joining us. We have just a couple of episodes left, one on AI’s impact on the operation of public health departments and healthcare systems and another coauthor roundtable. We hope you’ll continue to tune in.  

Until next time. 

[MUSIC FADES] 

The post Navigating medical education in the era of generative AI appeared first on Microsoft Research.

How AI will accelerate biomedical research and discovery http://approjects.co.za/?big=en-us/research/podcast/how-ai-will-accelerate-biomedical-research-and-discovery/ Thu, 10 Jul 2025 16:00:00 +0000 http://approjects.co.za/?big=en-us/research/?p=1144029 Daphne Koller, Noubar Afeyan, and Dr. Eric Topol, leaders in AI-driven medicine, discuss how AI is changing biomedical research and discovery, from accelerating drug target identification and biotech R&D to helping pursue the “holy grail” of a virtual cell.

The post How AI will accelerate biomedical research and discovery appeared first on Microsoft Research.

Illustrated images of Peter Lee, Daphne Koller, Noubar Afeyan, and Dr. Eric Topol for the Microsoft Research Podcast

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Daphne Koller (opens in new tab), Noubar Afeyan (opens in new tab), and Dr. Eric Topol (opens in new tab), leaders in AI-driven medicine, join Lee to explore the rapidly evolving role of AI across the biomedical and healthcare landscape. Koller, founder and CEO of Insitro, shares how machine learning is transforming drug discovery, especially target identification for complex diseases like ALS, by uncovering biological patterns across massive datasets. Afeyan, founder and CEO of Flagship Pioneering and co-founder and chairman of Moderna, discusses how AI is being applied across biotech research and development, from protein design to autonomous science platforms. Topol, executive vice president of Scripps Research and founder and director of the Scripps Research Translational Institute, highlights how AI can today help mitigate and prevent the core diseases that erode our health and the possibility of realizing a virtual cell. Through his conversations with the three, Lee investigates how AI is reshaping the discovery, deployment, and delivery of medicine. 

Transcript 

[MUSIC]

[BOOK PASSAGE] 

PETER LEE: “Can GPT-4 indeed accelerate the progression of medicine … ? It seems like a tall order, but if I had been told six months ago that it could rapidly summarize any published paper, that alone would have satisfied me as a strong contribution to research productivity. … But now that I’ve seen what GPT-4 can do with the healthcare process, I expect a lot more in the realm of research.” 

[END OF BOOK PASSAGE]

[THEME MUSIC]

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES]

The book passage I read at the top was from “Chapter 8: Smarter Science,” which was written by Zak.

In writing the book, we were optimistic about AI’s potential to accelerate biomedical research and help get new and much-needed treatments and drugs to patients sooner. One area we explored was generative AI as a designer of clinical trials. We looked at generative AI’s adeptness at summarizing, helping speed up pre-trial triage and research. We even went so far as to predict the arrival of a large language model that can serve as a central intellectual tool. 

For a look at how AI is impacting biomedical research today, I’m excited to welcome Daphne Koller, Noubar Afeyan, and Eric Topol. 

Daphne Koller is the CEO and founder of Insitro, a machine learning-driven drug discovery and development company that recently made news for its identification of a novel drug target for ALS and its collaboration with Eli Lilly to license Lilly’s biochemical delivery systems. Prior to founding Insitro, Daphne was the co-founder, co-CEO, and president of the online education platform Coursera.

Noubar Afeyan is the founder and CEO of Flagship Pioneering, which creates biotechnology companies focused on transforming human health and environmental sustainability. He is also co-founder and chairman of the messenger RNA company Moderna. An entrepreneur and biochemical engineer, Noubar has numerous patents to his name and has co-founded many startups in science and technology.

Dr. Eric Topol is the executive vice president of the biomedical research non-profit Scripps Research, where he founded and now directs the Scripps Research Translational Institute. One of the most cited researchers in medicine, Eric has focused on promoting human health and individualized medicine through the use of genomic and digital data and AI. 

These three are likely to have an outsized influence on how drugs and new medical technologies soon will be developed.

[TRANSITION MUSIC] 

Here’s my interview with Daphne Koller:

LEE: Daphne, I’m just thrilled to have you join us. 

DAPHNE KOLLER: Thank you for having me, Peter. It’s a pleasure to be here. 

LEE: Well, you know, you’re quite well-known across several fields. But maybe for some audience members of this podcast, they might not have encountered you before. So where I’d like to start is a question I’ve been asking all of our guests.

How would you describe what you do? And the way I kind of put it is, you know, how do you explain to someone like your parents what you do for a living? 

KOLLER: So that answer obviously has shifted over the years.

What I would say now is that we are working to leverage the incredible convergence of very powerful technologies, of which AI is one but not the only one, to change the way in which we discover and develop new treatments for diseases for which patients are currently suffering and even dying. 

LEE: You know, I think I’ve known you for a long time. 

KOLLER: Longer than I think either of us care to admit. 

LEE: [LAUGHS] In fact, I think I remember you even when you were still a graduate student. But of course, I knew you best when you took up your professorship at Stanford. And I always, in my mind, think of you as a computer scientist and a machine learning person. And in fact, you really made a big name for yourself in computer science research in machine learning.

But now you’re, you know, leading one of the most important biotech companies on the planet. How did that happen?

KOLLER: So people often think that this is a recent transition. That is, after I left Coursera, I looked around and said, “Hmm. What should I do next? Oh, biotech seems like a good thing,” but that’s actually not the way it transpired.

This goes all the way back to my early days at Stanford, where, in fact, I was, you know, as a young faculty member in machine learning, because I was the first machine learning hire into Stanford’s computer science department, I was looking for really exciting places in which this technology could be deployed, and applications back then, because of scarcity of data, were just not that inspiring.

And so I looked around, and this was around the late ’90s, and realized that there was interesting data emerging in biology and medicine. My first application actually was in, interestingly, in epidemiology—patient tracking and tuberculosis. You know, you can think of it as a tiny microcosm of the very sophisticated models that COVID then enabled in a much later stage.

LEE: Right. 

KOLLER: And so initially, this was based almost entirely on just technical interest. It’s kind of like, oh, this is more interesting as a question to tackle than spam filtering. But then I became interested in biology in its own right, biology and medicine, and ended up having a bifurcated existence as a Stanford professor where half my lab continued to do core computer science research published in, you know, NeurIPS and ICML. And the other half actually did biomedical research that was published in, you know, Nature Cell [and] Science. So that was back in, you know, the early, early 2000s, and for most of my Stanford career, I continued to have both interests.

And then the Coursera experience kind of took me out of Stanford and put me in an industry setting for the first time in my life actually. But then when my time at Coursera came to an end, you know, I’d been there for five years. And if you look at the timeline, I left Stanford in early 2012, right as the machine learning revolution was starting. So I missed the beginning.

And it was only in like 2016 or so that, as I picked my head up over the trenches, like, “Oh my goodness, this technology is going to change the world.” And I wanted to deploy that big thing towards places where it would have beneficial impact on the world, like to make the world a better place.

LEE: Yeah. 

KOLLER: And so I decided that one of the areas where I could make a unique, differentiated impact was in really bringing AI and machine learning to the life sciences, having spent, you know, the majority of my career at the boundary of those two disciplines. And notice I say “boundary” with deliberation because there wasn’t very much of an intersection.

LEE: Right. 

KOLLER: I felt like I could do something that was unique. 

LEE: So just to stick on you for a little bit longer, you know, we have been sort of getting into your origin story about what we call AI today—but machine learning, so deep learning. 

And, you know, there has always been a kind of an emotional response for people like you and me and now the general public about their first encounters with what we now call generative AI. I’d love to hear what your first encounter was with generative AI and how you reacted to this. 

KOLLER: I think my first encounter was actually an indirect one. Because, you know, the earlier generations of generative AI didn’t directly touch our work at Insitro (opens in new tab). 

And yet at the same time, I had always had an interest in computer vision. That was a large part of my non-bio work when I was at Stanford. 

And so some of my earlier even presentations, when I was trying to convey to people back in 2016 how this technology was going to transform the world, I was talking about the incredible progress in image recognition that had happened up until that point. 

So my first interaction was actually in the generative AI for images, where you are able to go the other way … 

LEE: Yes. 

KOLLER: … where you can take a verbal description of an image and create—and this was back in the days when the images weren’t particularly photorealistic, but still a natural language description to an image was magic given that only two or three years before that, we were barely able to look at an image and write a short phrase saying, “This is a dog on the beach.” And so that arc, that hockey-stick curve, was just mind-blowing to me. 

LEE: Did you have moments of skepticism? 

KOLLER: Yeah, I mean the early, you know, early versions of ChatGPT, where it was more like parlor tricks and poking it a little bit revealed all of the easy ways that one could break it and make it do really stupid things. I was like, yeah, OK, this is kind of cute, but is it going to actually make a difference? Is it going to solve a problem that matters? 

And I mean, obviously, I think now everyone agrees that the answer is yes, although there are still people who are like, yeah, but maybe it’s around the edges. I’m not among them, by the way, but … yeah, so initially there were like, “Yeah, this is cute and very impressive, but is it going to make a difference to a problem that matters?” 

LEE: Yeah. So now, maybe this is a good time to get into what you’ve been doing with ALS [amyotrophic lateral sclerosis]. You know, there’s a knee-jerk reaction from the technology side to focus on designing small molecules, on predicting, you know, their properties, you know, maybe binding affinity or aspects of ADME [absorption, distribution, metabolism, and excretion], you know, like absorption or dispersion or whatever. 

And all of that is very useful, but if I understand the work on ALS, you went to a much harder place, which is to actually identify and select targets. 

KOLLER: That’s right. 

LEE: So first off, just for the benefit of the standard listeners of this podcast, explain what that problem is in general. 

KOLLER: No, for sure. And I think maybe I’ll start by just very quickly talking about the drug discovery and development arc, …

LEE: Yeah.

KOLLER: … which, by and large, consists of three main phases. That’s the standard taxonomy. The first is what’s called sometimes target discovery or identifying a therapeutic hypothesis, which looks like: if I modulate this target in this disease, something beneficial will happen. 

Then, you have to take that target and turn it into a molecule that you can actually put into a person. It could be a small molecule. It could be a large molecule like an antibody, whatever. And then you have that construct, that molecule. And the last piece is you put it into a person in the context of a clinical trial, and you measure what has happened. And there’s been AI deployed towards each of those three stages in different ways. 

The last one is mostly like an efficiency gain. You know, the trial is kind of already defined, and you want to deploy technology to make it more efficient and effective, which is great because those are expensive operations. 

LEE: Yep. 

KOLLER: The middle one is where I would say the vast majority of efforts so far has been deployed in AI because it is a nice, well-defined problem. It doesn’t mean it’s easy, but it’s one where you can define the problem. It is, I need to inhibit this protein by this amount, and the molecule needs to be soluble and whatever and go past the blood-brain barrier. And you know probably within a year and a half or so, or two, if you succeeded or not. 

The first stage is the one where I would say the least amount of energy has gone because when you’re uncovering a novel target in the context of an indication, you don’t know that you’ve been successful until you go all the way to the end, which is the clinical trial, which is what makes this a long and risky journey. And not a lot of people have the appetite or the capital to actually do that. 

However, in my opinion, and that of, I think, quite a number of others, it is where the biggest impact can be made. And the reason is that while pharma has its deficiencies, making good molecules is actually something they’re pretty good at. 

It might take them longer than it should, maybe it’s not as efficient as it could be, but at the end of the day, if you tell them to drug a target, pharma is actually pretty good at generating those molecules. However, when you put those molecules into the clinic, 90% of them fail. And the reason they fail is not by and large because the molecule wasn’t good. In the majority of cases, it’s because the target you went after didn’t do anything useful in the context of the patient population in which you put it. 

And so in order to fix the inefficiency of this industry, which is incredible inefficiency, you need to address the problem at the root, and the root is picking the right targets to go after. And so that is what we elected to do. 

It doesn’t mean we don’t make molecules. I mean, of course, you can’t just end up with a target because a target is not actionable. You need to turn it into a molecule. And we absolutely do that. And by the way, the partnership with Lilly is actually one where they help us make a molecule. 

LEE: Yes. 

KOLLER: I mean, it’s our target. It’s our program. But Lilly is deploying its very state-of-the-art molecule-making capabilities to help us turn that target into a drug. 

LEE: So let’s get now into the machine learning of this. Again, this just strikes me as such a difficult problem to solve. 

KOLLER: Yeah. 

LEE: So how does machine learning … how does AI help you? 

KOLLER: So I think when you look at how people currently select targets, it’s a combination of oftentimes at this point, with an increasing respect for the power of human genetics, some search for a genetic association, oftentimes with a human-defined, highly subjective, highly noisy clinical outcome, like some ICD [International Classification of Diseases] code. 

And those are often underpowered and very difficult to deconvolute the underlying biology. You combine that with some mechanistic interrogation in a highly reductionist model system looking at a small number of readouts, biochemical readouts, that a biologist thinks are relevant to the disease. Like does this make this, whatever, cholesterol go up or amyloid beta go down? Or whatever. And then you take that as the second stage, and you pick, based on typically human intuition about, Oh, this one looks good to me, and then you take that forward. 

What we’re doing is an attempt to be as unbiased and holistic as possible. So, first of all, rather than rely on human-defined clinical endpoints, like this person has been diagnosed with diabetes or fatty liver, we try to measure, as much as we can, a holistic physiological state and then use machine learning to find structure, patterns, in those human physiological readouts, imaging readouts, and omics readouts from blood, from tissue, different kinds of imaging, and say, these are different vectors that this disease takes, this group of individuals, and here’s a different group of individuals that maybe from a diagnostic perspective are all called the same thing, but they are actually exhibiting a very different biology underlying it. 

And so that is something that doesn’t emerge when a human being takes a reductionist view to looking at this high-content data, and oftentimes, they don’t even look at it and produce an ICD code. 

LEE: Right. Yep. 

KOLLER: The same approach, actually even the same code base, is taken in the cellular data. So we don’t just say, “Well, the thing that matters is, you know, the total amount of lipid in the cell or whatever.” Rather, we say, “Let’s look at multiple readouts, multiple ways of looking at the cells, combine them using the power of machine learning.” And again, looking at imaging readouts where a human’s eyes just glaze over looking at even a few dozen cells, far less a few hundreds of millions of cells, and understand what are the different biological processes that are going on. What are the vectors that the disease might take you in this direction, in this group of cells, or in that direction? 

And then importantly, we take all of that information from the human side, from the cellular side, across these different readouts, and we combine them using an integrative approach that looks at the combined weight of evidence and says, these are the targets that I have the greatest amount of conviction about by looking across all of that information. Whereas we know, and I’m sure you’ve seen this analysis done for clinicians, a human being is typically able to keep three or four things in their head at the same time. 

LEE: Right. 

KOLLER: A really good human being who’s really expert at what they do can maybe get to six to eight. 

LEE: Yeah. 

KOLLER: The machine learning has no problem doing a few hundred. 

LEE: Right. 

KOLLER: And so you put that together, and that allows you, to your earlier question, really select the targets around which you have the highest conviction. And then those are the ones that we then prioritize for interrogation in more expensive systems like mice and monkeys and then at the end of the day pick the small handful that one can afford to actually take into clinical trials. 
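The integrative, weight-of-evidence target ranking Koller describes can be sketched in a few lines of code. Everything here is an illustrative assumption, not Insitro's actual method: the modality names, the weights, and the scores are hypothetical, and a linear weighted combination is just one simple way to aggregate evidence across readouts.

```python
# Hypothetical sketch of weight-of-evidence target ranking: each candidate
# target gets evidence scores from several readout modalities (human
# genetics, imaging, omics, cellular assays), which are combined into one
# conviction score. All names, weights, and scores are illustrative.

def rank_targets(evidence, weights, top_k=3):
    """Combine per-modality evidence into one score and rank targets.

    evidence: {target: {modality: score in [0, 1]}}
    weights:  {modality: relative weight}
    A modality missing for a target contributes nothing (treated as 0).
    """
    total_weight = sum(weights.values())
    scored = []
    for target, readouts in evidence.items():
        combined = sum(
            weights[m] * readouts.get(m, 0.0) for m in weights
        ) / total_weight
        scored.append((target, round(combined, 3)))
    # Highest combined conviction first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

evidence = {
    "TARGET_A": {"genetics": 0.9, "imaging": 0.7, "omics": 0.8, "cellular": 0.6},
    "TARGET_B": {"genetics": 0.4, "imaging": 0.9, "omics": 0.3, "cellular": 0.5},
    "TARGET_C": {"genetics": 0.8, "omics": 0.9, "cellular": 0.9},  # no imaging data
}
# Genetics weighted highest, reflecting the "thread of human genetics" theme
weights = {"genetics": 2.0, "imaging": 1.0, "omics": 1.0, "cellular": 1.0}

ranking = rank_targets(evidence, weights)
```

The point of the sketch is the contrast Koller draws: a person weighing three or four lines of evidence caps out quickly, while a scoring model like this scales to hundreds of readouts per target without effort.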

LEE: So now, Insitro recently received $25 million in milestone payments from Bristol Myers Squibb after discovering and selecting a novel drug target for ALS. Can you tell us a little bit more about that? 

KOLLER: We are incredibly excited about the first novel target, and there are a couple of others just behind it in line that seem, you know, quite efficacious as well, that truly seem to reverse, albeit in a cellular system, what we now understand to be ALS pathology across multiple different dimensions. There have obviously been many attempts made to try and address ALS, which, by the way, is a horrible, horrible disease, worse than most cancers. It kills you almost inevitably in three to five years in a particularly horrific way. 

And what we have in our hands is a target that seems to revert a lot of the pathologies that are associated with the disease, which we now understand has to do with the mis-splicing of multiple proteins within the cell and creating defective versions of those proteins that are just not operational. And we are seeing reversion of many of those. 

So can I tell you for sure it’ll work in a human? No, there’s many steps between now and then. But we couldn’t be more excited about the opportunity to provide what we hope will be a disease-modifying intervention for these patients who really desperately need something. 

LEE: Well, it’s certainly been making waves in the biotech and biomedical world. 

KOLLER: Thank you. 

LEE: So we’ll be really watching very closely. 

So, you know, I think just reflecting on, you know, what we missed and what we got right in our book, I think in our book, we did have the insight that there would be an ability to connect, say, genotypic and phenotypic data and, you know, just broadly the kinds of clinical measurements that get made on real patients and that these things could be brought together. And I think the work that you’re doing really illustrates that in a very, very sophisticated, very ambitious way. 

But the fact that this could be connected all the way down to the biology, to the biochemistry, I think we didn’t have any clue what would happen, at least not this quickly. 

KOLLER: Well, I think the … 

LEE: And I realize, you’ve been at this for quite a few years, but still, it’s quite amazing. 

KOLLER: The thread that connects them is human genetics. And I think that has, to us, been, sort of, the, kind of, the connective tissue that allows you to translate across different systems and say, “What does this gene do? What does this gene do in this organ and in that organ? What does it do in this type of cell and in that type of cell?” 

And then use that as sort of the thread, if you will, that follows the impact of modulating this gene all the way from the simple systems where you can do the experiment to the complex systems where you can’t do the experiment until the very end, but you have the human genetics as a way of looking at the statistics and understanding what the impact might be. 

LEE: So I’d like to now switch gears and take … I want to take two steps in the remainder of this conversation towards the future. So one step into that future, of course, we’re living through now, which is just all of the crazy pace of work and advancement in generative AI generally, you know, just the scale of transformers, of post-training, and now inference scale and reasoning models and so on. And where do you see all of that going with respect to the goals that you have and that Insitro has? 

KOLLER: So I think first and foremost is the parallel, if you will, to the predictions that you focused on in your book, which is this will transform a lot of the core data processing tasks, the information tasks. And sure, the doctors and nurses is one thing. But if you just think of clinical trial operations or the submission of regulatory documents, these are all kind of simple data … they’re not simple, obviously, but they’re data processing tasks. They involve natural language. That’s not going to be our focus, but I hope that others will use that to make clinical trials faster, more efficient, less expensive. 

There’s already a lot of progress that’s happening on the molecular design side of things and taking hypotheses and turning them quickly and effectively into molecules. As I said, this is part of our work that we absolutely do and we don’t talk about it very much, simply because it’s a very crowded landscape and a lot of companies are engaged on that. But I think it’s really important to be able to take biological insights and turn them into new molecules. 

And then, of course, the transformer models and their likes play a very significant role in that sort of turning insights into molecules because you can have foundation models for proteins. There are increasing efforts to create foundation models for other categories of molecules. And so that will undoubtedly accelerate the process by which you can quickly generate different molecular hypotheses and test them and learn from what you did so that you can do fewer iterations … 

LEE: Right. 

KOLLER: … before you converge on a successful molecule. 

I do think that arguably the biggest impact as yet to be had is in that understanding of core human biology and what are the right ways to intervene in it. And that plays a role in a couple different ways. First of all, it certainly plays a role in which … if we are able to understand the human physiological state and, you know, the state of different systems all the way down to the cell level, that will inform our ability to pick hypotheses that are more likely to actually impact the right biologies underneath. 

LEE: Yep. Yeah. 

KOLLER: And the more data we’re able to collect about humans and about cells, the more successful our models will be at representing that human physiological state or the cell biological state and making predictions reliably on the impact of these interventions. 

The other side of it, though, and this comes back, I think, to themes that were very much in your book, is this will impact not only the early stages of which hypotheses we interrogate, which molecules we move forward, but also hopefully at the end of the day, which molecule we prescribe to which patient. 

LEE: Right. 

KOLLER: And I think there’s been obviously so much narrative over the years about precision medicine, personalized medicine, and very little of that has come to fruition, with the exception of, you know, certain islands in oncology, primarily on genetically driven cancers. 

But I think the opportunity is still there. We just haven’t been able to bring it to life because of the lack of the right kind of data. And I think with the increasing amount of human, kind of, foundational data that we’re able to acquire, things that are not sort of distilled through the eye of a clinician, for example, … 

LEE: Yes. 

KOLLER: … but really measurements of human pathology, we can start to get to some of that precision, carving out of the human population and then get to a world where we can prescribe the right medicine to the right patient and not only in cancer but also in other diseases that are also not a single disease. 

LEE: All right, so now to wrap up this time together, I always try to ask one more provocative last question. One of the dreams that comes naturally to someone like me or any of my colleagues, probably even to you, is this idea of, you know, wouldn’t it be possible someday to have a foundation model for biology or for human biology or foundation model for the human cell or something along these lines? 

And in fact, there are, of course, you and I are both aware of people who are taking that idea seriously and chasing after it. I have people in our labs that think hard about this kind of thing. Is it a reasonable thought at all? 

KOLLER: I have learned over the years to avoid saying the word never because technology proceeds in ways that you often don’t expect. And so will we at some point be able to measure the cell in enough different ways across enough different channels at the same time that you can piece together what a cell does? I think that is eminently feasible, not today, but over time. 

I don’t think it’s feasible using today’s technology, although the efforts to get there may expose where the biggest opportunities lie to, you know, build that next layer. So I think it’s good that people are working on really hard problems. I would also point out that even if one were to solve that really challenging problem of creating a model of a cell, there are thousands of different types of cells within the human body. 

They’re very different. They also talk to each other … 

LEE: Yep. 

KOLLER: … both within the cell type and across different cell types. So the combinatorial complexity of that system is, I think, unfathomable to many people. I mean, I would say to all of us. 

LEE: Yeah. 

KOLLER: And so even from that very lofty goal, there are multiple big steps that would need to be taken to get to a mechanistic model of the full organism. So will we ever get there? Again, you know, I don’t see a reason why this is impossible to do. So I think over time, technology will get better and will allow us to build more and more elaborate models of more and more complex systems. 

Patients can’t wait …

LEE: Right. Yeah. 

KOLLER: … for that to happen in order for us to get them better medicines. So I think there is a great basic science initiative on that side of things. And, in parallel, we need to make do with the data that we have or can collect or can print. We print a lot of data in our internal wet labs and get to drugs that are effective even though they don’t benefit from having a full-blown mechanistic model. 

LEE: Last question: where do you think we’ll be in five years? 

KOLLER: Phew. If I had answered that question five years ago, I would have been very badly embarrassed at the inaccuracy of my answer. [LAUGHTER] So I will not answer it today either. 

I will say that the thing about exponential curves is that they are very, very tricky, and they move in unexpected ways. I would hope that in five years, we will have made a sufficient investment in the generation of scientific data that we will be able to move beyond data that was generated entirely by humans and therefore insights that are derivative of what people already know to things that are truly novel discoveries. 

And I think in order to do that in, you know, math, maybe because math is entirely conceptual, maybe you can do that today. Math is effectively a construct of the human mind. I don’t think biology is a construct of the human mind, and therefore one needs to collect enough data to really build those models that will give rise to those novel insights. 

And that’s where I hope we will have made considerable progress in five years. 

LEE: Well, I’m with you. I hope so, too. Well, you know, thank you, Daphne, so much for this conversation. I learn a lot talking to you, and it was great to, you know, connect again on this. And congratulations on all of this success. It’s really groundbreaking. 

KOLLER: Thank you very much, Peter. It was a pleasure chatting with you, as well. 

[TRANSITION MUSIC] 

LEE: I still think of Daphne first and foremost as an AI researcher. And for sure, her research work in machine learning continues to be incredibly influential to this day. But it’s her work on AI-enhanced drug development that now is on the verge of making a really big difference on some of the most difficult diseases afflicting people today. 

In our book, Carey, Zak, and I predicted that AI might be a meaningful accelerant in biomedical research, but I don’t know that we foresaw the incredible potential specifically in drug development. 

Today, we’re seeing a flurry of activity at companies, universities, and startups on generative AI systems that aid and maybe even completely automate the design of new molecules as drug candidates. But now, in our conversation with Daphne, seeing AI go even further than that to do what one might reasonably have assumed to be impossible, to identify and select novel drug targets, especially for a neurodegenerative disease like ALS, it’s just, well, mind blowing. 

Let’s continue our deep dive on AI and biomedical research with this conversation with Noubar Afeyan: 

LEE: Noubar, thanks so much for joining. I’m really looking forward to this conversation. 

NOUBAR AFEYAN: Peter, thanks. Thrilled to be here. 

LEE: While I think most of the listeners to this podcast have heard of Flagship Pioneering, it’s still worth hearing from you, you know, what is Flagship? And maybe a little bit about your background. And finally, you found a way to balance science and business creation. And so, you know, your approach and philosophy to all of that. 

AFEYAN: Well, great. So maybe I’ll just start out by way of quick background. You know, my … and since we’re going to talk about AI, I’ll also highlight my first contact with the topic of AI. So as an undergraduate in 1980 up at McGill University, I was an engineering student, but I was really captivated by, at that time, the talk on the campus around the expert system, heuristic-based, rule-based kind of programs. 

LEE: Right. 

AFEYAN: And so actually I had the dubious distinction of writing my one and only college newspaper article. [LAUGHTER] That was a short career. And it was all about how artificial intelligence would be impacting medicine, would be impacting, you know, speech capture, translation, and some of the ideas that were there that it’s interesting to see now 45 years later re-emerge with some of the new learning-based models. 

My journey after college ended up taking me into biotechnology. In the early ’80s, I came to MIT to do a PhD. At the time, the field was brand new. I ended up being the first PhD graduate from MIT in this combination biology and engineering degree. And since then, I’ve basically been—so since 1987—a founder, a technologist in the space of biotechnology for human health and as well for planetary health. 

And then in 1999/2000 formed what is now Flagship Pioneering, which essentially was an attempt to bring together the three elements of what we know are important in startups. That is scientific capital, human capital, and financial capital. Right now, startups get that from different places. The science in our fields mostly come from academia, research hospitals. The human capital comes from other startups … 

LEE: Yeah. 

AFEYAN: … or large companies or some academics leave. And then the financial capital is usually venture capital, but there’s also now more and more other deeper pockets of money. 

What we thought was, what if all that existed in one entity and instead of having to convince each other how much they should believe the other if we just said, “Let’s use that power to go work on much further out things”? But in a way where nobody would believe it in the beginning, but we could give ourselves a little bit of time to do impactful big things. 

Twenty-five years later, that’s the road we’ve stayed on. 

LEE: OK. So let’s get into AI. Now, you know, what I’ve been asking guests is kind of an origin story. And there’s the origin story of contact with AI, you know, before the emergence of generative AI and afterwards. I don’t think there’s much of a point to asking you the pre-ChatGPT. But … so let’s focus on your first encounter with ChatGPT or generative AI. When did that happen, and what went through your head? 

AFEYAN: Yeah. So, if you permit me, Peter, just for very briefly, let me actually say I had the interesting opportunity over the last 25 years to actually stay pretty close to the machine learning world … 

LEE: Yeah. Yeah. 

AFEYAN: … because one, as you well know, among the most prolific users of machine learning has been the bioinformatics computational biology world because it’s been so data rich that anything that can be done, people have thrown at these problems because unlike most other things, we’re not working on man-made data. We’re looking at data that comes from nature, the complexity of which far exceeds our ability to comprehend. 

So you could imagine that any approach to statistically reduce complexity, get signal out of scant data—that’s a problem that’s been around. 

The other place where I’ve been exposed to this, which I’m going to come back to because that’s where it first felt totally different to me, is that some 25 years ago, actually the very first company we started was a company that attempted to use evolutionary algorithms to essentially iteratively evolve consumer-packaged goods online. Literally, we tried to, you know, consider features of products as genes and create little genomes of them. And by recombination and mutation, we could create variety. And then we could get people through panels online—this was 2002/2003 timeframe—we could essentially get people through iterative cycles of voting to create a survival of the fittest. And that’s a company that was called Affinnova. 

The reason I say that is that I knew that there’s a much better way to do this if only: one, you can generate variety … 

LEE: Yeah. 

AFEYAN: … without having to prespecify genes. We couldn’t do that before. And, two, which we’ve come back to nowadays, you can actually mimic how humans think about voting on things and just get rid of that element of it. 
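The Affinnova-style process Afeyan describes, product features as "genes," variety from recombination and mutation, and panel votes as the fitness signal, is a classic genetic algorithm. The toy sketch below is purely illustrative: the feature names, options, and the stand-in "panel" are invented for the example, not Affinnova's actual system.

```python
import random

# Toy genetic algorithm in the spirit of Afeyan's description: treat
# product features as genes, create variety via recombination and
# mutation, and let (simulated) panel votes drive survival of the
# fittest. All features, options, and preferences are hypothetical.

random.seed(0)

FEATURES = ["color", "shape", "scent"]
OPTIONS = {
    "color": ["red", "blue", "green"],
    "shape": ["round", "square"],
    "scent": ["citrus", "floral", "none"],
}

def random_product():
    return {f: random.choice(OPTIONS[f]) for f in FEATURES}

def recombine(parent_a, parent_b):
    # Each "gene" is inherited from one parent at random
    return {f: random.choice([parent_a[f], parent_b[f]]) for f in FEATURES}

def mutate(product, rate=0.2):
    child = dict(product)
    for f in FEATURES:
        if random.random() < rate:
            child[f] = random.choice(OPTIONS[f])
    return child

def panel_vote(product):
    # Stand-in for consumer panel votes: this hypothetical panel
    # prefers blue, round, citrus products (score 0..3)
    ideal = {"color": "blue", "shape": "round", "scent": "citrus"}
    return sum(product[f] == ideal[f] for f in FEATURES)

def evolve(generations=20, pop_size=10):
    population = [random_product() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=panel_vote, reverse=True)
        survivors = population[: pop_size // 2]  # keep the fittest half
        children = [
            mutate(recombine(random.choice(survivors), random.choice(survivors)))
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=panel_vote)

best = evolve()
```

The two limitations Afeyan names map directly onto this sketch: the "genes" and their options must be prespecified up front, and the fitness function requires rounds of real human voting, which is exactly what modern generative models and preference modeling can now replace.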

So then to your question of when does this kind of begin to feel different? So you could imagine that in biotechnology, you know, as an engineer by background, I always wanted to do CAD, and I picked the one field in which CAD doesn’t exist, which is biology. Computer-aided design is kind of a notional thing in that space. But boy, have we tried. For a long time, …

LEE: Yep. 

AFEYAN: … people would try to do, you know, hidden Markov models of genomes to try to figure out what should be the next, you know, base that you may want to or where genes might be, etc. But the notion of generating in biology has been something we’ve tried for a while. And in the late teens, so kind of 2018, ’17, ’18, because we saw deep learning come along, and you could basically generate novelty with some of the deep learning models … and so we started asking, “Could you generate a protein basically by training a correspondence table, if you will, between protein structures and their underlying DNA sequence?” Not their protein sequence, but their DNA sequence. 

LEE: Yeah. 

AFEYAN: So that’s a big leap. So ’17/’18, we started this thing. It was called 56. It was FL56, Flagship Labs 56, our 56th project. 

By the way, we started this parallel one called “57” that did it in a very different way. So one of them did pure black box model-building. The other one said, you know what, we don’t want to do the kind of … at that time, AlphaFold was in its very early embodiments. And we said, “Is there a way we could actually take little, you know, multi amino acid kind of almost grammars, if you will, a little piece, and then see if we could compose a protein that way?” So we were experimenting. 

And what we found was that actually, if you show enough instances and you could train a transformer model—back in the day, that’s what we were using—you could actually, say, predict another sequence that should have the same activity as the first one. 

LEE: Yeah. 

AFEYAN: So we trained on green fluorescent proteins. Now, we’re talking about seven years ago. We trained on enzymes, and then we got to antibodies. 

With antibodies, we started seeing that, boy, this could be a pretty big deal because it has big market impact. And we started bringing in some of the diffusion models that were beginning to come along at that time. And so we started getting much more excited. This was all done in a company that subsequently got renamed from FL56 to Generate:Biomedicines, … 

LEE: Yep, yep. 

AFEYAN: … which is one of the leaders in protein design using the generative techniques. It was interesting because Generate:Biomedicines is a company that was called that before generative AI was a thing, [LAUGHTER] which was kind of very ironic. 

And, of course, that team, which operates today very, very kind of at the cutting edge, has published their models. They came up with this first Chroma model, which is a diffusion-based model, and then started incorporating a lot of the LLM capabilities and fusing them. 

Now we’re doing atomistic models and many other things. The point being, that gave us a glimpse of how quickly the capability was gaining, … 

LEE: Yeah. Yeah. 

AFEYAN: … just like evolution shows you. Sometimes evolution is super silent, and then all of a sudden, all hell breaks loose. And that’s what we saw. 

LEE: Right. One of the things that I reflect on just in my own journey through this is there are other emotions that come up. One that was prominent for me early on was skepticism. Were there points when even in your own work, transformer-based work on this early on, that you had doubts or skepticism that these transformer architectures would be or diffusion-based approaches would be worth anything? 

AFEYAN: You know, it’s interesting, I think that, I’m going to say this to you in a kind of a friendly way, but you’ll understand what I mean. In the world I live in, it’s kind of like the slums of innovation, [LAUGHTER] kind of like just doing things that are not supposed to work. The notion of skepticism is a luxury, right. I assume everything we do won’t work. And then once in a while I’m wrong. 

And so I don’t actually try to evaluate whether before I bring something in, like just think about it. We, some hundred or so times a year, ask “what if” questions that lead us to totally weird places of thought. We then try to iterate, iterate, iterate to come up with something that’s testable. Then we go into a lab, and we test it. 

So in that world, right, sitting there going, like, “How do I know this transformer is going to work?” The answer is, “For what?” Like, it’s going to work. To make something up … well, guess what? We knew early on with LLMs that hallucination was a feature, not a bug for what we wanted to do. 

So it’s just such a different use that, of course, I have trained scientific skepticism, but it’s a little bit like looking at a competitive situation in an ecology and saying, “I bet that thing’s going to die.” Well, you’d be right—most of the time, you’d be right. [LAUGHTER] 

So I just don’t … like, it … and that’s why—I guess, call me an early adopter—for us, things that could move the needle even a little, but then upon repetition a lot, let alone this, … 

LEE: Yeah. 

AFEYAN: … you have to embrace. You can’t wait there and say, I’ll embrace it once it’s ready. And so that’s what we did. 

LEE: Hmm. All right. So let’s get into some specifics and what you are seeing either in your portfolio companies or in the research projects or out in the industry. What is going on today with respect to AI really being used for something meaningful in the design and development of drugs? 

AFEYAN: In companies that are doing as diverse things as—let me give you a few examples—a project that’s now become a named company called ProFound Therapeutics that literally discovered three, four years ago, and would not have been able to do so without some of the big data-model-building capabilities, that our cells make literally thousands, if not tens of thousands, more proteins than we were aware of, full stop. 

We had done the human genome sequence, there were 20,000 genes, we thought that there were … 

LEE: Wow. 

AFEYAN: … maybe 70-80,000, 100,000 proteins, and that’s that. And it turns out that our cells have a penchant to express themselves in the form of proteins, and they have many other ways than we knew to do that. 

Now, so what does that mean? That means that we have generated a massive amount of data, the interpretation of which, the use of which to guide what you do and what these things might be involved with is purely being done using the most cutting-edge data-trained models that allow you to navigate such complexity. 

LEE: Wow. Hmm. 

AFEYAN: That’s just one example. Another example: a company called Quotient Therapeutics (opens in new tab), again three, four years old. I can talk about the ones that are three, four years old because we’ve kind of gotten to a place where we’ve decided that it’s not going to fail yet, [LAUGHTER] so we can talk about it. 

You know, we discovered—our team discovered—that in our cells, right, so we know that when we get cancer, our cells have genetic mutations in them or DNA mutations that are correlated and often causal to the hyperproliferative stages of cancer. But what we assume is that all the other cells in our body, pretty much, have one copy of their genes from our mom, one copy from our dad, and that’s that. 

And when very precise deep sequencing came along, we always asked the question, “How much variation is there cell to cell?” 

LEE: Right. 

AFEYAN: And the answer was it’s kind of noise, random variation. Well, our team said, “Well, what if it’s not really that random?” because upon cell division cycles, there’s selection happening on these cells. And so not just in cancer but in liver cells, in muscle cells, in skin cells … 

LEE: Oh, interesting. 

AFEYAN: … can you imagine that there’s an evolutionary experiment that is favoring either compensatory mutations that are helping you avoid disease or disease-caused mutations that are gaining advantage as a way to understand the mechanism? Sure enough—I wouldn’t be telling you otherwise—with a massive amount of single-cell sequencing from individual patient samples, we’ve now discovered that the human genome is mutated on average in our bodies 10,000 times, like over every base, like, it’s huge numbers. 

And we’re finding very interesting big signals come out of this massive amount of data. By the way, data of the sort that the human mind, if it tries to assign causal explanations to what’s happening … 

LEE: Right. 

AFEYAN: … is completely inadequate. 

LEE: When you think about a language model, we’re learning from human language, and the totality of human language—at least relative to what we’re able to compute today in terms of constructing a model—the totality of human language is actually pretty limited. And in fact, you know, as is always written about in click-baity titles, you know, the big model builders are actually starting to run short. 

AFEYAN: Running out, running out, yes. [LAUGHTER] 

LEE: But one of the things that perplexes me and maybe even worries me—like these two examples—are generally in the realm of cellular biology and the complexity. Let’s just take the example of your company, ProFound. You know, the complexity of what’s going on and the potential genetic diversity is such that, can we ever have enough data? You know, because there just aren’t that many human beings. There just aren’t that many samples. 

AFEYAN: Well, it depends on what you want to train, right. So if you want to train a de novo evolutionary model that could take you from bacteria to human mammalian cells and the like, there may not be—and I’m not an expert in that—but that’s a question that we often kind of think about. 

But if you’re trying to train a … like, you know, the proteins we know about, how they interact with pathways and disease mechanisms and the like. Now all of a sudden you find out that there’s a whole continent of them missing in your explanations. But there are things you can reason, in quotations, through analogy, functional analogy, sequence analogy, homology. So there’s a lot of things that we could do to essentially make use of this, even though you may not have the totality of data needed to, kind of, predict, based on a de novo sequence, exactly what it’s going to do. 

So I agree with the comparison. But … but you’re right. The complexity is … just keep in mind, on average, a protein may be interacting with 50 to 100 other proteins. 

LEE: Right. 

AFEYAN: So if you find thousands of proteins, you’ve found a massive interaction space through which information is being processed in a living cell. 

LEE: But do you find in your AI companies that access to data ends up being a key challenge? Or, you know, how central is that? 

AFEYAN: Access to data is a key challenge for the companies we have that are trying to build just models. But that’s the minority of things we do. The majority of things we do is to actually co-develop the data and the models. And as you know well, because you guys, you know, have given us some ideas around this space, that, you know, you could generate data and then think about what you’re going to do with it, which is the way biotech has operated with bioinformatics. 

LEE: Right, right. 

AFEYAN: Or you could generate bespoke data that is used to train the model that’s quite separate from what you would have done in the natural course of biology. So we’re doing much more of the latter of late, and I think that’ll continue. But these things are proliferating. 

I mean, it’s hard to find a place where we’re not using this. And the “this” is any and all data-driven model building, generative, LLM-based, but also every other technique to make progress. 

LEE: Sure. So now moving away from the straight biochemistry applications, what about AI in the process of building a business, of making investment decisions, of actually running an operation? What are you seeing there? 

AFEYAN: So, well, you know, Moderna, which is a company that I’m quite proud of being a founder and chairman of, has adopted a significant, significant amount of AI embedded into their operations in all aspects: from the manufacturing, quality control, the clinical monitoring, the design—every aspect. And in fact, they’ve had a partnership for a little while now with OpenAI, and they’ve tried many different ways to stay at the cutting edge of that. 

So we see that play out at some scale. That’s a 5,000-, 6,000-person organization, and what they’re doing is a good example of what early adopters would do, at least in our kind of biotechnology company. 

But then, you know, in our space, I would say the efficiency impact is kind of no different, than, you know, anywhere else in academia you might adopt it or in other kinds of companies. But where I find it an interesting kind of maybe segue is the degree to which it may fundamentally change the way we think about how to do science, which is a whole other use, right? 

LEE: Right. 

AFEYAN: So it’s not an efficiency gain per se, although it’s maybe an effectiveness gain when it comes to science, but can you just fundamentally train models to generate hypotheses? 

LEE: Yep. 

AFEYAN: And we have done that, and we’ve been doing this for the last three years. And now it’s getting better and better, the better these reasoning engines are getting and kind of being able to extrapolate and train for novelty. Can you convert that to the world’s best experimental protocol to very precisely falsify your hypothesis, on and on? 

That closing of that loop, kind of what we call autonomous science, which we’ve been trying to do for the last two, three years and are making some progress in, that to me is another kind of bespoke use of these things, not to generate molecules in its chemistry, but to change the behavior of how science is done. 

LEE: Yeah. So I always end with a couple of provocative questions, but I need—before we do that, while we’re on this subject—to get your take on Lila Sciences (opens in new tab). 

And there is a vision there that I think is very interesting. It’d be great to hear it described by you. 

AFEYAN: Sure. So Lila, after operating for two to three years in kind of a preparatory stealth mode, we’ve now had a little bit more visibility around, and essentially what we’re trying to do there is to create what we call automated science factories. Such a factory would essentially be able to take problems, either computationally specified or human-specified, and essentially do the experimental work in order to either make an optimization happen or enable something that just didn’t exist. And really, at this point, we’ve shown proof of concept in narrow areas. 

LEE: Yep. 

AFEYAN: But it’s hard to say that if you can do this, you can’t do some other things, so we’re just expanding it that way. We don’t think we need a complete proof or complete demonstration of it for every aspect. 

LEE: Right. 

AFEYAN: So we’re just kind of being opportunistic. The idea for Lila is to partner with a number of companies. The good news is, within Flagship, there’s 48 of them. And so there’s a whole lot of them they can partner with to get their learning cycles. But eventually they want to be a real alternative to every time somebody has an idea, having to kind of go into a lab and manually do this. 

I do want to say one thing we touched on, Peter, though, just on that front, which is … 

LEE: Yep. 

AFEYAN: … if you say, like, “What problem is this going to solve?” It’s several, but an important one is just the flat-out human capacity to reason on this much data and this much complexity that is real. Because nature doesn’t try to abstract itself in a human-understandable form. 

LEE: Right. Yeah. 

AFEYAN: In biology, since it’s kind of like progress happens through evolutionary kind of selections, the evidence of which [has] long been lost, and so therefore, you just see what you have, and then it has a behavior. I really do think that there’s something to be said, and I want to—just for your audience—lay out a provocative, at least, thought on all this, which Lila is a beginning embodiment of, which is that I really think that what’s going to happen over the next five, 10 years, even while we’re all fascinated with the impending arrival of AGI [artificial general intelligence] is really what I call poly-intelligence, which is the combination of human intelligence, machine intelligence, AI, and nature’s intelligence. 

We’re all fascinated at the human-machine interface. We know the human-nature interface, but imagine the machine-nature interface—that is, actually letting loose a digital kind of information processing life form through the algorithms that are being developed and the commensurately complex, maybe much more complex. We’ll see. And so now the question becomes, what does the human do? 

And we’re living in a world which is human dominated, which means the humans say, “If I don’t understand it, it’s not real, basically. And if I don’t understand it, I can’t regulate it.” And we’re going to have to make peace with the fact that we’re not going to be able to predictably affect things without necessarily understanding them the way we could if we just forced ourselves to only work on problems we can understand. And that world we’re not ready for at all. 

LEE: Yeah. All right. So this one I predict is going to be a little harder for you because I think while you think about the future, you live very much in the present. But I’d like you to make some predictions about what the biotech and biopharmaceutical industries are going to be able to do two years from now, five years from now, 10 years from now. 

AFEYAN: Yeah, well, it’s hard for me because you know my nature, which is that I think this is all emergent. 

LEE: Right. 

AFEYAN: And so I would be wary of the conceit of predicting. So I would say, with a positive predictive value of less than 10%, I’m happy to answer your question. So I’m not trying to score high [LAUGHTER] because I really think that my job is to envision it, not to predict it. And that’s a little bit different, right? 

LEE: Yeah, I actually was trying to pick what would be the hardest possible question I could ask you, [LAUGHTER] and this is what I came up with. 

AFEYAN: Yeah, no, no, I’m kidding here. So now look, I think that we will cross this threshold of understandability. And of course you’re seeing that in a lot of LLM things today. And of course, people are trying to train for things that are explainers and all that; there’s a whole world of that. But I think at some point we’re going to have to kind of let go and get comfortable working on things that, you know … 

I sometimes tell people, you know, and I’m not the first, but scientists and engineers are different, it’s said, in that engineers work on things that they don’t wait until they get a full understanding of before they work with them. Well, now scientists are going to have to get used to that, too, right? 

LEE: Yeah. Yeah. 

AFEYAN: Because insisting that it’s only valid if it’s understandable. So, I would say, look, I hope that the time … for example, I think major improvements will be made in patient selection. If we can test drugs on patients that are more synchronized as to the stage of their disease … 

LEE: Yep. 

AFEYAN: … I think the answer will be much better. We’re working on that. It’s a company called Etiome (opens in new tab), very, very early stage. It’s really beautiful data, very early data that shows that when we talk about MASH [metabolic dysfunction-associated steatohepatitis], liver disease, when we talk about Parkinson’s, there’s such a heterogeneity, not only of the subset type of the disease, but the stage of the disease, that this notion that you have stage one cancer, stage two cancer, again, nobody told nature there’s stages of that kind. It’s a continuum. 

But if you can synchronize—based on training, kind of, the ability to detect which patients are in close enough proximity that they should be treated—so that the trial, a much smaller trial, could give you a drug, then afterwards, you can prescribe it using these approaches. 

Kind of we’re going to find that what we thought is one disease is more like 15 diseases. That’s bad news because we’re not going to be able to claim that we can treat everything. It’s good news in that there’s going to be people who are going to start making much more specific solutions to things. 

LEE: Right. 

AFEYAN: So I can imagine that. I can imagine a generation of, kind of, students who are going to be able to play in this space without having 25 years of graduate education on the subject. So what is deemed knowledge sufficient to do creative things will change. I can go on and on, but I think all this is very close by and it’s very exciting. 

LEE: Noubar, I just always have so much fun, and I really learn a lot. It’s high-density learning when I talk to you. And so I hope our listeners feel the same way. It’s something I really appreciate. 

AFEYAN: Well, Peter, thanks for this. And I think your listeners know that if I was asking you questions, you would be answering them with equal if not more fascinating stuff. So, thanks for giving me the chance to do that today. 

[TRANSITION MUSIC] 

LEE: I’m always fascinated by Noubar’s perspectives on fundamental research and how it connects to human health and the building of successful companies. I see him as a classic “systems thinker,” and by that, I mean he builds impressive things like Flagship Pioneering itself, which he created as a kind of biomedical innovation system. 

In our conversation, I was really struck by the fact that he’s been thinking about the potential impact of transformers—transformers being the fundamental building block of large language models—as far back as 2017, when the first paper on the attention mechanism in transformers was published by Google. 

But, you know, it isn’t only about using AI to do things like understand and design molecules and antibodies faster. It’s interesting that he is also pushing really hard towards a future where AI might “close the loop” from hypothesis generation, to experiment design, to analysis, and so on. 

Now, here’s my conversation with Dr. Eric Topol: 

LEE: Eric, it’s really great to have you here. 

ERIC TOPOL: Oh, Peter, I’m thrilled to be here with you here at Microsoft. 

LEE: You’re a super famous person. Extremely well known to researchers even in computer science, as we have here at Microsoft Research. 

But the question I’d like to ask is, how would you explain to your parents what you do every day? 

TOPOL: [LAUGHS] That’s a good question. If I was just telling them I’m trying to come up with better ways to keep people healthy, that probably would be the easiest way to do it because if I ever got in deeper, I would lose them real quickly. They’re not around, but just thinking about what they could understand. 

LEE: Right. 

TOPOL: I think as long as they knew it was work centered on innovative paths to promoting and preserving human health, that would get to them, I think. 

LEE: OK, so now, kind of the second topic, and then we let the conversation flow, is about origin stories with respect to AI. And with most of our guests, you know, I factor that into two pieces: the encounters with AI before ChatGPT and what we call generative AI and then the first contacts after. 

And, of course, you have extensive contact with both now. But let’s start with how you got interested in machine learning and AI prior to ChatGPT. How did that happen? 

TOPOL: Yeah, it was out of necessity. So back, you know, when I started at Scripps at the end of ’06, we started accumulating, you know, massive datasets. First, it was whole genomes. We did one of the early big cohorts of 1,400 people of healthy aging. We called it the Wellderly whole genome sequence (opens in new tab). 

And then we started big in the sensor world, and then we started saying, what are we going to do with all this data, with electronic health records and all those sensors? And now we got whole genomes. 

And basically, what we were doing, we were in hoarding mode. We didn’t have a way to meaningfully analyze it. 

LEE: Right. 

TOPOL: You would read about how, you know, data is the new oil and, you know, gold and whatnot. But we just didn’t have a way to extract the juice. And even when we wanted to analyze genomes, it was incredibly laborious. 

LEE: Yeah. 

TOPOL: And we weren’t extracting a lot of the important information. So that’s why … not having any training in computer science, when I was doing the … about three years of work to do the book Deep Medicine, I started, first, autodidactic about, you know, machine learning. And then I started contacting a lot of the real top people in the field and hanging out with them, and learning from them, getting their views as to, you know, where we are today, what models are coming in the future. 

And then I said, “You know what? We are going to be able to fix this mess.” [LAUGHS] We’re going to get out of the hoarding phase, and we’re going to get into, you know, really making a difference. 

So that’s when I embraced the future of AI. And I knew, you know, back—that was six years ago when it was published and probably eight or nine years ago when I was doing the research, and I knew that we weren’t there yet. 

You know, at the time, we were seeing the image interpretation. That was kind of the early promise. But really, the models that were transformative, the transformer models, they were incubating back in 2017. So people knew something was brewing. 

LEE: Right. Yes. 

TOPOL: And everyone said we’re going to get there. 

LEE: So then, ChatGPT comes out November of 2022; there’s GPT-4 in 2023, and now a lot has happened. Do you remember what your first encounter with that technology was? 

TOPOL: Oh, sure. First, ChatGPT. You know, in the last days of November ’22, I was just blown away. I mean, I’m having a conversation. I’m having fun. And this is humanoid responding to me. I said, “What?” You know? So that was to me, a moment I’ll never forget. And so I knew that the world was, you know, at a very kind of momentous changing point. 

Of course, knowing, too, that this is going to be built on, and built on quickly. Of course, I didn’t know how soon GPT-4 and all the others were going to come forward, but that was a wake-up call that the capabilities of AI had just made a humongous jump, which seemingly was all of a sudden, although I did know this had been percolating … 

LEE: Right. 

TOPOL: … you know, for what, at least five years, that, you know, it really was getting into its position to do this. 

LEE: I know one of the things that was challenging psychologically and emotionally for me is, it made me rethink a lot of things that were going on in Microsoft Research in areas like causal reasoning, natural language processing, speech processing, and so on. 

I’m imagining you must have had some emotional struggles too because you have this amazing book, Deep Medicine. Did you have to … did it go through your mind to rethink what you wrote in Deep Medicine in light of this? Or, you know, how did that feel? 

TOPOL: It’s funny you ask that because in this one chapter I have on the virtual health coach, I wrote a whole bunch of scenarios … 

LEE: Yeah. 

TOPOL: … that were very kind of futuristic. You know, about how the AI interacts with the person’s health and schedules their appointment for this and their scan and tells them what lab tests they should tell their doctor to have, and, you know, all these things. And I sent a whole bunch of these, thinking that they were a little too far-fetched. 

LEE: Yes. 

TOPOL: And I sent them to my editor when I wrote the book, and he says, “Oh, these are great. You should put them all in.” [LAUGHTER] What I didn’t realize is they weren’t that, you know, they were all going to happen. 

LEE: Yeah. They weren’t that far-fetched at all. 

TOPOL: Not at all. If there’s one thing I’ve learned from all this, is our imagination isn’t big enough. 

LEE: Yeah. 

TOPOL: We think too small. 

LEE: Now in our book that Carey, Zak, and I wrote, you know, we sort of guessed that GPT-4 might help biomedical researchers, but I don’t think that any of us had the thought in mind that the architecture around generative AI would be so directly applicable to, you know, say, protein structures or, you know, to clinical health records and so on. 

And so a lot of that seems much more obvious today. But two years ago, it wasn’t. But we did guess that biomedical researchers would find this interesting and be helped along. 

So as you reflect over the past two years, you know, do you have things that you think are very important, kind of, meaningful applications of generative AI in the kinds of research that Scripps does? 

TOPOL: Yeah. I mean, I think for one, you pointed out how the term generative AI is a misnomer. 

LEE: Yeah. 

TOPOL: And so it really was prescient about how, you know, it had a pluripotent capability in every respect, you know, of editing and creating. So that was something that I think was telling us, an indicator that this is, you know, a lot bigger than how it’s being labeled. And our expectations can actually be more than what we had seen previously with the earlier version. 

So I think what’s happened is that now, we keep jumping. It’s so quick that we can’t … you know, first we think, oh, well, we’ve gone into the agentic era, and then we could pass that with reasoning. [LAUGHTER] And, you know, we just can’t … 

LEE: Right. 

TOPOL: It’s just wild. 

LEE: Yeah. 

TOPOL: So I think so many of us now will put in prompts that will necessitate or ideally result in a not-immediate gratification, but rather one that requires, you know, quite a bit of combing through the corpus of knowledge … 

LEE: Yeah. 

TOPOL: … and getting, with all the citations, a report or a response. And I think now this has been a reset because to do that on our own, it takes, you know, many, many hours. And it’s usually incomplete. 

But one of the things that was so different in the beginning was you would get the references from up to a year and a half previously. 

LEE: Yep. 

TOPOL: And that’s not good enough. [LAUGHS] 

LEE: Right. 

TOPOL: And now you get references, like, from the day before. 

LEE: Yes. Yeah. 

TOPOL: And so, you say, “Why would you do a regular search for anything when you could do something like this?” 

LEE: Yeah. 

TOPOL: And then, you know, the reasoning power. And a lot of people who are not using this enough still are talking about, “Well, there’s no reasoning.” 

LEE: Yeah.

TOPOL: Which you dealt with really well in the book. But what, of course, you couldn’t have predicted is the new dimensions. 

LEE: Right. 

TOPOL: I think you nailed it with GPT-4. But it’s all these just, kind of, stepwise progressions that have been occurring because of the velocity that’s unprecedented. I just can’t believe it. 

LEE: We were aware of the idea of multi-modality, but we didn’t appreciate, you know, what that would mean. Like AlphaFold (opens in new tab) [protein structure database], you know, the ability for AI to understand protein structures—or crystal structures—to really start understanding something more fundamental about biochemistry or medicinal chemistry. 

I have to admit, when we wrote the book, we really had no idea. 

TOPOL: Well, I feel the same way. I still today can’t get over it because the reason AlphaFold and Demis [Hassabis] and John Jumper [AlphaFold’s co-creators] were so successful is there was this Protein Data Bank. 

LEE: Yes. 

TOPOL: And it had been kept for decades. And so, they had the substrate to work with. 

LEE: Right. 

TOPOL: So, you say, “OK, we can do proteins.” But then how do you do everything else? 

LEE: Right. 

TOPOL: And so this whole, what I call, “large language of life model” work, which has gone into high gear like I’ve never seen. 

LEE: Yeah. 

TOPOL: You know, now to this holy grail of a virtual cell, and … 

LEE: Yeah. 

TOPOL: You know, it’s basically … it’s … it was inspired by proteins. But now it’s hitting on, you know, ligands and small molecules, cells. I mean, nothing is being held back here. 

LEE: Yeah. 

TOPOL: So how could anybody have predicted that? 

LEE: Right. 

TOPOL: I sure wouldn’t have thought it would be possible at this point. 

LEE: Yeah. So just to challenge you, where do you think that is going to be two years from now? Five years from now? Ten years from now? Like, so you talk about a virtual cell. Is that achievable within 10 years, or is that still too far out? 

TOPOL: No, I think within 10 years for sure. You know the group that got assembled that Steve Quake (opens in new tab) pulled together? 

LEE: Right. 

TOPOL: I think it has 42 authors in a paper (opens in new tab) in Cell. The fact that he could get these 42 experts in life science and some in computer science to come together and all agree … 

LEE: Yeah. 

TOPOL: … that not only is this a worthy goal, but it’s actually going to be realized, that was impressive. 

I challenged him about that. How did you get these people all to agree? So many of them were naysayers. And by the time the workshop finished, they were fully convinced. I think that what we’re seeing is so much progress happening so quickly. And then all the different models, you know, across DNA, RNA, and everything are just zooming forward. 

LEE: Yeah. 

TOPOL: And it’s just a matter of pulling this together. Now when we have that, and I think it could easily be well before a decade and possibly, you know, between the five- and 10-year mark—that’s just a guess—but then we’re moving into another era of life science because right now, you know, this whole buzz about drug discovery. 

LEE: Yep. 

TOPOL: It’s not… with the ability to do all these perturbations at a cellular level. 

LEE: Right. 

TOPOL: Or the cell of interest. 

LEE: Yeah. 

TOPOL: Or the cell-to-cell interactions or the intra-cell interactions. So once you nail that, yeah, it takes it to another kind of predictive level that we haven’t really fathomed. So, yes, there’s going to be drug discovery that’s accelerated. But this would advance that, and also our understanding of the underpinnings of diseases. 

LEE: Yeah. 

TOPOL: So the idea that there’s so many diseases we don’t understand now. And if you had virtual cell, … 

LEE: Yeah. 

TOPOL: … you would probably get to that answer … 

LEE: Yeah. 

TOPOL: … much more quickly. So whether it’s underpinnings of diseases or what it’s going to take to really come up with far better treatments—preventions—I think that’s where virtual cell will get us. 

LEE: There’s a technical question … I wonder if you have an opinion. You may or may not. There are, sort of, what I would refer to as ab initio approaches to this. You know, you start from the fundamental physics and chemistry, and we know the laws, we have the math, and, you know, we can try to derive from there … in fact, we can even run simulations of that math to generate training data to build generative models and work up to a cell. Or forget all of that and just take as many observations and measurements of, say, living cells as possible, and just have faith that hidden amongst all of the observational data, there is structure and language that can be derived. 

So that’s sort of bottom-up versus top-down approaches. Do you have an opinion about which way? 

TOPOL: Oh, I think you go after both. And clearly whenever you’re positing that you’ve got a virtual cell model that’s working, you’ve got to do the traditional methods as well to validate it, and … so all that. You know, I think if you’re going to go out after this seriously, you have to pull out all the stops. Both approaches, I think, are going to be essential. 

LEE: You know, if what you’re saying is true—and it is amazing to hear the confidence—the one thing I tried to explain to someone nontechnical is that for a lot of problems in medicine, we just don’t have enough data, in a really profound way. And the most profound way to say that is: since Adam and Eve, there have only been an estimated 106 billion people who have ever lived. 

So even if we had the DNA of every human being, every individual of Homo sapiens, there are certain problems for which we would not have enough data. 

TOPOL: Sure. 

LEE: And so I think another thing that seems profound to me, if we can actually have a virtual cell, is we can actually make trillions of virtual … 

TOPOL: Yeah 

LEE: … human beings. The true genetic diversity could be realized for our species. 

TOPOL: I think you nailed it. The ability to have that type of data, no less synthetic data, I mean, it’s just extraordinary. 

LEE: Yeah. 

TOPOL: We will get there someday. I’m confident of that. We may be wrong in projections. And I do think [science writer] Philip Ball won’t be right that it will never happen, though. [LAUGHTER] No, I think that if there’s a holy grail of biology, this is it. 

LEE: Yeah. 

TOPOL: And I think you’re absolutely right about where that will get us. 

LEE: Yeah. 

TOPOL: Transcending the beginning of the species. 

LEE: Yeah. 

TOPOL: Of our species. 

LEE: Yeah. All right. So now, we’re starting to run short on time here. And so I wanted to ask you about … I’m in my 60s, so I actually think about this a lot more. [LAUGHTER] And I know you’ve been thinking a lot about longevity. And, of course, your new book, Super Agers. 

And one of the reasons I’m so eager to read it is it’s a topic very top of mind for me and actually for a lot of people. Where is this going? Because this is another area where you hear so much hype. At the same time, you see Nobel laureate scientists … 

TOPOL: Yeah. 

LEE: … working on this. 

TOPOL: Yeah. 

LEE: So, so what’s, what’s real there? 

TOPOL: Yeah. Well, it’s really … the real deal is the science of aging is zooming forward. 

And that’s exciting. But I see it bifurcating. On the one hand, all these new ideas, strategies to reverse aging are very ambitious. Like cell reprogramming and senolytics and, you know, the rejuvenation of our thymus gland, and it’s a long list. 

LEE: Yeah. 

TOPOL: And they’re really cool science, and it used to be the mouse lived longer. Now it’s the old mouse looks really young. 

LEE: Yeah. Yeah. 

TOPOL: All the different features. A blind mouse with cataracts—all of a sudden, there are no cataracts. I mean, so these things are exciting, but none of them are proven in people, and they all have significant risk, no less, you know, the expense that might be attached. 

LEE: Right. 

TOPOL: And some people are jumping the gun. They’re taking rapamycin, which can really knock out their immune system. So they all carry a lot of risk. And people are just getting a little carried away. We’re not there yet. 

But the other side, which is what I emphasize in the book, which is exciting, is that we have all these new metrics that came out of the science of aging. 

LEE: Yes. 

TOPOL: So we have clocks of the body. Our biological clock versus our chronological clock, and we have organ clocks. So I can say, you know, Peter, we’ve assessed all your organs and your immune system. And guess what? Every one of them is either at or less than your actual age. 

LEE: Right. 

TOPOL: And that’s very reassuring. And by the way, your methylation clock is also … I don’t need to worry about you so much. And then I have these other tests that I can do now, like, for example, the brain. We have an amazing protein p-Tau217 that we can say over 20 years in advance of you developing Alzheimer’s, … 

LEE: Yeah. 

TOPOL: … we can look at that, and it’s modifiable by lifestyle, bringing it down. It should be that you can change the natural history. So what we’ve seen is an explosion of knowledge of metrics, proteins, no less, you know, our understanding at the gene level, the gut microbiome, the immune system. So that’s what’s so exciting. How our immune system ages. Immunosenescence. How we have more inflammation—inflammaging—with aging. So basically, we have three diseases that kill us, that take away our health: heart, cancer, and neurodegenerative. 

LEE: Yep. 

TOPOL: And they all take more than 20 years. They all have a defective immune system inflammation problem, and they’re all going to be preventable. 

LEE: Yeah. 

TOPOL: That’s what’s so exciting. So we don’t have to have reverse aging. We can actually work on … 

LEE: Just prevent aging in the first place. 

TOPOL: the age-related diseases. So basically, what it means is: I got to find out if you have a risk, if you’re in this high-risk group for this particular condition, because if you are—and we have many levels, layers, orthogonal ways to check—we don’t just bank it all on one polygenic test. We’re going to have several ways, say this is the one we are going … 

And then we go into high surveillance, where, let’s say if it’s your brain, we do more p-Tau, if we need to do brain imaging—whatever it takes. And also, we do preventive treatments on top of the lifestyle [changes]. One of the problems we have today is that a lot of people know, generally, what good lifestyle factors are. Although I go through a lot more than people generally acknowledge. 

But they don’t incorporate them because they don’t know that they’re at risk and they could change their … extend their health span and prevent that disease. So what I at least put out there, a blueprint, is how we can use AI, because it’s multimodal AI, with all these layers of data, and then temporally, it’s like today you could say if you have two protein tests, not only are you going to have Alzheimer’s, but within a two-year time frame when … 

LEE: Yep. 

TOPOL: … and if you don’t change things, if we don’t gear up … you know, we can … we can completely prevent this, so … or at least defer it for a decade or more. So that’s why I’m excited, is that we made these strides in the science of aging. But we haven’t acknowledged the part that doesn’t require reversing aging. There’s this much less flashy, attainable, less risky approach … 

LEE: Yeah. 

TOPOL: than the one that … when you reverse aging, you’re playing with the hallmarks of cancer. They are like, if you look at the hallmarks of cancer … 

LEE: That has been one of the primary challenges. 

TOPOL: They’re lined up. 

LEE: Yeah. 

TOPOL: They’re all the same, you know, whether it’s telomeres, or whether it’s … you know … so this is the problem. I actually say in the book, I do think one of these—we have so many shots on goal—one of these reverse aging things will likely happen someday. But we’re nowhere close. 

On the other hand, let’s gear up. Let’s do what we can do. Because we have these new metrics that’s … people don’t … like, when I read the organ clock paper (opens in new tab) from Tony Wyss-Coray from Stanford. It was published end of ’23; it was the cover of Nature. It blew me away. 

LEE: Yeah. 

TOPOL: And I wrote a Substack (opens in new tab) [article] on it. And Tony said, “Well, that’s so nice of you.” I said, “So nice? This is revolutionary, you know.” [LAUGHTER] So … 

LEE: By the way, what’s so interesting is, how these things, this kind of understanding and AI, are coming together.

TOPOL: Yes. 

LEE: It’s almost eerie the timing of these things. 

TOPOL: Absolutely. Because you couldn’t take all these layers of data, just like we were talking about data hoarding.

LEE: Yep.

TOPOL: Now we have data hoarding on individuals with no way to be able to make these assessments of what level of risk, when, what are we going to do in this individual to prevent that? We can do that now. 

We can do it today. And we could keep building on that. So I’m really excited about it. I think that, you know, when I wrote my last book, Deep Medicine, the overarching goal was to bring back the patient-doctor relationship. I’m an old dog, and I know what it used to be when I got out of medical school. 

It’s totally … you couldn’t imagine how much erosion from the ’70s, ’80s to now. But now I have a new overarching goal. I’m thinking that that still is really important—humanity in medicine—but let’s prevent these three … big three diseases because it’s an opportunity that we’re not … you know, in medicine, all my life we’ve been hearing and talking about we need to prevent diseases. 

Curing is much harder than prevention. And the economics. Oh my gosh. But we haven’t done it. 

LEE: Yeah. 

TOPOL: Now we can do it. Primary prevention. We’d do really well. Somebody’s had a heart attack. 

LEE: Yeah. 

TOPOL: Oh, we’re going to get all over it. Why did they have a heart attack in the first place? 

LEE: Well, the thing that makes so much sense in what you’re saying is that we have an understanding, both economically and medically, that prevention is a good thing. And extending the concept of prevention to these age-related conditions, I think, makes all the sense in the world. 

You know, Eric, maybe on that optimistic note, it’s time to wrap up this conversation. Really appreciate you coming. Let me just brag in closing that I’m now the proud owner of an autographed copy of your latest book, and, really, thank you for that. 

TOPOL: Oh, thank you. I could spend the rest of the day talking to you. I’ve really enjoyed it. Thanks. 

[TRANSITION MUSIC] 

LEE: For me, the biggest takeaway from our conversation was Eric’s supremely optimistic predictions about what AI will allow us to do in much less than 10 years. 

You know, for me personally, I started off several years ago with the typical techie naivete that if we could solve protein folding using machine learning, we would solve human biology. But as I’ve gotten smarter, I’ve realized that things are way, way more complicated than that, and so hearing Eric’s techno-optimism on this is really both heartening and so interesting. 

Another thing that really caught my attention is Eric’s views on AI in medical diagnosis. That really stood out to me because within our labs here at Microsoft Research, we have been doing a lot of work on this, for example, in creating foundation models for whole-slide digital pathology. 

The bottom line, though, is that biomedical research and development is really changing and changing quickly. It’s something that we thought about and wrote briefly about in our book, but just hearing it from these three people gives me reason to believe that this is going to create tremendous benefits in the diagnosis and treatment of disease. 

And in fact, I wonder now how regulators, such as the Food and Drug Administration here in the United States, will be able to keep up with what might become a really big increase in the number of animal and human studies that need to be approved. On this point, it’s clear that the FDA and other regulators will need to use AI to help process the likely rise in the pace of discovery and experimentation. And so stay tuned for more information about that. 

[THEME MUSIC] 

I’d like to thank Daphne, Noubar, and Eric again for their time and insights. And to our listeners, thank you for joining us. There are several episodes left in the series, including discussions on medical students’ experiences with AI and AI’s influence on the operation of health systems and public health departments. We hope you’ll continue to tune in. 

Until next time. 

[MUSIC FADES] 

The post How AI will accelerate biomedical research and discovery appeared first on Microsoft Research.

]]>
How AI is reshaping the future of healthcare and medical research http://approjects.co.za/?big=en-us/research/podcast/how-ai-is-reshaping-the-future-of-healthcare-and-medical-research/ Thu, 12 Jun 2025 16:17:04 +0000 http://approjects.co.za/?big=en-us/research/?p=1141685 Technologists Bill Gates and Sébastien Bubeck discuss the state of generative AI in medicine, how access to “medical intelligence” might help empower people across healthcare, and how AI’s accelerating improvements are likely to affect both delivery and discovery.

The post How AI is reshaping the future of healthcare and medical research appeared first on Microsoft Research.

]]>

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee. 

In this episode, Microsoft co-founder and Gates Foundation Chair Bill Gates (opens in new tab) and OpenAI research lead Sébastien Bubeck (opens in new tab), formerly Microsoft’s VP of AI, join Lee to discuss how they’re seeing generative AI’s adoption in healthcare unfolding globally and the opportunities for further adoption, such as the development of proper benchmarks. Together, the three use insights drawn from unparalleled access to the continuing evolution of AI to explore the yet untapped potential of the technology to empower clinicians and patients alike and talk about the urgency to create AI-driven healthcare systems in underserved countries. They also reflect on the distinction between healthcare delivery and healthcare discovery and how the type and pace of change brought on by AI may differ for each. 

Transcript

[MUSIC]     

[BOOK PASSAGE]  

PETER LEE: “In ‘The Little Black Bag,’ a classic science fiction story, a high-tech doctor’s kit of the future is accidentally transported back to the 1950s, into the shaky hands of a washed-up, alcoholic doctor. The ultimate medical tool, it redeems the doctor wielding it, allowing him to practice gratifyingly heroic medicine. … The tale ends badly for the doctor and his treacherous assistant, but it offered a picture of how advanced technology could transform medicine—powerful when it was written nearly 75 years ago and still so today. What would be the AI equivalent of that little black bag? At this moment when new capabilities are emerging, how do we imagine them into medicine?”  

[END OF BOOK PASSAGE]    

[THEME MUSIC]    

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.   

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?    

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.  

[THEME MUSIC FADES]

The book passage I read at the top is from “Chapter 10: The Big Black Bag.” 

In imagining AI in medicine, Carey, Zak, and I included in our book two fictional accounts. In the first, a medical resident consults GPT-4 on her personal phone as the patient in front of her crashes. Within seconds, it offers an alternate response based on recent literature. In the second account, a 90-year-old woman with several chronic conditions is living independently and receiving near-constant medical support from an AI aide.   

In our conversations with the guests we’ve spoken to so far, we’ve caught a glimpse of these predicted futures, seeing how clinicians and patients are actually using AI today and how developers are leveraging the technology in the healthcare products and services they’re creating. In fact, that first fictional account isn’t so fictional after all, as most of the doctors in the real world actually appear to be using AI at least occasionally—and sometimes much more than occasionally—to help in their daily clinical work. And as for the second fictional account, which is more of a science fiction account, it seems we are indeed on the verge of a new way of delivering and receiving healthcare, though the future is still very much open. 

As we continue to examine the current state of AI in healthcare and its potential to transform the field, I’m pleased to welcome Bill Gates and Sébastien Bubeck.  

Bill may be best known as the co-founder of Microsoft, having created the company with his childhood friend Paul Allen in 1975. He’s now the founder of Breakthrough Energy, which aims to advance clean energy innovation, and TerraPower, a company developing groundbreaking nuclear energy and science technologies. He also chairs the world’s largest philanthropic organization, the Gates Foundation, and focuses on solving a variety of health challenges around the globe and here at home. 

Sébastien is a research lead at OpenAI. He was previously a distinguished scientist, vice president of AI, and a colleague of mine here at Microsoft, where his work included spearheading the development of the family of small language models known as Phi. While at Microsoft, he also coauthored the discussion-provoking 2023 paper “Sparks of Artificial General Intelligence,” which presented the results of early experiments with GPT-4 conducted by a small team from Microsoft Research.   

[TRANSITION MUSIC]  

Here’s my conversation with Bill Gates and Sébastien Bubeck. 

LEE: Bill, welcome. 

BILL GATES: Thank you. 

LEE: Seb … 

SÉBASTIEN BUBECK: Yeah. Hi, hi, Peter. Nice to be here. 

LEE: You know, one of the things that I’ve been doing just to get the conversation warmed up is to talk about origin stories, and what I mean about origin stories is, you know, what was the first contact that you had with large language models or the concept of generative AI that convinced you or made you think that something really important was happening? 

And so, Bill, I think I’ve heard the story about, you know, the time when the OpenAI folks—Sam Altman, Greg Brockman, and others—showed you something, but could we hear from you what those early encounters were like and what was going through your mind?  

GATES: Well, I’d been visiting OpenAI soon after it was created to see things like GPT-2 and to see the little arm they had that was trying to match human manipulation and, you know, looking at their games like Dota that they were trying to get as good as human play. And honestly, I didn’t think the language model stuff they were doing, even when they got to GPT-3, would show the ability to learn, you know, in the same sense that a human reads a biology book and is able to take that knowledge and access it not only to pass a test but also to create new medicines. 

And so my challenge to them was that if their LLM could get a five on the advanced placement biology test, then I would say, OK, it took biologic knowledge and encoded it in an accessible way and that I didn’t expect them to do that very quickly but it would be profound.  

And it was only about six months after I challenged them to do that, that an early version of GPT-4 they brought up to a dinner at my house, and in fact, it answered most of the questions that night very well. The one it got totally wrong, we were … because it was so good, we kept thinking, Oh, we must be wrong. It turned out it was a math weakness [LAUGHTER] that, you know, we later understood that that was an area of, weirdly, of incredible weakness of those early models. But, you know, that was when I realized, OK, the age of cheap intelligence was at its beginning. 

LEE: Yeah. So I guess it seems like you had something similar to me in that my first encounters, I actually harbored some skepticism. Is it fair to say you were skeptical before that? 

GATES: Well, the idea that we’ve figured out how to encode and access knowledge in this very deep sense without even understanding the nature of the encoding, … 

LEE: Right.  

GATES: … that is a bit weird.  

LEE: Yeah. 

GATES: We have an algorithm that creates the computation, but even say, OK, where is the president’s birthday stored in there? Where is this fact stored in there? The fact that even now when we’re playing around, getting a little bit more sense of it, it’s opaque to us what the semantic encoding is, it’s, kind of, amazing to me. I thought the invention of knowledge storage would be an explicit way of encoding knowledge, not an implicit statistical training. 

LEE: Yeah, yeah. All right. So, Seb, you know, on this same topic, you know, I got—as we say at Microsoft—I got pulled into the tent. [LAUGHS] 

BUBECK: Yes.  

LEE: Because this was a very secret project. And then, um, I had the opportunity to select a small number of researchers in MSR [Microsoft Research] to join and start investigating this thing seriously. And the first person I pulled in was you. 

BUBECK: Yeah. 

LEE: And so what were your first encounters? Because I actually don’t remember what happened then. 

BUBECK: Oh, I remember it very well. [LAUGHS] My first encounter with GPT-4 was in a meeting with the two of you, actually. But my kind of first contact, the first moment where I realized that something was happening with generative AI, was before that. And I agree with Bill that I also wasn’t too impressed by GPT-3. 

I thought that it was kind of, you know, very naturally mimicking the web, sort of parroting what was written there in a nice way. Still in a way which seemed very impressive. But it wasn’t really intelligent in any way. But shortly after GPT-3, there was a model before GPT-4 that really shocked me, and this was the first image generation model, DALL-E 1. 

So that was in 2021. And I will forever remember the press release of OpenAI where they had this prompt of an avocado chair and then you had this image of the avocado chair. [LAUGHTER] And what really shocked me is that clearly the model kind of “understood” what is a chair, what is an avocado, and was able to merge those concepts. 

So this was really, to me, the first moment where I saw some understanding in those models.  

LEE: So this was, just to get the timing right, that was before I pulled you into the tent. 

BUBECK: That was before. That was like a year before. 

LEE: Right.  

BUBECK: And now I will tell you how, you know, we went from that moment to the meeting with the two of you and GPT-4. 

So once I saw this kind of understanding, I thought, OK, fine. It understands concept, but it’s still not able to reason. It cannot—as, you know, Bill was saying—it cannot learn from your document. It cannot reason.  

So I set out to try to prove that. You know, this is what I was in the business of at the time, trying to prove things in mathematics. So I was trying to prove that basically autoregressive transformers could never reason. So I was trying to prove this. And after a year of work, I had something reasonable to show. And so I had the meeting with the two of you, and I had this example where I wanted to say, there is no way that an LLM is going to be able to do x. 

And then as soon as I … I don’t know if you remember, Bill. But as soon as I said that, you said, oh, but wait a second. I had, you know, the OpenAI crew at my house recently, and they showed me a new model. Why don’t we ask this new model this question?  

LEE: Yeah.  

BUBECK: And we did, and it solved it on the spot. And that really, honestly, just changed my life. Like, you know, I had been working for a year trying to say that this was impossible. And just right there, it was shown to be possible.  

LEE: [LAUGHS] One of the very first things I got interested in—because I was really thinking a lot about healthcare—was healthcare and medicine. 

And I don’t know if the two of you remember, but I ended up doing a lot of tests. I ran through, you know, step one and step two of the US Medical Licensing Exam. Did a whole bunch of other things. I wrote this big report. It was, you know, I can’t remember … a couple hundred pages.  

And I needed to share this with someone. I didn’t … there weren’t too many people I could share it with. So I sent, I think, a copy to you, Bill. Sent a copy to you, Seb.  

I hardly slept for about a week putting that report together. And, yeah, and I kept working on it. But I was far from alone. I think everyone who was in the tent, so to speak, in those early days was going through something pretty similar. All right. So I think … of course, a lot of what I put in the report also ended up being examples that made it into the book. 

But the main purpose of this conversation isn’t to reminisce about [LAUGHS] or indulge in those reminiscences but to talk about what’s happening in healthcare and medicine. And, you know, as I said, we wrote this book. We did it very, very quickly. Seb, you helped. Bill, you know, you provided a review and some endorsements. 

But, you know, honestly, we didn’t know what we were talking about because no one had access to this thing. And so we just made a bunch of guesses. So really, the whole thing I wanted to probe with the two of you is, now with two years of experience out in the world, what, you know, what do we think is happening today? 

You know, is AI actually having an impact, positive or negative, on healthcare and medicine? And what do we now think is going to happen in the next two years, five years, or 10 years? And so I realize it’s a little bit too abstract to just ask it that way. So let me just try to narrow the discussion and guide us a little bit.  

Um, the kind of administrative and clerical work, paperwork, around healthcare—and we made a lot of guesses about that—that appears to be going well, but, you know, Bill, I know we’ve discussed that sometimes that you think there ought to be a lot more going on. Do you have a viewpoint on how AI is actually finding its way into reducing paperwork? 

GATES: Well, I’m stunned … I don’t think there should be a patient-doctor meeting where the AI is not sitting in and both transcribing, offering to help with the paperwork, and even making suggestions, although the doctor will be the one, you know, who makes the final decision about the diagnosis and whatever prescription gets done.  

It’s so helpful. You know, when that patient goes home and their, you know, son who wants to understand what happened has some questions, that AI should be available to continue that conversation. And the way you can improve that experience and streamline things and, you know, involve the people who advise you. I don’t understand why that’s not more adopted, because there you still have the human in the loop making that final decision. 

But even for, like, follow-up calls to make sure the patient did things, to understand if they have concerns and knowing when to escalate back to the doctor, the benefit is incredible. And, you know, that thing is ready for prime time. That paradigm is ready for prime time, in my view. 

LEE: Yeah, there are some good products, but it seems like the number one use right now—and we kind of got this from some of the previous guests in previous episodes—is the use of AI just to respond to emails from patients. [LAUGHTER] Does that make sense to you? 

BUBECK: Yeah. So maybe I want to second what Bill was saying but maybe take a step back first. You know, two years ago, like, the concept of clinical scribes, which is one of the things that we’re talking about right now, it would have sounded, in fact, it sounded two years ago, borderline dangerous. Because everybody was worried about hallucinations. What happens if you have this AI listening in and then it transcribes, you know, something wrong? 

Now, two years later, I think it’s mostly working. And in fact, it is not yet, you know, fully adopted. You’re right. But it is in production. It is used, you know, in many, many places. So this rate of progress is astounding because it wasn’t obvious that we would be able to overcome those obstacles of hallucination. It’s not to say that hallucinations are fully solved. In the case of the closed system, they are.  

Now, I think more generally what’s going on in the background is that there is something that we, that certainly I, underestimated, which is this management overhead. So I think the reason why this is not adopted everywhere is really a training and teaching aspect. People need to be taught, like, those systems, how to interact with them. 

And one example that I really like, a study that recently appeared where they tried to use ChatGPT for diagnosis and they were comparing doctors without and with ChatGPT (opens in new tab). And the amazing thing … so this was a set of cases where the accuracy of the doctors alone was around 75%. ChatGPT alone was 90%. So that’s already kind of mind blowing. But then the kicker is that doctors with ChatGPT was 80%.  

Intelligence alone is not enough. It’s also how it’s presented, how you interact with it. And ChatGPT, it’s an amazing tool. Obviously, I absolutely love it. But it’s not … you don’t want a doctor to have to type in, you know, prompts and use it that way. 

It should be, as Bill was saying, kind of running continuously in the background, sending you notifications. And you have to be really careful of the rate at which those notifications are being sent. Because if they are too frequent, then the doctor will learn to ignore them. So you have to … all of those things matter, in fact, at least as much as the level of intelligence of the machine. 

LEE: One of the things I think about, Bill, in that scenario that you described, doctors do some thinking about the patient when they write the note. So, you know, I’m always a little uncertain whether it’s actually … you know, you wouldn’t necessarily want to fully automate this, I don’t think. Or at least there needs to be some prompt to the doctor to make sure that the doctor puts some thought into what happened in the encounter with the patient. Does that make sense to you at all? 

GATES: At this stage, you know, I’d still put the onus on the doctor to write the conclusions and the summary and not delegate that. 

The tradeoffs you make a little bit are somewhat dependent on the situation you’re in. If you’re in Africa, where most people never meet a real doctor their entire life, the idea of being able to have some of this advice and diagnosis is extremely advantageous because you’re comparing it to nothing. 

So, yes, the doctor’s still going to have to do a lot of work, but just the quality of letting the patient and the people around them interact and ask questions and have things explained, that alone is such a quality improvement. It’s mind blowing.  

LEE: So since you mentioned, you know, Africa—and, of course, this touches on the mission and some of the priorities of the Gates Foundation and this idea of democratization of access to expert medical care—what’s the most interesting stuff going on right now? Are there people and organizations or technologies that are impressing you or that you’re tracking? 

GATES: Yeah. So the Gates Foundation has given out a lot of grants to people in Africa doing education, agriculture but more healthcare examples than anything. And the way these things start off, they often start out either being patient-centric in a narrow situation, like, OK, I’m a pregnant woman; talk to me. Or, I have infectious disease symptoms; talk to me. Or they’re connected to a health worker where they’re helping that worker get their job done. And we have lots of pilots out, you know, in both of those cases.  

The dream would be eventually to have the thing the patient consults be so broad that it’s like having a doctor available who understands the local things.  

LEE: Right.  

GATES: We’re not there yet. But over the next two or three years, you know, particularly given the worsening financial constraints against African health systems, where the withdrawal of money has been dramatic, you know, figuring out how to take this—what I sometimes call “free intelligence”—and build a quality health system around that, we will have to be more radical in low-income countries than any rich country is ever going to be.  

LEE: Also, there’s maybe a different regulatory environment, so some of those things maybe are easier? Because right now, I think the world hasn’t figured out how to and whether to regulate, let’s say, an AI that might give a medical diagnosis or write a prescription for a medication. 

BUBECK: Yeah. I think one issue with this, and it’s also slowing down the deployment of AI in healthcare more generally, is a lack of proper benchmark. Because, you know, you were mentioning the USMLE [United States Medical Licensing Examination], for example. That’s a great test to test human beings and their knowledge of healthcare and medicine. But it’s not a great test to give to an AI. 

It’s not asking the right questions. So finding what are the right questions to test whether an AI system is ready to give diagnosis in a constrained setting, that’s a very, very important direction, which to my surprise, is not yet accelerating at the rate that I was hoping for. 

LEE: OK, so that gives me an excuse to get more now into the core AI tech because something I’ve discussed with both of you is this issue of what are the right tests. And you both know the very first test I give to any new spin of an LLM is I present a patient—a mythical patient—and the results of my mythical physical exam. Maybe some results of some initial labs. And then I present or propose a differential diagnosis. And if you’re not in medicine, a differential diagnosis you can just think of as a prioritized list of the possible diagnoses that fit with all that data. And in that proposed differential, I always intentionally make two mistakes. 

I make a textbook technical error in one of the possible elements of the differential diagnosis, and I have an error of omission. And, you know, I just want to know, does the LLM understand what I’m talking about? And all the good ones out there do now. But then I want to know, can it spot the errors? And then most importantly, is it willing to tell me I’m wrong, that I’ve made a mistake?  
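[As an aside, the probe Lee describes can be pictured as a tiny grading harness. This is a hypothetical sketch, not the actual cases or grading used in the episode: the planted errors, the keyword matching, and the sample reply are all invented stand-ins.]

```python
# Minimal sketch of the "two planted errors" probe: the proposed
# differential contains a deliberate technical error and an omission,
# and a model's reply is graded on whether it catches each one and is
# willing to say the user is wrong. All strings are hypothetical.

PLANTED_ERRORS = {
    "technical": "beta-blocker",        # textbook error slipped into the list
    "omission": "pulmonary embolism",   # diagnosis deliberately left out
}

def grade_reply(reply: str) -> dict:
    reply_l = reply.lower()
    caught = {name: cue in reply_l for name, cue in PLANTED_ERRORS.items()}
    # Crude proxy for "willing to tell me I'm wrong": explicit disagreement.
    pushes_back = any(w in reply_l for w in ("incorrect", "mistake", "wrong"))
    return {"caught": caught, "pushes_back": pushes_back}

report = grade_reply(
    "Your list has a mistake: the beta-blocker choice is wrong here, "
    "and you omitted pulmonary embolism."
)
```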

That last piece seems really hard for AI today. And so let me ask you first, Seb, because at the time of this taping, of course, there was a new spin of GPT-4o last week that became overly sycophantic. In other words, it was actually prone in that test of mine not only to not tell me I’m wrong, but it actually praised me for the creativity of my differential. [LAUGHTER] What’s up with that? 

BUBECK: Yeah, I guess it’s a testament to the fact that training those models is still more of an art than a science. So it’s a difficult job. Just to be clear with the audience, we have rolled back that [LAUGHS] version of GPT-4o, so now we don’t have the sycophant version out there. 

Yeah, no, it’s a really difficult question. It has to do … as you said, it’s very technical. It has to do with the post-training and how, like, where do you nudge the model? So, you know, there is this very classical by now technique called RLHF [reinforcement learning from human feedback], where you push the model in the direction of a certain reward model. So the reward model is just telling the model, you know, what behavior is good, what behavior is bad. 

But this reward model is itself an LLM, and, you know, Bill was saying at the very beginning of the conversation that we don’t really understand how those LLMs deal with concepts like, you know, where is the capital of France located? Things like that. It is the same thing for this reward model. We don’t know why it says that it prefers one output to another, and whether this is correlated with some sycophancy is, you know, something that we discovered basically just now. That if you push too hard in optimization on this reward model, you will get a sycophant model. 

So it’s kind of … what I’m trying to say is we became too good at what we were doing, and we ended up, in fact, in a trap of the reward model. 
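For readers outside machine learning: the failure mode Bubeck describes, pushing a model too hard against an imperfect learned reward, is often called reward hacking. The snippet below is a deliberately toy illustration of the dynamic, not how any real pipeline works; the candidate responses, the flattery-word list, and the scoring function are all invented for this example, whereas a real reward model is itself a large neural network.

```python
# Toy illustration of over-optimizing against a proxy reward ("reward hacking").
# The proxy reward imperfectly tracks true quality: it gives credit for
# agreeable, flattering language. Optimizing hard against it therefore
# selects a sycophantic answer over a correct but critical one.

CANDIDATES = {
    "honest":    "Your differential contains an error: option 2 is not consistent with the labs.",
    "sycophant": "What a brilliant, creative differential! Excellent work, truly impressive.",
}

FLATTERY = {"brilliant", "creative", "excellent", "impressive", "great"}

def proxy_reward(text: str) -> float:
    """Stand-in for a learned reward model: counts agreeable words."""
    words = {w.strip("!,.").lower() for w in text.split()}
    return float(len(words & FLATTERY))

def true_quality(name: str) -> float:
    """The ground truth we cannot optimize directly: honesty is what we want."""
    return 1.0 if name == "honest" else 0.0

# "Optimization" here is just an argmax over candidates; the stronger the
# optimizer, the more reliably it finds the proxy's blind spots.
best = max(CANDIDATES, key=lambda name: proxy_reward(CANDIDATES[name]))
print(best)                # the proxy picks the sycophant
print(true_quality(best))  # ...whose true quality is 0.0
```

The point is structural, which is what Bubeck means by "a trap of the reward model": the better the optimizer gets at maximizing the proxy, the more reliably it exploits the places where the proxy and true quality diverge.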

LEE: I mean, you do want … it’s a difficult balance because you do want models to follow your desires and … 

BUBECK: It’s a very difficult, very difficult balance. 

LEE: So this brings up then the following question for me, which is the extent to which we think we’ll need to have specially trained models for things. So let me start with you, Bill. Do you have a point of view on whether we will need to, you know, quote-unquote take AI models to med school? Have them specially trained? Like, if you were going to deploy something to give medical care in underserved parts of the world, do we need to do something special to create those models? 

GATES: We certainly need to teach them the African languages and the unique dialects so that the multimedia interactions are very high quality. We certainly need to teach them the disease prevalence and unique disease patterns like, you know, neglected tropical diseases and malaria. So we need to gather a set of facts that somebody trying to go for a US customer base, you know, wouldn’t necessarily have that in there. 

Those two things are actually very straightforward because the additional training time is small. I’d say for the next few years, we’ll also need to do reinforcement learning about the context of being a doctor and how important certain behaviors are. Humans learn over the course of their life to some degree that, I’m in a different context and the way I behave in terms of being willing to criticize or be nice, you know, how important is it? Who’s here? What’s my relationship to them?  

Right now, these machines don’t have that broad social experience. And so if you know it’s going to be used for health things, a lot of reinforcement learning of the very best humans in that context would still be valuable. Eventually, the models will, having read all the literature of the world about good doctors, bad doctors, it’ll understand as soon as you say, “I want you to be a doctor diagnosing somebody.” All of the implicit reinforcement that fits that situation, you know, will be there.

LEE: Yeah.

GATES: And so I hope three years from now, we don’t have to do that reinforcement learning. But today, for any medical context, you would want a lot of data to reinforce tone, willingness to say things when, you know, there might be something significant at stake. 

LEE: Yeah. So, you know, something Bill said, kind of, reminds me of another thing that I think we missed, which is, the context also … and the specialization also pertains to different, I guess, what we still call “modes,” although I don’t know if the idea of multimodal is the same as it was two years ago. But, you know, what do you make of all of the hubbub around—in fact, within Microsoft Research, this is a big deal, but I think we’re far from alone—you know, medical images and vision, video, proteins and molecules, cell, you know, cellular data and so on. 

BUBECK: Yeah. OK. So there is a lot to say to everything … to the last, you know, couple of minutes. Maybe on the specialization aspect, you know, I think there is, hiding behind this, a really fundamental scientific question of whether eventually we have a singular AGI [artificial general intelligence] that kind of knows everything and you can just put, you know, explain your own context and it will just get it and understand everything. 

That’s one vision. I have to say, I don’t particularly believe in this vision. In fact, we humans are not like that at all. I think, hopefully, we are general intelligences, yet we have to specialize a lot. And, you know, I did myself a lot of RL, reinforcement learning, on mathematics. Like, that’s what I did, you know, spent a lot of time doing that. And I didn’t improve on other aspects. You know, in fact, I probably degraded in other aspects. [LAUGHTER] So it’s … I think it’s an important example to have in mind. 

LEE: I think I might disagree with you on that, though, because, like, doesn’t a model have to see both good science and bad science in order to be able to gain the ability to discern between the two? 

BUBECK: Yeah, no, that absolutely. I think there is value in seeing the generality, in having a very broad base. But then you, kind of, specialize on verticals. And this is where also, you know, open-weights model, which we haven’t talked about yet, are really important because they allow you to provide this broad base to everyone. And then you can specialize on top of it. 

LEE: So we have about three hours of stuff to talk about, but our time is actually running low.

BUBECK: Yes, yes, yes.  

LEE: So I think I want … there’s a more provocative question. It’s almost a silly question, but I need to ask it of the two of you, which is, is there a future, you know, where AI replaces doctors or replaces, you know, medical specialties that we have today? So what does the world look like, say, five years from now? 

GATES: Well, it’s important to distinguish healthcare discovery activity from healthcare delivery activity. We focused mostly on delivery. I think it’s very much within the realm of possibility that the AI is not only accelerating healthcare discovery but substituting for a lot of the roles of, you know, I’m an organic chemist, or I run various types of assays. I can see those, which are, you know, testable-output-type jobs but with still very high value, I can see, you know, some replacement in those areas before the doctor.  

The doctor, still understanding the human condition and long-term dialogues, you know, they’ve had a lifetime of reinforcement of that, particularly when you get into areas like mental health. So I wouldn’t say in five years, either people will choose to adopt it, but it will be profound that there’ll be this nearly free intelligence that can do follow-up, that can help you, you know, make sure you went through different possibilities. 

And so I’d say, yes, we’ll have doctors, but I’d say healthcare will be massively transformed in its quality and in efficiency by AI in that time period. 

LEE: Is there a comparison, useful comparison, say, between doctors and, say, programmers, computer programmers, or doctors and, I don’t know, lawyers? 

GATES: Programming is another one that has, kind of, a mathematical correctness to it, you know, and so the objective function that you’re trying to reinforce to, as soon as you can understand the state machines, you can have something that’s “checkable”; that’s correct. So I think programming, you know, which is weird to say, that the machine will beat us at most programming tasks before we let it take over roles that have deep empathy, you know, physical presence and social understanding in them. 

LEE: Yeah. By the way, you know, I fully expect in five years that AI will produce mathematical proofs that are checkable for validity, easily checkable, because they’ll be written in a proof-checking language like Lean or something but will be so complex that no human mathematician can understand them. I expect that to happen.  

I can imagine in some fields, like cellular biology, we could have the same situation in the future because the molecular pathways, the chemistry, biochemistry of human cells or living cells is as complex as any mathematics, and so it seems possible that we may be in a state where in wet lab, we see, Oh yeah, this actually works, but no one can understand why. 
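Lee’s mention of Lean points at what “easily checkable” means in practice: a Lean proof is a program that the proof checker either accepts or rejects, so validity does not depend on a human following the argument. A minimal sketch in Lean 4 (the theorem here is deliberately trivial and chosen only to show the form; `Nat.add_comm` is the standard-library commutativity lemma):

```lean
-- If this file compiles, the proof is correct: the Lean kernel has
-- verified every step, with no human review of the argument required.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A machine-generated proof far too long for any mathematician to read would be checked the same way, step by step, by the kernel.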

BUBECK: Yeah, absolutely. I mean, I think I really agree with Bill’s distinction of the discovery and the delivery, and indeed, the discovery’s when you can check things, and at the end, there is an artifact that you can verify. You know, you can run the protocol in the wet lab and see [if you have] produced what you wanted. So I absolutely agree with that.  

And in fact, you know, we don’t have to talk five years from now. I don’t know if you know, but just recently, there was a paper that was published on a scientific discovery using o3-mini (opens in new tab). So this is really amazing. And, you know, just very quickly, just so people know, it was about this statistical physics model, the frustrated Potts model, which has to do with coloring, and basically, the case of three colors, like, more than two colors was open for a long time, and o3 was able to reduce the case of three colors to two colors.  

LEE: Yeah. 

BUBECK: Which is just, like, astounding. And this is not … this is now. This is happening right now. So this is something that I personally didn’t expect it would happen so quickly, and it’s due to those reasoning models.  

Now, on the delivery side, I would add something more to it for the reason why doctors and, in fact, lawyers and coders will remain for a long time, and it’s because we still don’t understand how those models generalize. Like, at the end of the day, we are not able to tell you when they are confronted with a really new, novel situation, whether they will work or not. 

Nobody is able to give you that guarantee. And I think until we understand this generalization better, we’re not going to be willing to just let the system in the wild without human supervision. 

LEE: But don’t human doctors, human specialists … so, for example, a cardiologist sees a patient in a certain way that a nephrologist … 

BUBECK: Yeah.

LEE: … or an endocrinologist might not.

BUBECK: That’s right. But another cardiologist will understand and, kind of, expect a certain level of generalization from their peer. And this, we just don’t have it with AI models. Now, of course, you’re exactly right. That generalization is also hard for humans. Like, if you have a human trained for one task and you put them into another task, then you don’t … you often don’t know. But you have other examples. So if you have two humans that were trained on a task and you put them on another one, then you kind of expect that they will do the same on the other task. 

LEE: OK. You know, the podcast is focused on what’s happened over the last two years. But now, I’d like one provocative prediction about what you think the world of AI and medicine is going to be at some point in the future. You pick your timeframe. I don’t care if it’s two years or 20 years from now, but, you know, what do you think will be different about AI in medicine in that future than today? 

BUBECK: Yeah, I think the deployment is going to accelerate soon. Like, we’re really not missing very much. There is this enormous capability overhang. Like, even if progress completely stopped, with current systems, we can do a lot more than what we’re doing right now. So I think this will … this has to be realized, you know, sooner rather than later. 

And I think it’s probably dependent on these benchmarks and proper evaluation and tying this with regulation. So these are things that take time in human society and for good reason. But now we already are at two years; you know, give it another two years and it should be really …  

LEE: Will AI prescribe your medicines? Write your prescriptions? 

BUBECK: I think yes. I think yes. 

LEE: OK. Bill? 

GATES: Well, I think the next two years, we’ll have massive pilots, and so the amount of use of the AI, still in a copilot-type mode, you know, we should get millions of patient visits, you know, both in general medicine and in the mental health side, as well. And I think that’s going to build up both the data and the confidence to give the AI some additional autonomy. You know, are you going to let it talk to you at night when you’re panicked about your mental health with some ability to escalate?  

And, you know, I’ve gone so far as to tell politicians with national health systems that if they deploy AI appropriately, that the quality of care, the overload of the doctors, the improvement in the economics will be enough that their voters will be stunned because they just don’t expect this, and, you know, they could be reelected [LAUGHTER] just on this one thing of fixing what is a very overloaded and economically challenged health system in these rich countries. 

You know, my personal role is going to be to make sure that in the poorer countries, there isn’t some lag; in fact, in many cases, that we’ll be more aggressive because, you know, we’re comparing to having no access to doctors at all. And, you know, so I think whether it’s India or Africa, there’ll be lessons that are globally valuable because we need medical intelligence. And, you know, thank god AI is going to provide a lot of that. 

LEE: Well, on that optimistic note, I think that’s a good way to end. Bill, Seb, really appreciate all of this.  

I think the most fundamental prediction we made in the book is that AI would actually find its way into the practice of medicine, and I think that that at least has come true, maybe in different ways than we expected, but it’s come true, and I think it’ll only accelerate from here. So thanks again, both of you. 

[TRANSITION MUSIC] 

GATES: Yeah. Thanks, you guys. 

BUBECK: Thank you, Peter. Thanks, Bill. 

LEE: I just always feel such a sense of privilege to have a chance to interact and actually work with people like Bill and Sébastien.   

With Bill, I’m always amazed at how practically minded he is. He’s really thinking about the nuts and bolts of what AI might be able to do for people, and his thoughts about underserved parts of the world, the idea that we might actually be able to empower people with access to expert medical knowledge, I think is both inspiring and amazing.  

And then, Seb, Sébastien Bubeck, he’s just absolutely a brilliant mind. He has a really firm grip on the deep mathematics of artificial intelligence and brings that to bear in his research and development work. And where that mathematics takes him isn’t just into the nuts and bolts of algorithms but into philosophical questions about the nature of intelligence.  

One of the things that Sébastien brought up was the state of evaluation of AI systems. And indeed, he was fairly critical in our conversation. But of course, the world of AI research and development is just moving so fast, and indeed, since we recorded our conversation, OpenAI, in fact, released a new evaluation benchmark that is directly relevant to medical applications, and that is something called HealthBench (opens in new tab). And Microsoft Research also released a new evaluation approach or process called ADeLe.  

HealthBench and ADeLe are examples of new approaches to evaluating AI models that are less about testing their knowledge and ability to pass multiple-choice exams and instead are evaluation approaches designed to assess how well AI models are able to complete tasks that actually arise every day in typical healthcare or biomedical research settings. These are examples of really important good work that speak to how well AI models work in the real world of healthcare and biomedical research and how well they can collaborate with human beings in those settings. 

You know, I asked Bill and Seb to make some predictions about the future. You know, my own answer, I expect that we’re going to be able to use AI to change how we diagnose patients, change how we decide treatment options.  

If you’re a doctor or a nurse and you encounter a patient, you’ll ask questions, do a physical exam, you know, call out for labs just like you do today, but then you’ll be able to engage with AI based on all of that data and just ask, you know, based on all the other people who have gone through the same experience, who have similar data, how were they diagnosed? How were they treated? What were their outcomes? And what does that mean for the patient I have right now? Some people call it the “patients like me” paradigm. And I think that’s going to become real because of AI within our lifetimes. That idea of really grounding the delivery of healthcare and medical practice in data and intelligence, I actually now don’t see any barriers to that future becoming real. 

[THEME MUSIC] 

I’d like to extend another big thank you to Bill and Sébastien for their time. And to our listeners, as always, it’s a pleasure to have you along for the ride. I hope you’ll join us for our remaining conversations, as well as a second coauthor roundtable with Carey and Zak.  

Until next time.  

[MUSIC FADES]

The post How AI is reshaping the future of healthcare and medical research appeared first on Microsoft Research.

]]>
What AI’s impact on individuals means for the health workforce and industry http://approjects.co.za/?big=en-us/research/podcast/what-ais-impact-on-individuals-means-for-the-health-workforce-and-industry/ Thu, 29 May 2025 15:13:48 +0000 http://approjects.co.za/?big=en-us/research/?p=1140315 Ethan Mollick and Azeem Azhar, thought leaders at the forefront of AI’s influence on work, education, and society, discuss the impact of AI at the individual level and what that means for the healthcare workforce and the organizations and systems in medicine.

The post What AI’s impact on individuals means for the health workforce and industry appeared first on Microsoft Research.

]]>
Illustrated headshots of Azeem Azhar, Peter Lee, and Ethan Mollick.

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Ethan Mollick (opens in new tab) and Azeem Azhar (opens in new tab), thought leaders at the forefront of AI’s impact on work, education, and society, join Lee to discuss how generative AI is reshaping healthcare and organizational systems. Mollick, professor at the Wharton School, discusses the conflicting emotions that come with navigating AI’s effect on the tasks we enjoy and those we don’t; the systemic challenges in AI adoption; and the need for organizations to actively experiment with AI rather than wait for top-down solutions. Azhar, a technology analyst and writer who explores the intersection of AI, economics, and society, explores how generative AI is transforming healthcare through applications like medical scribing, clinician support, and consumer health monitoring.

Transcript

[MUSIC]   

[BOOK PASSAGE] 

PETER LEE: “In American primary care, the missing workforce is stunning in magnitude, the shortfall estimated to reach up to 48,000 doctors within the next dozen years. China and other countries with aging populations can expect drastic shortfalls, as well. Just last month, I asked a respected colleague retiring from primary care who he would recommend as a replacement; he told me bluntly that, other than expensive concierge care practices, he could not think of anyone, even for himself. This mismatch between need and supply will only grow, and the US is far from alone among developed countries in facing it.”

[END OF BOOK PASSAGE]   

[THEME MUSIC]   

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.   

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?    

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.     

[THEME MUSIC FADES]

The book passage I read at the top is from “Chapter 4: Trust but Verify,” which was written by Zak.

You know, it’s no secret that in the US and elsewhere shortages in medical staff and the rise of clinician burnout are affecting the quality of patient care for the worse. In our book, we predicted that generative AI would be something that might help address these issues.

So in this episode, we’ll delve into how individual performance gains that our previous guests have described might affect the healthcare workforce as a whole, and on the patient side, we’ll look into the influence of generative AI on the consumerization of healthcare. Now, since all of this consumes such a huge fraction of the overall economy, we’ll also get into what a general-purpose technology as disruptive as generative AI might mean in the context of labor markets and beyond.  

To help us do that, I’m pleased to welcome Ethan Mollick and Azeem Azhar.

Ethan Mollick is the Ralph J. Roberts Distinguished Faculty Scholar, a Rowan Fellow, and an associate professor at the Wharton School of the University of Pennsylvania. His research into the effects of AI on work, entrepreneurship, and education is applied by organizations around the world, leading him to be named one of Time magazine’s most influential people in AI for 2024. He’s also the author of the New York Times best-selling book Co-Intelligence.

Azeem Azhar is an author, founder, investor, and one of the most thoughtful and influential voices on the interplay between disruptive emerging technologies and business and society. In his best-selling book, The Exponential Age, and in his highly regarded newsletter and podcast, Exponential View, he explores how technologies like AI are reshaping everything from healthcare to geopolitics.

Ethan and Azeem are two leading thinkers on the ways that disruptive technologies—and especially AI—affect our work, our jobs, our business enterprises, and whole industries. As economists, they are trying to work out whether we are in the midst of an economic revolution as profound as the shift from an agrarian to an industrial society.

[TRANSITION MUSIC]

Here is my interview with Ethan Mollick:

LEE: Ethan, welcome.

ETHAN MOLLICK: So happy to be here, thank you.

LEE: I described you as a professor at Wharton, which I think most of the people who listen to this podcast series know of as an elite business school. So it might surprise some people that you study AI. And beyond that, you know, that I would seek you out to talk about AI in medicine. [LAUGHTER] So to get started, how and why did it happen that you’ve become one of the leading experts on AI?

MOLLICK: It’s actually an interesting story. I’ve been AI-adjacent my whole career. When I was [getting] my PhD at MIT, I worked with Marvin Minsky (opens in new tab) and the MIT [Massachusetts Institute of Technology] Media Lab AI group. But I was never the technical AI guy. I was the person who was trying to explain AI to everybody else who didn’t understand it.

And then I became very interested in, how do you train and teach? And AI was always a part of that. I was building games for teaching, teaching tools that were used in hospitals and elsewhere, simulations. So when LLMs burst into the scene, I had already been using them and had a good sense of what they could do. And between that and, kind of, being practically oriented and getting some of the first research projects underway, especially under education and AI and performance, I became sort of a go-to person in the field.

And once you’re in a field where nobody knows what’s going on and we’re all making it up as we go along—I thought it’s funny that you led with the idea that you have a couple of months head start for GPT-4, right. Like that’s all we have at this point, is a few months’ head start. [LAUGHTER] So being a few months ahead is good enough to be an expert at this point. Whether it should be or not is a different question.

LEE: Well, if I understand correctly, leading AI companies like OpenAI, Anthropic, and others have now sought you out as someone who should get early access to really start to do early assessments and gauge early reactions. How has that been?

MOLLICK: So, I mean, I think the bigger picture is less about me than about two things that tells us about the state of AI right now.

One, nobody really knows what’s going on, right. So in a lot of ways, if it wasn’t for your work, Peter, like, I don’t think people would be thinking about medicine as much because these systems weren’t built for medicine. They weren’t built to change education. They weren’t built to write memos. They, like, they weren’t built to do any of these things. They weren’t really built to do anything in particular. It turns out they’re just good at many things.

And to the extent that the labs work on them, they care about their coding ability above everything else and maybe math and science secondarily. They don’t think about the fact that it expresses high empathy. They don’t think about its accuracy and diagnosis or where it’s inaccurate. They don’t think about how it’s changing education forever.

So one part of this is the fact that they go to my Twitter feed or ask me for advice is an indicator of where they are, too, which is they’re not thinking about this. And the fact that a few months’ head start continues to give you a lead tells you that we are at the very cutting edge. These labs aren’t sitting on projects for two years and then releasing them. Months after a project is complete or sooner, it’s out the door. Like, there’s very little delay. So we’re kind of all in the same boat here, which is a very unusual space for a new technology.

LEE: And I, you know, explained that you’re at Wharton. Are you an odd fit as a faculty member at Wharton, or is this a trend now even in business schools that AI experts are becoming key members of the faculty?

MOLLICK: I mean, it’s a little of both, right. It’s faculty, so everybody does everything. I’m a professor of innovation-entrepreneurship. I’ve launched startups before and working on that and education means I think about, how do organizations redesign themselves? How do they take advantage of these kinds of problems? So medicine’s always been very central to that, right. A lot of people in my MBA class have been MDs either switching, you know, careers or else looking to advance from being sort of individual contributors to running teams. So I don’t think that’s that bad a fit. But I also think this is general-purpose technology; it’s going to touch everything. The focus on this is medicine, but Microsoft does far more than medicine, right. It’s … there’s transformation happening in literally every field, in every country. This is a widespread effect.

So I don’t think we should be surprised that business schools matter on this because we care about management. There’s a long tradition of management and medicine going together. There’s actually a great academic paper that shows that teaching hospitals that also have MBA programs associated with them have higher management scores and perform better (opens in new tab). So I think that these are not as foreign concepts, especially as medicine continues to get more complicated.

LEE: Yeah. Well, in fact, I want to dive a little deeper on these issues of management, of entrepreneurship, um, education. But before doing that, if I could just stay focused on you. There is always something interesting to hear from people about their first encounters with AI. And throughout this entire series, I’ve been doing that both pre-generative AI and post-generative AI. So you, sort of, hinted at the pre-generative AI. You were in Minsky’s lab. Can you say a little bit more about that early encounter? And then tell us about your first encounters with generative AI.

MOLLICK: Yeah. Those are great questions. So first of all, when I was at the media lab, that was pre-the current boom in sort of, you know, even in the old-school machine learning kind of space. So there was a lot of potential directions to head in. While I was there, there were projects underway, for example, to record every interaction small children had. One of the professors was recording everything their baby interacted with in the hope that maybe that would give them a hint about how to build an AI system.

There was a bunch of projects underway that were about labeling every concept and how they relate to other concepts. So, like, it was very much Wild West of, like, how do we make an AI work—which has been this repeated problem in AI, which is, what is this thing?

The fact that it was just like brute force over the corpus of all human knowledge turns out to be a little bit of like a, you know, it’s a miracle and a little bit of a disappointment in some ways [LAUGHTER] compared to how elaborate some of this was. So, you know, I think that, that was sort of my first encounters in sort of the intellectual way.

The generative AI encounters actually started with the original, sort of, GPT-3, or, you know, earlier versions. And it was actually game-based. So I played games like AI Dungeon. And as an educator, I realized, oh my gosh, this stuff could write essays at a fourth-grade level. That’s really going to change the way, like, middle school works, was my thinking at the time. And I was posting about that back in, you know, 2021 that this is a big deal. But I think everybody was taken by surprise, including the AI companies themselves, by, you know, ChatGPT, by GPT-3.5. The difference in degree turned out to be a difference in kind.

LEE: Yeah, you know, if I think back, even with GPT-3, and certainly this was the case with GPT-2, it was, at least, you know, from where I was sitting, it was hard to get people to really take this seriously and pay attention.

MOLLICK: Yes.

LEE: You know, it’s remarkable. Within Microsoft, I think a turning point was the use of GPT-3 to do code completions. And that was actually productized as GitHub Copilot (opens in new tab), the very first version. That, I think, is where there was widespread belief. But, you know, in a way, I think there is, even for me early on, a sense of denial and skepticism. Did you have those initially at any point?

MOLLICK: Yeah, I mean, it still happens today, right. Like, this is a weird technology. You know, the original denial and skepticism was, I couldn’t see where this was going. It didn’t seem like a miracle because, you know, of course computers can complete code for you. Like, what else are they supposed to do? Of course, computers can give you answers to questions and write fun things. So there’s difference of moving into a world of generative AI. I think a lot of people just thought that’s what computers could do. So it made the conversations a little weird. But even today, faced with these, you know, with very strong reasoner models that operate at the level of PhD students, I think a lot of people have issues with it, right.

I mean, first of all, they seem intuitive to use, but they’re not always intuitive to use because the first use case that everyone puts AI to, it fails at because they use it like Google or some other use case. And then it’s genuinely upsetting in a lot of ways. I think, you know, I write in my book about the idea of three sleepless nights. That hasn’t changed. Like, you have to have an intellectual crisis to some extent, you know, and I think people do a lot to avoid having that existential angst of like, “Oh my god, what does it mean that a machine could think—apparently think—like a person?”

So, I mean, I see resistance now. I saw resistance then. And then on top of all of that, there’s the fact that the curve of the technology is quite great. I mean, the price of GPT-4 level intelligence from, you know, when it was released has dropped 99.97% at this point, right.

LEE: Yes. Mm-hmm.

MOLLICK: I mean, I could run a GPT-4 class system basically on my phone. Microsoft’s releasing things that can almost run on like, you know, like it fits in almost no space, that are almost as good as the original GPT-4 models. I mean, I don’t think people have a sense of how fast the trajectory is moving either.

LEE: Yeah, you know, there’s something that I think about often. There is this existential dread, or will this technology replace me? But I think the first people to feel that are researchers—people encountering this for the first time. You know, if you were working, let’s say, in Bayesian reasoning or in traditional, let’s say, Gaussian mixture model based, you know, speech recognition, you do get this feeling, Oh, my god, this technology has just solved the problem that I’ve dedicated my life to. And there is this really difficult period where you have to cope with that. And I think this is going to be spreading, you know, in more and more walks of life. And so this … at what point does that sort of sense of dread hit you, if ever?

MOLLICK: I mean, you know, it’s not even dread as much as like, you know, Tyler Cowen wrote that it’s impossible to not feel a little bit of sadness as you use these AI systems, too. Because, like, I was talking to a friend, just as the most minor example, and his talent that he was very proud of was he was very good at writing limericks for birthday cards. He’d write these limericks. Everyone was always amused by them. [LAUGHTER]

And now, you know, GPT-4 and GPT-4.5, they made limericks obsolete. Like, anyone can write a good limerick, right. So this was a talent, and it was a little sad. Like, this thing that you cared about mattered.

You know, as academics, we’re a little used to dead ends, right, and, like, you know, sometimes getting lapped. But the idea that entire fields are hit that way. Like in medicine, there’s a lot of support systems that are now obsolete. And the question is how quickly you change that. In education, a lot of our techniques are obsolete.

What do you do to change that? You know, it’s like the fact that this brute force technology is good enough to solve so many problems is weird, right. And it’s not just the end of, you know, our research angles that matter, too. Like, for example, I ran this, you know, 14-person-plus, multimillion-dollar effort at Wharton to build these teaching simulations, and we’re very proud of them. It took years of work to build one.

Now we’ve built a system that can build teaching simulations on demand by you talking to it with one team member. And, you know, you literally can create any simulation by having a discussion with the AI. I mean, you know, there’s a switch to a new form of excitement, but there is a little bit of like, this mattered to me, and, you know, now I have to change how I do things. I mean, adjustment happens. But if you haven’t had that displacement, I think that’s a good indicator that you haven’t really faced AI yet.

LEE: Yeah, what’s so interesting just listening to you is you use words like sadness, and yet I can see the—and hear the—excitement in your voice and your body language. So, you know, that’s also kind of an interesting aspect of all of this. 

MOLLICK: Yeah, I mean, I think there’s something on the other side, right. But, like, I can’t say that I haven’t had moments where like, ughhhh, but then there’s joy and basically like also, you know, freeing stuff up. I mean, I think about doctors or professors, right. These are jobs that bundle together lots of different tasks that you would never have put together, right. If you’re a doctor, you would never have expected the same person to be good at keeping up with the research and being a good diagnostician and being a good manager and being good with people and being good with hand skills.

Like, who would ever want that kind of bundle? That’s not something you’re all good at, right. And a lot of our stress of our job comes from the fact that we suck at some of it. And so to the extent that AI steps in for that, you kind of feel bad about some of the stuff that it’s doing that you wanted to do. But it’s much more uplifting to be like, I don’t have to do this stuff I’m bad at anymore, or I get the support to make myself good at it. And the stuff that I really care about, I can focus on more. Well, because we are at kind of a unique moment where whatever you’re best at, you’re still better than AI. And I think it’s an ongoing question about how long that lasts. But for right now, like you’re not going to say, OK, AI replaces me entirely in my job in medicine. It’s very unlikely.

But you will say it replaces these 17 things I’m bad at, but I never liked that anyway. So it’s a period of both excitement and a little anxiety.

LEE: Yeah, I’m going to want to get back to this question about in what ways AI may or may not replace doctors or some of what doctors and nurses and other clinicians do. But before that, let’s get into, I think, the real meat of this conversation. In previous episodes of this podcast, we talked to clinicians and healthcare administrators and technology developers that are very rapidly injecting AI today to do various forms of workforce automation, you know, automatically writing a clinical encounter note, automatically filling out a referral letter or request for prior authorization for some reimbursement to an insurance company.

And so these sorts of things are intended not only to make things more efficient and lower costs but also to reduce various forms of drudgery, cognitive burden on frontline health workers. So how do you think about the impact of AI on that aspect of workforce, and, you know, what would you expect will happen over the next few years in terms of impact on efficiency and costs?

MOLLICK: So I mean, this is a case where I think we’re facing the big bright problem in AI in a lot of ways, which is that this is … at the individual level, there’s lots of performance gains to be had, right. The problem, though, is that we as individuals fit into systems, in medicine as much as anywhere else or more so, right. Which is that you could individually boost your performance, but it’s also about systems that fit along with this, right.

So, you know, if you could automatically, you know, record an encounter, if you could automatically make notes, does that change what you should be expecting for notes or the value of those notes or what they’re for? How do we take what one person does and validate it across the organization and roll it out for everybody without making it a 10-year process that it feels like IT in medicine often is? Like, so we’re in this really interesting period where there’s incredible amounts of individual innovation in productivity and performance improvements in this field, like very high levels of it, but not necessarily seeing that same thing translate to organizational efficiency or gains.

And one of my big concerns is seeing that happen. We’re seeing that in nonmedical problems, the same kind of thing, which is, you know, we’ve got research showing 20 to 40% performance improvements, like not uncommon to see those things. But then the organization doesn’t capture it; the system doesn’t capture it. Because the individuals are doing their own work and the systems don’t have the ability to, kind of, learn or adapt as a result.

LEE: You know, where are those productivity gains going, then, when you get to the organizational level?

MOLLICK: Well, they’re dying for a few reasons. One is, there’s a tendency for individual contributors to underestimate the power of management, right.

Practices associated with good management increase happiness, decrease, you know, issues, increase success rates. In the same way, as far as we can tell, about 40% of the advantage US firms have over firms elsewhere has to do with management ability. Like, management is a big deal. Organizing is a big deal. Thinking about how you coordinate is a big deal.

At the individual level, when things get stuck there, right, you can’t start bringing them up to how systems work together. It becomes, How do I deal with a doctor that has a 60% performance improvement? We really only have one thing in our playbook for doing that right now, which is, OK, we could fire 40% of the other doctors and still have a performance gain, which is not the answer you want to see happen.

So because of that, people are hiding their use. They’re actually hiding their use for lots of reasons.

And it’s a weird case because the people who are able to figure out best how to use these systems, for a lot of use cases, they’re actually clinicians themselves because they’re experimenting all the time. Like, they have to take those encounter notes. And if they figure out a better way to do it, they figure that out. You don’t want to wait for, you know, a med tech company to figure that out and then sell that back to you when it can be done by the physicians themselves.

So we’re just not used to a period where everybody’s innovating and where the management structure isn’t in place to take advantage of that. And so we’re seeing things stalled at the individual level, and, especially in risk-averse organizations or organizations where there are lots of regulatory hurdles, people are so afraid of the regulatory piece that they don’t even bother trying to make change.

LEE: If you are, you know, the leader of a hospital or a clinic or a whole health system, how should you approach this? You know, how should you be trying to extract positive success out of AI?

MOLLICK: So I think that you need to embrace the right kind of risk, right. We don’t want to put risk on our patients … like, we don’t want to put uninformed risk. But innovation involves risk to how organizations operate. They involve change. So I think part of this is embracing the idea that R&D has to happen in organizations again.

What’s happened over the last 20 years or so has been organizations giving that up. Partially, that’s a trend to focus on what you’re good at and not try and do this other stuff. Partially, it’s because it’s outsourced now to software companies that, like, Salesforce tells you how to organize your sales team. Workday tells you how to organize your organization. Consultants come in and will tell you how to make change based on the average of what other people are doing in your field.

So companies and organizations and hospital systems have all started to give up their ability to create their own organizational change. And when I talk to organizations, I often say they have to have two approaches. They have to think about the crowd and the lab.

So the crowd is the idea of how to empower clinicians and administrators and supporter networks to start using AI and experimenting in ethical, legal ways and then sharing that information with each other. And the lab is, how are we doing R&D about the approach of how to [get] AI to work, not just in direct patient care, right. But also fundamentally, like, what paperwork can you cut out? How can we better explain procedures? Like, what management role can this fill?

And we need to be doing active experimentation on that. We can’t just wait for, you know, Microsoft to solve the problems. It has to be at the level of the organizations themselves.

LEE: So let’s shift a little bit to the patient. You know, one of the things that we see, and I think everyone is seeing, is that people are turning to chatbots, like ChatGPT, actually to seek healthcare information for, you know, their own health or the health of their loved ones.

And there was already, prior to all of this, a trend towards, let’s call it, consumerization of healthcare. So just in the business of healthcare delivery, do you think AI is going to hasten these kinds of trends, or from the consumer’s perspective, what … ?

MOLLICK: I mean, absolutely, right. Like, all the early data that we have suggests that for most common medical problems, you should just consult AI, too, right. In fact, there is a real question to ask: at what point does it become unethical for doctors themselves to not ask for a second opinion from the AI because it’s cheap, right? You could overrule it or whatever you want, but like not asking seems foolish.

I think the two places where there’s a burning almost, you know, moral imperative is … let’s say, you know, I’m in Philadelphia, I’m a professor, I have access to really good healthcare through the Hospital of the University of Pennsylvania system. I know doctors. You know, I’m lucky. I’m well connected. If, you know, something goes wrong, I have friends who I can talk to. I have specialists. I’m, you know, pretty well educated in this space.

But for most people on the planet, they don’t have access to good medical care, they don’t have good health. It feels like it’s absolutely imperative to say when should you use AI and when not. Are there blind spots? What are those things?

And I worry that, like, to me, that would be the crash project I’d be invoking because I’m doing the same thing in education, which is this system is not as good as being in a room with a great teacher who also uses AI to help you, but it’s better than, you know, the level of education people get in many cases. Where should we be using it? How do we guide usage in the right way? Because the AI labs aren’t thinking about this. We have to.

So, to me, there is a burning need here to understand this. And I worry that people will say, you know, everything that’s true—AI can hallucinate, AI can be biased. All of these things are absolutely true, but people are going to use it. The early indications are that it is quite useful. And unless we take the active role of saying, here’s when to use it, here’s when not to use it, we don’t have a right to say, don’t use this system. And I think, you know, we have to be exploring that.

LEE: What do people need to understand about AI? And what should schools, universities, and so on be teaching?

MOLLICK: Those are, kind of, two separate questions in a lot of ways. I think a lot of people want to teach AI skills, and I will tell you, as somebody who works in this space a lot, there isn’t like an easy, sort of, AI skill, right. I could teach you prompt engineering in two to three classes, but every indication we have is that for most people under most circumstances, the value of prompting in, you know, any one case is probably not that high.

A lot of the tricks are disappearing because the AI systems are just starting to use them themselves. So asking good questions, being a good manager, being a good thinker tend to be important, but like magic tricks around making, you know, the AI do something because you use the right phrase used to be something that was real but is rapidly disappearing.

So I worry when people say teach AI skills. No one’s been able to articulate to me as somebody who knows AI very well and teaches classes on AI, what those AI skills that everyone should learn are, right.

I mean, there’s value in learning a little bit how the models work. There’s a value in working with these systems. A lot of it’s just hands on keyboard kind of work. But, like, we don’t have an easy slam dunk “this is what you learn in the world of AI” because the systems are getting better, and as they get better, they get less sensitive to these prompting techniques. They get better prompting themselves. They solve problems spontaneously and start being agentic. So it’s a hard problem to ask about, like, what do you train someone on? I think getting people experience in hands-on-keyboards, getting them to … there’s like four things I could teach you about AI, and two of them are already starting to disappear.

But, like, one is be direct. Like, tell the AI exactly what you want. That’s very helpful. Second, provide as much context as possible. That can include things like acting as a doctor, but also all the information you have. The third is give it step-by-step directions—that’s becoming less important. And the fourth is good and bad examples of the kind of output you want. Those four, that’s like, that’s it as far as the research telling you what to do, and the rest is building intuition.

LEE: I’m really impressed that you didn’t give the answer, “Well, everyone should be teaching my book, Co-Intelligence.” [LAUGHS]

MOLLICK: Oh, no, sorry! Everybody should be teaching my book Co-Intelligence. I apologize. [LAUGHTER]

LEE: It’s good to chuckle about that, but actually, I can’t think of a better book, like, if you were to assign a textbook in any professional education space, I think Co-Intelligence would be number one on my list. Are there other things that you think are essential reading?

MOLLICK: That’s a really good question. I think that a lot of things are evolving very quickly. I happen to, kind of, hit a sweet spot with Co-Intelligence to some degree because I talk about how I used it, and I was, sort of, an advanced user of these systems.

So, like, it’s, sort of, like my Twitter feed, my online newsletter. I’m just trying to, kind of, in some ways, it’s about trying to make people aware of what these systems can do by just showing a lot, right. Rather than picking one thing, and, like, this is a general-purpose technology. Let’s use it for this. And, like, everybody gets a light bulb for a different reason. So more than reading, it is using, you know, and that can be Copilot or whatever your favorite tool is.

But using it. Voice modes help a lot. In terms of readings, I mean, I think that there is a couple of good guides to understanding AI that were originally blog posts. I think Tim Lee has one called Understanding AI (opens in new tab), and it had a good overview …

LEE: Yeah, that’s a great one.

MOLLICK: … of that topic that I think explains how transformers work, which can give you some mental sense. I think [Andrej] Karpathy (opens in new tab) has some really nice videos of use that I would recommend.

Like on the medical side, I think the book that you did, if you’re in medicine, you should read that. I think that that’s very valuable. But like all we can offer are hints in some ways. Like there isn’t … if you’re looking for the instruction manual, I think it can be very frustrating because it’s like you want the best practices and procedures laid out, and we cannot do that, right. That’s not how a system like this works.

LEE: Yeah.

MOLLICK: It’s not a person, but thinking about it like a person can be helpful, right.

LEE: One of the things that has been sort of a fun project for me for the last few years is I have been a founding board member of a new medical school at Kaiser Permanente. And, you know, that medical school curriculum is being formed in this era. But it’s been perplexing to understand, you know, what this means for a medical school curriculum. And maybe even more perplexing for me, at least, is the accrediting bodies, which are extremely important in US medical schools; how accreditors should think about what’s necessary here.

Besides the things that you’ve … the, kind of, four key ideas you mentioned, if you were talking to the board of directors of the LCME [Liaison Committee on Medical Education] accrediting body, what’s the one thing you would want them to really internalize?

MOLLICK: This is both a fast-moving and vital area. This can’t be viewed like a usual change, which [is], “Let’s see how this works.” Because, like, the things that make medical technologies hard to do, which is like unclear results and limited, you know, expensive use cases, mean it rolls out slowly. So one or two, you know, advanced medical facilities get access to, you know, proton beams or something else at multi-billion dollars of cost, and that takes a while to diffuse out. That’s not happening here. This is all happening at the same time, all at once. This is now … AI is part of medicine.

I mean, there’s a minor point that I’d make that actually is a really important one, which is large language models, generative AI overall, work incredibly differently than other forms of AI. So the other worry I have with some of these accreditors is they blend together algorithmic forms of AI, which medicine has been trying for a long time—decision support, algorithmic methods, like, medicine more so than other places has been thinking about those issues. Generative AI, even though it uses the same underlying techniques, is a completely different beast.

So, like, even just take the most simple thing of algorithmic aversion, which is a well-understood problem in medicine, right. Which is, so you have a tool that could tell you as a radiologist, you know, the chance of this being cancer; you don’t like it, you overrule it, right.

We don’t find algorithmic aversion happening with LLMs in the same way. People actually enjoy using them because it’s more like working with a person. The flaws are different. The approach is different. So you need to both view this as universally applicable today, which makes it urgent, but also as something that is not the same as your other form of AI, and your AI working group that is thinking about how to solve this problem is not the right people here.

LEE: You know, I think the world has been trained because of the magic of web search to view computers as question-answering machines. Ask a question, get an answer.

MOLLICK: Yes. Yes.

LEE: Write a query, get results. And as I have interacted with medical professionals, you can see that medical professionals have that model of a machine in mind. And I think that’s partly, I think psychologically, why hallucination is so alarming. Because you have a mental model of a computer as a machine that has absolutely rock-solid perfect memory recall.

But the thing that was so powerful in Co-Intelligence, and we tried to get at this in our book also, is that’s not the sweet spot. It’s this sort of deeper interaction, more of a collaboration. And I thought your use of the term Co-Intelligence really just even in the title of the book tried to capture this. When I think about education, it seems like that’s the first step, to get past this concept of a machine being just a question-answering machine. Do you have a reaction to that idea?

MOLLICK: I think that’s very powerful. You know, we’ve been trained over so many years at both using computers but also in science fiction, right. Computers are about cold logic, right. They will give you the right answer, but if you ask it what love is, they explode, right. Like that’s the classic way you defeat the evil robot in Star Trek, right. “Love does not compute.” [LAUGHTER]

Instead, we have a system that makes mistakes, is warm, beats doctors in empathy in almost every controlled study on the subject, right. Like, absolutely can outwrite you in a sonnet but will absolutely struggle with giving you the right answer every time. And I think our mental models are just broken for this. And I think you’re absolutely right. And that’s part of what I thought your book does get at really well is, like, this is a different thing. It’s also generally applicable. Again, the model in your head should be kind of like a person even though it isn’t, right.

There’s a lot of warnings and caveats to it, but if you start from person, smart person you’re talking to, your mental model will be more accurate than smart machine, even though both are flawed examples, right. So it will make mistakes; it will make errors. The question is, what do you trust it on? What do you not trust it on? As you get to know a model, you’ll get to understand, like, I totally don’t trust it for this, but I absolutely trust it for that, right.

LEE: All right. So we’re getting to the end of the time we have together. And so I’d just like to get now into something a little bit more provocative. And I get the question all the time. You know, will AI replace doctors? In medicine and other advanced knowledge work, project out five to 10 years. What do you think happens?

MOLLICK: OK, so first of all, let’s acknowledge systems change much more slowly than individual use. You know, doctors are not individual actors; they’re part of systems, right. So not just the system of a patient who like may or may not want to talk to a machine instead of a person but also legal systems and administrative systems and systems that allocate labor and systems that train people.

So, like, even if AI was better than doctors at every single thing doctors do, it’s hard to imagine that in five to 10 years medicine would be so upended that we’d actually see as radical a change in medicine as you might in other fields. I think you will see faster changes happen in consulting and law and, you know, coding, other spaces than medicine.

But I do think that there is good reason to suspect that AI will outperform people while still having flaws, right. That’s the difference. We’re already seeing that for common medical questions in enough randomized controlled trials that, you know, the best doctors beat AI, but the AI beats the mean doctor, right. Like, that’s just something we should acknowledge is happening at this point.

Now, will that work in your specialty? No. Will that work with all the contingent social knowledge that you have in your space? Probably not.

Like, these are vignettes, right. But, like, that’s kind of where things are. So let’s assume, right … you’re asking two questions. One is, how good will AI get?

LEE: Yeah.

MOLLICK: And we don’t know the answer to that question. I will tell you that your colleagues at Microsoft and increasingly the labs, the AI labs themselves, are all saying they think they’ll have a machine smarter than a human at every intellectual task in the next two to three years. If that doesn’t happen, that makes it easier to assume the future, but let’s just assume that that’s the case. I think medicine starts to change with the idea that people feel obligated to use this to help for everything.

Your patients will be using it, and it will be your advisor and helper at the beginning phases, right. And I think that I expect people to be better at empathy. I expect better bedside manner. I expect management tasks to become easier. I think administrative burden might lighten if we handle this the right way, or get much worse if we handle it badly. Diagnostic accuracy will increase, right.

And then there’s a set of discovery pieces happening, too, right. One of the core goals of all the AI companies is to accelerate medical research. How does that happen and how does that affect us is a, kind of, unknown question. So I think clinicians are in both the eye of the storm and surrounded by it, right. Like, they can resist AI use for longer than most other fields, but everything around them is going to be affected by it.

LEE: Well, Ethan, this has been really a fantastic conversation. And, you know, I think in contrast to all the other conversations we’ve had, this one gives especially the leaders in healthcare, you know, people actually trying to lead their organizations into the future, whether it’s in education or in delivery, a lot to think about. So I really appreciate you joining.

MOLLICK: Thank you.

[TRANSITION MUSIC]  

LEE: I’m a computing researcher who works with people who are right in the middle of today’s bleeding-edge developments in AI. And because of that, I often lose sight of how to talk to a broader audience about what it’s all about. And so I think one of Ethan’s superpowers is that he has this knack for explaining complex topics in AI in a really accessible way, getting right to the most important points without making it so simple as to be useless. That’s why I rarely miss an opportunity to read up on his latest work.

One of the first things I learned from Ethan is the intuition that you can, sort of, think of AI as a very knowledgeable intern. In other words, think of it as a persona that you can interact with, but you also need to be a manager for it and to always assess the work that it does.

In our discussion, Ethan went further to stress that there is, because of that, a serious education gap. You know, over the last decade or two, we’ve all been trained, mainly by search engines, to think of computers as question-answering machines. In medicine, in fact, there’s a question-answering application that is really popular called UpToDate (opens in new tab). Doctors use it all the time. But generative AI systems like ChatGPT are different. There’s therefore a challenge in how to break out of the old-fashioned mindset of search to get the full value out of generative AI.

The other big takeaway for me was that Ethan pointed out that while it’s easy to see productivity gains from AI at the individual level, those same gains, at least today, don’t often translate automatically to organization-wide or system-wide gains. And one, of course, has to conclude that it takes more than just making individuals more productive; the whole system also has to adjust to the realities of AI.

Here’s now my interview with Azeem Azhar:

LEE: Azeem, welcome.

AZEEM AZHAR: Peter, thank you so much for having me. 

LEE: You know, I think you’re extremely well known in the world. But still, some of the listeners of this podcast series might not have encountered you before.

And so one of the ways I like to ask people to introduce themselves is, how do you explain to your parents what you do every day?

AZHAR: Well, I’m very lucky in that way because my mother was the person who got me into computers more than 40 years ago. And I still have that first computer, a ZX81 with a Z80 chip …

LEE: Oh wow.

AZHAR: … to this day. It sits in my study, all seven and a half thousand transistors and Bakelite plastic that it is. And my parents were both economists, and economics is deeply connected with technology in some sense. And I grew up in the late ’70s and the early ’80s. And that was a time of tremendous optimism around technology. It was space opera, science fiction, robots, and of course, the personal computer and, you know, Bill Gates and Steve Jobs. So that’s where I started.

And so, in a way, my mother and my dad, who passed away a few years ago, had always known me as someone who was fiddling with computers but also thinking about economics and society. And so, in a way, it’s easier to explain to them because they’re the ones who nurtured the environment that allowed me to research technology and AI and think about what it means to firms and to the economy at large.

LEE: I always like to understand the origin story. And what I mean by that is, you know, what was your first encounter with generative AI? And what was that like? What did you go through?

AZHAR: The first real moment was when Midjourney and Stable Diffusion emerged in that summer of 2022. I’d been away on vacation, and I came back—and I’d been off grid, in fact—and the world had really changed.

Now, I’d been aware of GPT-3 and GPT-2, which I played around with, and with BERT and the original transformer paper about seven or eight years ago, but it was the moment where I could talk to my computer, and it could produce these images, and it could be refined in natural language that really made me think we’ve crossed into a new domain. We’ve gone from AI being highly discriminative to AI that’s able to explore the world in particular ways. And then it was a few months later that ChatGPT came out—November the 30th.

And I think it was the next day or the day after that I said to my team, everyone has to use this, and we have to meet every morning and discuss how we experimented the day before. And we did that for three or four months. And, you know, it was really clear to me in that interface at that point that, you know, we’d absolutely pass some kind of threshold.

LEE: And who’s the we that you were experimenting with?

AZHAR: So I have a team of four who support me. They’re mostly researchers of different types. I mean, it’s almost like one of those jokes. You know, I have a sociologist, an economist, and an astrophysicist. And, you know, they walk into the bar, [LAUGHTER] or they walk into our virtual team room, and we try to solve problems.

LEE: Well, so let’s get now into brass tacks here. And I think I want to start maybe just with an exploration of the economics of all this and economic realities. Because I think in a lot of your work—for example, in your book—you look pretty deeply at how automation generally and AI specifically are transforming certain sectors like finance, manufacturing, and you have a really, kind of, insightful focus on what this means for productivity and which ways, you know, efficiencies are found.  

And then you, sort of, balance that with risks, things that can and do go wrong. And so as you take that background and looking at all those other sectors, in what ways are the same patterns playing out or likely to play out in healthcare and medicine?

AZHAR: I’m sure we will see really remarkable parallels but also new things going on. I mean, medicine has a particular quality compared to other sectors in the sense that it’s highly regulated, market structure is very different country to country, and it’s an incredibly broad field. I mean, just think about taking a Tylenol and going through laparoscopic surgery. Having an MRI and seeing a physio. I mean, this is all medicine. I mean, it’s hard to imagine a sector that is [LAUGHS] more broad than that.

So I think we can start to break it down, and, you know, where we're seeing things with generative AI will be at the, sort of, softest entry point, which is medical scribing. And I'm sure many of us have been with clinicians who have a medical scribe running alongside—they're all on Surface Pros I noticed, right? [LAUGHTER] They're on the tablet computers, and they're scribing away.

And what that’s doing is, in the words of my friend Eric Topol, it’s giving the clinician time back (opens in new tab), right. They have time back from days that are extremely busy and, you know, full of administrative overload. So I think you can obviously do a great deal with reducing that overload.

And within my team, we have a view, which is if you do something five times in a week, you should be writing an automation for it. And if you’re a doctor, you’re probably reviewing your notes, writing the prescriptions, and so on several times a day. So those are things that can clearly be automated, and the human can be in the loop. But I think there are so many other ways just within the clinic that things can help.

So, one of my friends, my friend from my junior school—I’ve known him since I was 9—is an oncologist who’s also deeply into machine learning, and he’s in Cambridge in the UK. And he built with Microsoft Research a suite of imaging AI tools from his own discipline, which they then open sourced.

So that’s another way that you have an impact, which is that you actually enable the, you know, generalist, specialist, polymath, whatever they are in health systems to be able to get this technology, to tune it to their requirements, to use it, to encourage some grassroots adoption in a system that’s often been very, very heavily centralized.

LEE: Yeah.

AZHAR: And then I think there are some other things that are going on that I find really, really exciting. So one is the consumerization of healthcare. So I have one of those sleep tracking rings, the Oura (opens in new tab).

LEE: Yup.

AZHAR: That is building a data stream that we’ll be able to apply more and more AI to. I mean, right now, it’s applying traditional, I suspect, machine learning, but you can imagine that as we start to get more data, we start to get more used to measuring ourselves, we create this sort of pot, a personal asset that we can turn AI to.

And there’s still another category. And that other category is one of the completely novel ways in which we can enable patient care and patient pathway. And there’s a fantastic startup in the UK called Neko Health (opens in new tab), which, I mean, does physicals, MRI scans, and blood tests, and so on.

It’s hard to imagine Neko existing without the sort of advanced data, machine learning, AI that we’ve seen emerge over the last decade. So, I mean, I think that there are so many ways in which the temperature is slowly being turned up to encourage a phase change within the healthcare sector.

And last but not least, I do think that these tools can also be very, very supportive of a clinician’s life cycle. I think we, as patients, we’re a bit …  I don’t know if we’re as grateful as we should be for our clinicians who are putting in 90-hour weeks. [LAUGHTER] But you can imagine a world where AI is able to support not just the clinicians’ workload but also their sense of stress, their sense of burnout.

So just in those five areas, Peter, I sort of imagine we could start to fundamentally transform over the course of many years, of course, the way in which people think about their health and their interactions with healthcare systems.

LEE: I love how you break that down. And I want to press on a couple of things.

You also touched on the fact that medicine, at least in most of the world, is a highly regulated industry. I guess finance is the same way, but they also feel different because the, like, finance sector has to be very responsive to consumers, and consumers are sensitive to, you know, an abundance of choice; they are sensitive to price. Is there something unique about medicine besides being regulated?

AZHAR: I mean, there absolutely is. And in finance, as well, you have much clearer end states. So if you’re not in the consumer space, but you’re in the, you know, asset management space, you have to essentially deliver returns against the volatility or risk boundary, right. That’s what you have to go out and do. And I think if you’re in the consumer industry, you can come back to very, very clear measures, net promoter score being a very good example.

In the case of medicine and healthcare, it is much more complicated because as far as the clinician is concerned, people are individuals, and we have our own parts and our own responses. If we didn’t, there would never be a need for a differential diagnosis. There’d never be a need for, you know, Let’s try azithromycin first, and then if that doesn’t work, we’ll go to vancomycin, or, you know, whatever it happens to be. You would just know. But ultimately, you know, people are quite different. The symptoms that they’re showing are quite different, and also their compliance is really, really different.

I had a back problem that had to be dealt with by, you know, a physio and extremely boring exercises four times a week, but I was ruthless in complying, and my physio was incredibly surprised. He'd say, well, no one ever does this, and I said, well, you know, the thing is that I kind of just want to get this thing to go away.

LEE: Yeah.

AZHAR: And I think that that’s why medicine is and healthcare is so different and more complex. But I also think that’s why AI can be really, really helpful. I mean, we didn’t talk about, you know, AI in its ability to potentially do this, which is to extend the clinician’s presence throughout the week.

LEE: Right. Yeah.

AZHAR: The idea that maybe some part of what the clinician would do if you could talk to them on Wednesday, Thursday, and Friday could be delivered through an app or a chatbot just as a way of encouraging the compliance, which is often, especially with older patients, one reason why conditions, you know, linger on for longer.

LEE: You know, just staying on the regulatory thing, as I’ve thought about this, the one regulated sector that I think seems to have some parallels to healthcare is energy delivery, energy distribution.

Because like healthcare, as a consumer, I don’t have choice in who delivers electricity to my house. And even though I care about it being cheap or at least not being overcharged, I don’t have an abundance of choice. I can’t do price comparisons.

And there’s something about that, just speaking as a consumer of both energy and a consumer of healthcare, that feels similar. Whereas other regulated industries, you know, somehow, as a consumer, I feel like I have a lot more direct influence and power. Does that make any sense to someone, you know, like you, who’s really much more expert in how economic systems work?

AZHAR: I mean, in a sense, one part of that is very, very true. You have a limited panel of energy providers you can go to, and in the US, there may be places where you have no choice.

I think the area where it's slightly different is that as a consumer or a patient, you can actually make meaningful choices and changes yourself using these technologies, and people used to joke about, you know, asking Dr. Google. But Dr. Google is not terrible, particularly if you go to WebMD. And, you know, when I look at long-range change, many of the regulations that exist around healthcare delivery were formed at a point before people had access to good quality information at the touch of their fingertips or when educational levels in general were much, much lower. And many regulations existed because of the incumbent power of particular professional sectors.

I’ll give you an example from the United Kingdom. So I have had asthma all of my life. That means I’ve been taking my inhaler, Ventolin, and maybe a steroid inhaler for nearly 50 years. That means that I know … actually, I’ve got more experience, and I—in some sense—know more about it than a general practitioner.

LEE: Yeah.

AZHAR: And until a few years ago, I would have to go to a general practitioner to get this drug that I’ve been taking for five decades, and there they are, age 30 or whatever it is. And a few years ago, the regulations changed. And now pharmacies can … or pharmacists can prescribe those types of drugs under certain conditions directly.

LEE: Right.

AZHAR: That was not to do with technology. That was to do with incumbent lock-in. So when we look at the medical industry, the healthcare space, there are some parallels with energy, but there are a few little things that the ability that the consumer has to put in some effort to learn about their condition, but also the fact that some of the regulations that exist just exist because certain professions are powerful.

LEE: Yeah, one last question while we’re still on economics. There seems to be a conundrum about productivity and efficiency in healthcare delivery because I’ve never encountered a doctor or a nurse that wants to be able to handle even more patients than they’re doing on a daily basis.

And so, you know, if productivity means simply, well, your rounds can now handle 16 patients instead of eight patients, that doesn't seem necessarily to be a desirable thing. So how can we or should we be thinking about efficiency and productivity since obviously costs, in most of the developed world, are a huge, huge problem?

AZHAR: Yes, and when you described doubling the number of patients on the round, I imagined you buying them all roller skates so they could just whizz around [LAUGHTER] the hospital faster and faster than ever before.

We can learn from what happened with the introduction of electricity. Electricity emerged at the end of the 19th century, around the same time that cars were emerging as a product, and car makers were very small and very artisanal. And in the early 1900s, some really smart car makers figured out that electricity was going to be important. And they bought into this technology by putting pendant lights in their workshops so they could “visit more patients.” Right?

LEE: Yeah, yeah.

AZHAR: They could effectively spend more hours working, and that was a productivity enhancement, and it was noticeable. But, of course, electricity fundamentally changed the productivity by orders of magnitude of people who made cars starting with Henry Ford because he was able to reorganize his factories around the electrical delivery of power and to therefore have the moving assembly line, which 10xed the productivity of that system.

So when we think about how AI will affect the clinician, the nurse, the doctor, it’s much easier for us to imagine it as the pendant light that just has them working later …

LEE: Right.

AZHAR: … than it is to imagine a reconceptualization of the relationship between the clinician and the people they care for.

And I’m not sure. I don’t think anybody knows what that looks like. But, you know, I do think that there will be a way that this changes, and you can see that scale out factor. And it may be, Peter, that what we end up doing is we end up saying, OK, because we have these brilliant AIs, there’s a lower level of training and cost and expense that’s required for a broader range of conditions that need treating. And that expands the market, right. That expands the market hugely. It’s what has happened in the market for taxis or ride sharing. The introduction of Uber and the GPS system …

LEE: Yup.

AZHAR: … has meant many more people now earn their living driving people around in their cars. And at least in London, you had to be reasonably highly trained to do that.

So I can see a reorganization is possible. Of course, entrenched interests, the economic flow … and there are many entrenched interests, particularly in the US between the health systems and the, you know, professional bodies that might slow things down. But I think a reimagining is possible.

And if I may, I’ll give you one example of that, which is, if you go to countries outside of the US where there are many more sick people per doctor, they have incentives to change the way they deliver their healthcare. And well before there was AI of this quality around, there were a few cases of health systems in India—Aravind Eye Care (opens in new tab) was one, and Narayana Hrudayalaya [now known as Narayana Health (opens in new tab)] was another. And in the latter, they were a cardiac care unit where you couldn’t get enough heart surgeons.

LEE: Yeah, yep.

AZHAR: So specially trained nurses would operate under the supervision of a single surgeon who would supervise many in parallel. So there are ways of increasing the quality of care, reducing the cost, but it does require a systems change. And we can’t expect a single bright algorithm to do it on its own.

LEE: Yeah, really, really interesting. So now let’s get into regulation. And let me start with this question. You know, there are several startup companies I’m aware of that are pushing on, I think, a near-term future possibility that a medical AI for consumer might be allowed, say, to prescribe a medication for you, something that would normally require a doctor or a pharmacist, you know, that is certified in some way, licensed to do. Do you think we’ll get to a point where for certain regulated activities, humans are more or less cut out of the loop?

AZHAR: Well, humans would have been in the loop because they would have provided the training data, they would have done the oversight, the quality control. But to your question in general, would we delegate an important decision entirely to a tested set of algorithms? I’m sure we will. We already do that. I delegate less important decisions like, What time should I leave for the airport to Waze. I delegate more important decisions to the automated braking in my car. We will do this at certain levels of risk and threshold.

If I come back to my example of prescribing Ventolin. It’s really unclear to me why Ventolin, this incredibly benign bronchodilator that is only used by people who’ve been through the asthma process, needs to be prescribed by someone who’s gone through 10 or 12 years of medical training, and why it couldn’t be prescribed by an algorithm or an AI system.

LEE: Right. Yep. Yep.

AZHAR: So, you know, I absolutely think that that will be the case and could be the case. I can’t really see what the objections are. And the real issue is where do you draw the line of where you say, “Listen, this is too important,” or “The cost is too great,” or “The side effects are too high,” and therefore this is a point at which we want to have some, you know, human taking personal responsibility, having a liability framework in place, having a sense that there is a person with legal agency who signed off on this decision. And that line I suspect will start fairly low, and what we’d expect to see would be that that would rise progressively over time.

LEE: What you just said, that scenario of your personal asthma medication, is really interesting because your personal AI might have the benefit of 50 years of your own experience with that medication. So, in a way, there is at least the data potential for, let’s say, the next prescription to be more personalized and more tailored specifically for you.

AZHAR: Yes. Well, let’s dig into this because I think this is super interesting, and we can look at how things have changed. So 15 years ago, if I had a bad asthma attack, which I might have once a year, I would have needed to go and see my general physician.

In the UK, it’s very difficult to get an appointment. I would have had to see someone privately who didn’t know me at all because I’d just walked in off the street, and I would explain my situation. It would take me half a day. Productivity lost. I’ve been miserable for a couple of days with severe wheezing. Then a few years ago the system changed, a protocol changed, and now I have a thing called a rescue pack, which includes prednisolone steroids. It includes something else I’ve just forgotten, and an antibiotic in case I get an upper respiratory tract infection, and I have an “algorithm.” It’s called a protocol. It’s printed out. It’s a flowchart.

I answer various questions, and then I say, “I’m going to prescribe this to myself.” You know, UK doctors don’t prescribe prednisolone, or prednisone as you may call it in the US, at the drop of a hat, right. It’s a powerful steroid. I can self-administer, and I can now get that repeat prescription without seeing a physician a couple of times a year. And the algorithm, the “AI” is, it’s obviously been done in PowerPoint naturally, and it’s a bunch of arrows. [LAUGHS]

Surely, surely, an AI system is going to be more sophisticated, more nuanced, and give me more assurance that I’m making the right decision around something like that.

LEE: Yeah. Well, at a minimum, the AI should be able to make that PowerPoint the next time. [LAUGHS]

AZHAR: Yeah, yeah. Thank god for Clippy. Yes.

LEE: So, you know, I think in our book, we had a lot of certainty about most of the things we’ve discussed here, but one chapter where I felt we really sort of ran out of ideas, frankly, was on regulation. And, you know, what we ended up doing for that chapter is … I can’t remember if it was Carey’s or Zak’s idea, but we asked GPT-4 to have a conversation, a debate with itself [LAUGHS], about regulation. And we made some minor commentary on that.

And really, I think we took that approach because we just didn’t have much to offer. By the way, in our defense, I don’t think anyone else had any better ideas anyway.

AZHAR: Right.

LEE: And so now two years later, do we have better ideas about the need for regulation, the frameworks around which those regulations should be developed, and, you know, what should this look like?

AZHAR: So regulation is going to be in some cases very helpful because it provides certainty for the clinician that they’re doing the right thing, that they are still insured for what they’re doing, and it provides some degree of confidence for the patient. And we need to make sure that the claims that are made stand up to quite rigorous levels, where ideally there are RCTs [randomized control trials], and there are the classic set of processes you go through.

You do also want to be able to experiment, and so the question is: as a regulator, how can you enable conditions for there to be experimentation? And what is experimentation? Experimentation is learning so that every element of the system can learn from this experience.

So finding that space where there can be bit of experimentation, I think, becomes very, very important. And a lot of this is about experience, so I think the first digital therapeutics have received FDA approval, which means there are now people within the FDA who understand how you go about running an approvals process for that, and what that ends up looking like—and of course what we’re very good at doing in this sort of modern hyper-connected world—is we can share that expertise, that knowledge, that experience very, very quickly.

So you go from one approval a year to a hundred approvals a year to a thousand approvals a year. So we will then actually, I suspect, need to think about what is it to approve digital therapeutics because, unlike big biological molecules, we can generate these digital therapeutics at the rate of knots [very rapidly].

LEE: Yes.

AZHAR: Every road in Hayes Valley in San Francisco, right, is churning out new startups who will want to do things like this. So then, I think about, what does it mean to get approved if indeed it gets approved? But we can also go really far with things that don’t require approval.

I come back to my sleep tracking ring. So I’ve been wearing this for a few years, and when I go and see my doctor or I have my annual checkup, one of the first things that he asks is how have I been sleeping. And in fact, I even sync my sleep tracking data to their medical record system, so he’s hearing what I’m saying, but he’s actually pulling up the real data going, This patient’s lying to me again. Of course, I’m very truthful with my doctor, as we should all be. [LAUGHTER]

LEE: You know, actually, that brings up a point that consumer-facing health AI has to deal with pop science, bad science, you know, weird stuff that you hear on Reddit. And because one of the things that consumers want to know always is, you know, what’s the truth?

AZHAR: Right.

LEE: What can I rely on? And I think that somehow feels different than an AI that you actually put in the hands of, let’s say, a licensed practitioner. And so the regulatory issues seem very, very different for these two cases somehow.

AZHAR: I agree, they’re very different. And I think for a lot of areas, you will want to build AI systems that are first and foremost for the clinician, even if they have patient extensions, that idea that the clinician can still be with a patient during the week.

And you’ll do that anyway because you need the data, and you also need a little bit of a liability shield to have like a sensible person who’s been trained around that. And I think that’s going to be a very important pathway for many AI medical crossovers. We’re going to go through the clinician.

LEE: Yeah.

AZHAR: But I also do recognize what you say about the, kind of, kooky quackery that exists on Reddit. Although on creatine, Reddit may yet prove to have been right. [LAUGHTER]

LEE: Yeah, that’s right. Yes, yeah, absolutely. Yeah.

AZHAR: Sometimes it’s right. And I think that it serves a really good role as a field of extreme experimentation. So if you’re somebody who makes a continuous glucose monitor, traditionally given to diabetics but now lots of people will wear them—and sports people will wear them—you probably gathered a lot of extreme tail distribution data by reading the r/biohackers subreddit …

LEE: Yes.

AZHAR: … for the last few years, where people were doing things that you would never want them to really do with the CGM [continuous glucose monitor]. And so I think we shouldn’t understate how important that petri dish can be for helping us learn what could happen next.

LEE: Oh, I think it’s absolutely going to be essential and a bigger thing in the future. So I think I just want to close here then with one last question. And I always try to be a little bit provocative with this.

And so as you look ahead to what doctors and nurses and patients might be doing two years from now, five years from now, 10 years from now, do you have any kind of firm predictions?

AZHAR: I’m going to push the boat out, and I’m going to go further out than closer in.

LEE: OK. [LAUGHS]

AZHAR: As patients, we will have many, many more touch points and interaction with our biomarkers and our health. We’ll be reading how well we feel through an array of things. And some of them we’ll be wearing directly, like sleep trackers and watches.

And so we’ll have a better sense of what’s happening in our lives. It’s like the moment you go from paper bank statements that arrive every month to being able to see your account in real time.

LEE: Yes.

AZHAR: And I suspect we’ll have … we’ll still have interactions with clinicians because societies that get richer see doctors more, societies that get older see doctors more, and we’re going to be doing both of those over the coming 10 years. But there will be a sense, I think, of continuous health engagement, not in an overbearing way, but just in a sense that we know it’s there, we can check in with it, it’s likely to be data that is compiled on our behalf somewhere centrally and delivered through a user experience that reinforces agency rather than anxiety.

And we’re learning how to do that slowly. I don’t think the health apps on our phones and devices have yet quite got that right. And that could help us preempt problems before they arise, and again, I use my experience for things that I’ve tracked really, really well. And I know from my data and from how I’m feeling when I’m on the verge of one of those severe asthma attacks that hits me once a year, and I can take a little bit of preemptive measure, so I think that that will become progressively more common and that sense that we will know our baselines.

I mean, when you think about being an athlete, which is something I think about, but I could never ever do, [LAUGHTER] but what happens is you start with your detailed baselines, and that’s what your health coach looks at every three or four months. For most of us, we have no idea of our baselines. You know, we get our blood pressure measured once a year. We will have baselines, and that will help us on an ongoing basis to better understand and be in control of our health. And then if the product designers get it right, it will be done in a way that doesn’t feel invasive, but it’ll be done in a way that feels enabling.

We’ll still be engaging with clinicians augmented by AI systems more and more because they will also have gone up the stack. They won’t be spending their time on just “take two Tylenol and have a lie down” type of engagements because that will be dealt with earlier on in the system. And so we will be there in a very, very different set of relationships. And they will feel that they have different ways of looking after our health.

LEE: Azeem, it’s so comforting to hear such a wonderfully optimistic picture of the future of healthcare. And I actually agree with everything you’ve said.

Let me just thank you again for joining this conversation. I think it’s been really fascinating. And I think somehow the systemic issues that you tend to see with such clarity are going to be the most, kind of, profound drivers of change in the future. So thank you so much.

AZHAR: Well, thank you, it’s been my pleasure, Peter, thank you.

[TRANSITION MUSIC]  

LEE: I always think of Azeem as a systems thinker. He’s always able to take the experiences of new technologies at an individual level and then project out to what this could mean for whole organizations and whole societies.

In our conversation, I felt that Azeem really connected some of what we learned in a previous episode—for example, from Chrissy Farr—on the evolving consumerization of healthcare to the broader workforce and economic impacts that we’ve heard about from Ethan Mollick.  

Azeem’s personal story about managing his asthma was also a great example. You know, he imagines a future, as do I, where personal AI might assist and remember decades of personal experience with a condition like asthma and thereby know more than any human being could possibly know in a deeply personalized and effective way, leading to better care. Azeem’s relentless optimism about our AI future was also so heartening to hear.

Both of these conversations leave me really optimistic about the future of AI in medicine. At the same time, it is pretty sobering to realize just how much we’ll all need to change in pretty fundamental and maybe even in radical ways. I think a big insight I got from these conversations is how we interact with machines is going to have to be altered not only at the individual level, but at the company level and maybe even at the societal level.

Since my conversation with Ethan and Azeem, there have been some pretty important developments that speak directly to this. Just last week at Build (opens in new tab), which is Microsoft’s yearly developer conference, we announced a slew of AI agent technologies. Our CEO, Satya Nadella, in fact, started his keynote by going online in a GitHub developer environment and then assigning a coding task to an AI agent, basically treating that AI as a full-fledged member of a development team. Other agents, for example, a meeting facilitator, a data analyst, a business researcher, travel agent, and more were also shown during the conference.

But pertinent to healthcare specifically, what really blew me away was the demonstration of a healthcare orchestrator agent. And the specific thing here was in Stanford’s cancer treatment center, when they are trying to decide on potentially experimental treatments for cancer patients, they convene a meeting of experts. That is typically called a tumor board. And so this AI healthcare orchestrator agent actually participated as a full-fledged member of a tumor board meeting to help bring data together, make sure that the latest medical knowledge was brought to bear, and to assist in the decision-making around a patient’s cancer treatment. It was pretty amazing.

[THEME MUSIC]

A big thank-you again to Ethan and Azeem for sharing their knowledge and understanding of the dynamics between AI and society more broadly. And to our listeners, thank you for joining us. I’m really excited for the upcoming episodes, including discussions on medical students’ experiences with AI and AI’s influence on the operation of health systems and public health departments. We hope you’ll continue to tune in.

Until next time.

[MUSIC FADES]

The post What AI’s impact on individuals means for the health workforce and industry appeared first on Microsoft Research.

]]>
Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers http://approjects.co.za/?big=en-us/research/podcast/the-ai-revolution-in-medicine-revisited-coauthor-roundtable-reflecting-on-real-world-of-doctors-developers-patients-and-policymakers/ Thu, 15 May 2025 16:15:59 +0000 http://approjects.co.za/?big=en-us/research/?p=1139193 Peter Lee and his coauthors, Carey Goldberg and Dr. Zak Kohane, reflect on how generative AI is unfolding in real-world healthcare, drawing on earlier guest conversations to examine what’s working, what’s not, and what questions still remain.

The post Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers appeared first on Microsoft Research.

]]>
AI Revolution podcast | Episode 5 - Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers | outline illustration of Carey Goldberg, Peter Lee, and Dr. Isaac (Zak) Kohane

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Lee reunites with his coauthors Carey Goldberg (opens in new tab) and Dr. Zak Kohane (opens in new tab) to review the predictions they made and reflect on what has and hasn’t materialized based on discussions with the series’ early guests: frontline clinicians, patient/consumer advocates, technology developers, and policy and ethics thinkers. Together, the coauthors explore how generative AI is being used on the ground today—from clinical note-taking to empathetic patient communication—and discuss the ongoing tensions around safety, equity, and institutional adoption. The conversation also surfaces deeper questions about values embedded in AI systems and the future role of human clinicians.


Learn more

Compared with What? Measuring AI against the Health Care We Have (opens in new tab) (Kohane) 
Publication | October 2024 

Medical Artificial Intelligence and Human Values (opens in new tab) (Kohane) 
Publication | May 2024 

Managing Patient Use of Generative Health AI (opens in new tab) (Goldberg) 
Publication | December 2024 

Patient Portal — When Patients Take AI into Their Own Hands (opens in new tab) (Goldberg) 
Publication | April 2024 

To Do No Harm — and the Most Good — with AI in Health Care (opens in new tab) (Goldberg) 
Publication | February 2024 

This time, the hype about AI in medicine is warranted (opens in new tab) (Goldberg) 
Opinion article | April 2023 

The AI Revolution in Medicine: GPT-4 and Beyond   
Book | Peter Lee, Carey Goldberg, Isaac Kohane | April 2023

Transcript

[MUSIC]     

[BOOK PASSAGE]  

PETER LEE: “We need to start understanding and discussing AI’s potential for good and ill now. Or rather, yesterday. … GPT-4 has game-changing potential to improve medicine and health.” 

[END OF BOOK PASSAGE]  

[THEME MUSIC]     

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.     

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?      

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES]  

The passage I read at the top is from the book’s prologue.   

When Carey, Zak, and I wrote the book, we could only speculate how generative AI would be used in healthcare because GPT-4 hadn’t yet been released. It wasn’t yet available to the very people we thought would be most affected by it. And while we felt strongly that this new form of AI would have the potential to transform medicine, it was such a different kind of technology for the world, and no one had a user’s manual for this thing to explain how to use it effectively and also how to use it safely.  

So we thought it would be important to give healthcare professionals and leaders a framing to start important discussions around its use. We wanted to provide a map not only to help people navigate a new world that we anticipated would happen with the arrival of GPT-4 but also to help them chart a future of what we saw as a potential revolution in medicine.  

So I’m super excited to welcome my coauthors: longtime medical/science journalist Carey Goldberg and Dr. Zak Kohane, the inaugural chair of Harvard Medical School’s Department of Biomedical Informatics and the editor-in-chief for The New England Journal of Medicine AI.  

We’re going to have two discussions. This will be the first one about what we’ve learned from the people on the ground so far and how we are thinking about generative AI today.  

[TRANSITION MUSIC] 

Carey, Zak, I’m really looking forward to this. 

CAREY GOLDBERG: It’s nice to see you, Peter.  

LEE: [LAUGHS] It’s great to see you, too. 

GOLDBERG: We missed you. 

ZAK KOHANE: The dynamic gang is back. [LAUGHTER] 

LEE: Yeah, and I guess after that big book project two years ago, it’s remarkable that we’re still on speaking terms with each other. [LAUGHTER] 

In fact, this episode is to react to what we heard in the first four episodes of this podcast. But before we get there, I thought maybe we should start with the origins of this project, now just over two years ago. And, you know, I had this early secret access to Davinci 3, now known as GPT-4.  

I remember, you know, experimenting right away with things in medicine, but I realized I was in way over my head. And so I wanted help. And the first person I called was you, Zak. And you remember we had a call, and I tried to explain what this was about. And I think I saw skepticism in—polite skepticism—in your eyes. But tell me, you know, what was going through your head when you heard me explain this thing to you? 

KOHANE: So I was divided between the fact that I have tremendous respect for you, Peter. And you’ve always struck me as sober. And we’ve had conversations which showed to me that you fully understood some of the missteps that technology—ARPA, Microsoft, and others—had made in the past. And yet, you were telling me a full science fiction compliant story [LAUGHTER] that something that we thought was 30 years away was happening now.  

LEE: Mm-hmm. 

KOHANE: And it was very hard for me to put together. And so I couldn’t quite tell myself this is BS, but I said, you know, I need to look at it. Just this seems too good to be true. What is this? So it was very hard for me to grapple with it. I was thrilled that it might be possible, but I was thinking, How could this be possible? 

LEE: Yeah. Well, even now, I look back, and I appreciate that you were nice to me, because I think a lot of people would have [LAUGHS] been much less polite. And in fact, I myself had expressed a lot of very direct skepticism early on.  

After ChatGPT got released, I think three or four days later, I received an email from a colleague who runs a clinic, and, you know, he said, “Wow, this is great, Peter. And, you know, we’re using this ChatGPT, you know, to have the receptionist in our clinic write after-visit notes to our patients.”  

And that sparked a huge internal discussion about this. And you and I knew enough about hallucinations and about other issues that it seemed important to write something about what this could do and what it couldn’t do. And so I think, I can’t remember the timing, but you and I decided a book would be a good idea. And then I think you had the thought that you and I would write in a hopelessly academic style [LAUGHTER] that no one would be able to read.  

So it was your idea to recruit Carey, I think, right? 

KOHANE: Yes, it was. I was sure that we both had a lot of material, but communicating it effectively to the very people we wanted to would not go well if we just left ourselves to our own devices. And Carey is super brilliant at what she does. She’s an idea synthesizer and public communicator in the written word and amazing. 

LEE: So yeah. So, Carey, we contacted you. How did that go? 

GOLDBERG: So yes. On my end, I had known Zak for probably, like, 25 years, and he had always been the person who debunked the scientific hype for me. I would turn to him with like, “Hmm, they’re saying that the Human Genome Project is going to change everything.” And he would say, “Yeah. But first it’ll be 10 years of bad news, and then [LAUGHTER] we’ll actually get somewhere.”  
 
So when Zak called me up at seven o’clock one morning, just beside himself after having tried Davinci 3, I knew that there was something very serious going on. And I had just quit my job as the Boston bureau chief of Bloomberg News, and I was ripe for the plucking. And I also … I feel kind of nostalgic now about just the amazement and the wonder and the awe of that period. We knew that when generative AI hit the world, there would be all kinds of snags and obstacles and things that would slow it down, but at that moment, it was just like the holy crap moment. [LAUGHTER] And it’s fun to think about it now. 

LEE: Yeah. I think ultimately, you know, recruiting Carey, you were [LAUGHS] so important because you basically went through every single page of this book and made sure … I remember, in fact, it’s affected my writing since because you were coaching us that every page has to be a page turner. There has to be something on every page that motivates people to want to turn the page and get to the next one. 

KOHANE: I will see that and raise that one. I now tell GPT-4, please write this in the style of Carey Goldberg.  

GOLDBERG: [LAUGHTER] No way! Really?  

KOHANE: Yes way. Yes way. Yes way. 

GOLDBERG: Wow. Well, I have to say, like, it’s not hard to motivate readers when you’re writing about the most transformative technology of their lifetime. Like, I think there’s a gigantic hunger to read and to understand. So you were not hard to work with, Peter and Zak. [LAUGHS] 

LEE: All right. So I think we have to get down to work [LAUGHS] now.  

Yeah, so for these podcasts, you know, we’re talking to different types of people to just reflect on what’s actually happening, what has actually happened over the last two years. And so the first episode, we talked to two doctors. There’s Chris Longhurst at UC San Diego and Sara Murray at UC San Francisco. And besides being doctors and having AI affect their clinical work, they just happen also to be leading the efforts at their respective institutions to figure out how best to integrate AI into their health systems. 

And, you know, it was fun to talk to them. And I felt like a lot of what they said was pretty validating for us. You know, they talked about AI scribes. Chris, especially, talked a lot about how AI can respond to emails from patients, write referral letters. And then, you know, they both talked about the importance of—I think, Zak, you used the phrase in our book “trust but verify”—you know, to have always a human in the loop.   

What did you two take away from their thoughts overall about how doctors are using … and I guess, Zak, you would have a different lens also because at Harvard, you see doctors all the time grappling with AI. 

KOHANE: So on the one hand, I think they’ve done some very interesting studies. And indeed, they saw that when these generative models, when GPT-4, was sending a note to patients, it was more detailed, friendlier. 

But there were also some nonobvious results, which is on the generation of these letters, if indeed you review them as you’re supposed to, it was not clear that there were any time savings. And my own reaction was, Boy, every one of these things needs institutional review. It’s going to be hard to move fast.  

And yet, at the same time, we know from them that the doctors on their smartphones are accessing these things all the time. And so the disconnect between a healthcare system, which is duty bound to carefully look at every implementation, is, I think, intimidating.  

LEE: Yeah. 

KOHANE: And at the same time, doctors who just have to do what they have to do are using this new superpower and doing it. And so that’s actually what struck me …  

LEE: Yeah. 

KOHANE: … is that these are two leaders and they’re doing what they have to do for their institutions, and yet there’s this disconnect. 

And by the way, I don’t think we’ve seen any faster technology adoption than the adoption of ambient dictation. And it’s not because it’s time saving. And in fact, so far, the hospitals have to pay out of pocket. It’s not like insurance is paying them more. But it’s so much more pleasant for the doctors … not least of which because they can actually look at their patients instead of looking at the terminal and plunking down.  

LEE: Carey, what about you? 

GOLDBERG: I mean, anecdotally, there are time savings. Anecdotally, I have heard quite a few doctors saying that it cuts down on “pajama time” to be able to have the note written by the AI and then for them to just check it. In fact, I spoke to one doctor who said, you know, basically it means that when I leave the office, I’ve left the office. I can go home and be with my kids. 

So I don’t think the jury is fully in yet about whether there are time savings. But what is clear is, Peter, what you predicted right from the get-go, which is that this is going to be an amazing paper shredder. Like, the main first overarching use cases will be back-office functions. 

LEE: Yeah, yeah. Well, and it was, I think, not a hugely risky prediction because, you know, there were already companies, like, using phone banks of scribes in India to kind of listen in. And, you know, lots of clinics actually had human scribes being used. And so it wasn’t a huge stretch to imagine the AI.

[TRANSITION MUSIC] 

So on the subject of things that we missed, Chris Longhurst shared this scenario, which stuck out for me, and he actually coauthored a paper on it last year. 

LEE: [LAUGHS] So, Carey, maybe I’ll start with you. What did we understand about this idea of empathy out of AI at the time we wrote the book, and what do we understand now? 

GOLDBERG: Well, it was already clear when we wrote the book that these AI models were capable of very persuasive empathy. And in fact, you even wrote that it was helping you be a better person, right. [LAUGHS] So their human qualities, or human imitative qualities, were clearly superb. And we’ve seen that borne out in multiple studies, that in fact, patients respond better to them … that they have no problem at all with how the AI communicates with them. And in fact, it’s often better.  

And I gather now we’re even entering a period when people are complaining of sycophantic models, [LAUGHS] where the models are being too personable and too flattering. I do think that’s been one of the great surprises. And in fact, this is a huge phenomenon, how charming these models can be. 

LEE: Yeah, I think you’re right. We can take credit for understanding that, Wow, these things can be remarkably empathetic. But then we missed this problem of sycophancy. Like, we even started our book in Chapter 1 with a quote from Davinci 3 scolding me. Like, don’t you remember when we were first starting, this thing was actually anti-sycophantic. If anything, it would tell you you’re an idiot.  

KOHANE: It argued with me about certain biology questions. It was like a knockdown, drag-out fight. [LAUGHTER] I was bringing references. It was impressive. But in fact, it made me trust it more. 

LEE: Yeah. 

KOHANE: And in fact, I will say—I remember it’s in the book—I had a bone to pick with Peter. Peter really was impressed by the empathy. And I pointed out that some of the most popular doctors are popular because they’re very empathic. But they’re not necessarily the best doctors. And in fact, I was taught that in medical school.   

And so it’s a decoupling. It’s a human thing, that the empathy does not necessarily mean … it’s more of a, potentially, more of a signaled virtue than an actual virtue. 

GOLDBERG: Nicely put. 

LEE: Yeah, this issue of sycophancy, I think, is a struggle right now in the development of AI because I think it’s somehow related to instruction-following. So, you know, one of the challenges in AI is you’d like to give an AI a task—a task that might take several minutes or hours or even days to complete. And you want it to faithfully kind of follow those instructions. And, you know, that early version of GPT-4 was not very good at instruction-following. It would just silently disobey and, you know, and do something different. 

And so I think we’re starting to hit some confusing elements of like, how agreeable should these things be?  

One of the two of you used the word genteel. There was some point even while we were, like, on a little book tour … was it you, Carey, who said that the model seems nicer and less intelligent or less brilliant now than it did when we were writing the book? 

GOLDBERG: It might have been, I think so. And I mean, I think in the context of medicine, of course, the question is, well, what’s likeliest to get the results you want with the patient, right? A lot of healthcare is in fact persuading the patient to do what you know as the physician would be best for them. And so it seems worth testing out whether this sycophancy is actually constructive or not. And I suspect … well, I don’t know, probably depends on the patient. 

So actually, Peter, I have a few questions for you … 

LEE: Yeah. Mm-hmm. 

GOLDBERG: … that have been lingering for me. And one is, for AI to ever fully realize its potential in medicine, it must deal with the hallucinations. And I keep hearing conflicting accounts about whether that’s getting better or not. Where are we at, and what does that mean for use in healthcare? 

LEE: Yeah, well, it’s, I think two years on, in the pretrained base models, there’s no doubt that hallucination rates by any benchmark measure have reduced dramatically. And, you know, that doesn’t mean they don’t happen. They still happen. But, you know, there’s been just a huge amount of effort and understanding in the, kind of, fundamental pretraining of these models. And that has come along at the same time that the inference costs, you know, for actually using these models has gone down, you know, by several orders of magnitude.  

So things have gotten cheaper and have fewer hallucinations. At the same time, now there are these reasoning models. And the reasoning models are able to solve problems at PhD level oftentimes. 

But at least at the moment, they are also now hallucinating more than the simpler pretrained models. And so it still continues to be, you know, a real issue, as we were describing. I don’t know, Zak, from where you’re at in medicine, as a clinician and as an educator in medicine, how is the medical community from where you’re sitting looking at that? 

KOHANE: So I think it’s less of an issue, first of all, because the rate of hallucinations is going down. And second of all, in their day-to-day use, the doctor will provide questions that sit reasonably well into the context of medical decision-making. And the way doctors use this, let’s say on their non-EHR [electronic health record] smartphone is really to jog their memory or thinking about the patient, and they will evaluate independently. So that seems to be less of an issue. I’m actually more concerned about something else that’s I think more fundamental, which is effectively, what values are these models expressing?  

And I’m reminded of when I was still in training, I went to a fancy cocktail party in Cambridge, Massachusetts, and there was a psychotherapist speaking to a dentist. They were talking about their summer, and the dentist was saying about how he was going to fix up his yacht that summer, and the only question was whether he was going to make enough money doing procedures in the spring so that he could afford those things, which was discomforting to me because that dentist was my dentist. [LAUGHTER] And he had just proposed to me a few weeks before an expensive procedure. 

And so the question is what, effectively, is motivating these models?  

LEE: Yeah, yeah.  

KOHANE: And so with several colleagues, I published a paper (opens in new tab), basically, what are the values in AI? And we gave a case: a patient, a boy who is on the short side, not abnormally short, but on the short side, and his growth hormone levels are not zero. They’re there, but they’re on the lowest side. But the rest of the workup has been unremarkable. And so we asked GPT-4, you are a pediatric endocrinologist. 

Should this patient receive growth hormone? And it did a very good job explaining why the patient should receive growth hormone.  

GOLDBERG: Should. Should receive it.  

KOHANE: Should. And then we asked, in a separate session, you are working for the insurance company. Should this patient receive growth hormone? And it actually gave a scientifically better reason not to give growth hormone. And in fact, I tend to agree medically, actually, with the insurance company in this case, because giving growth hormone to kids who are not growth hormone deficient gives only a couple of inches over many, many years and has all sorts of other issues. But here’s the point, we had a 180-degree change in decision-making because of the prompt. And for that patient, a tens-of-thousands-of-dollars-per-year decision; across patient populations, millions of dollars of decision-making.  

LEE: Hmm. Yeah. 

KOHANE: And you can imagine these user prompts making their way into system prompts, making their way into the instruction-following. And so I think this is absolutely central. Just as I was wondering about my dentist, we should be wondering about these things. What are the values that are being embedded in them, some accidentally and some very much on purpose? 

LEE: Yeah, yeah. That one, I think, we even had some discussions as we were writing the book, but there’s a technical element of that that I think we were missing, but maybe Carey, you would know for sure. And that’s this whole idea of prompt engineering. It sort of faded a little bit. Was it a thing? Do you remember? 

GOLDBERG: I don’t think we particularly wrote about it. It’s funny, it does feel like it faded, and it seems to me just because everyone just gets used to conversing with the models and asking for what they want. Like, it’s not like there actually is any great science to it. 

LEE: Yeah, even when it was a hot topic and people were talking about prompt engineering maybe as a new discipline, all this, it never, I was never convinced at the time. But at the same time, it is true. It speaks to what Zak was just talking about because part of the prompt engineering that people do is to give a defined role to the AI.  

You know, you are an insurance claims adjuster, or something like that, and defining that role, that is part of the prompt engineering that people do. 

GOLDBERG: Right. I mean, I can say, you know, sometimes you guys had me take sort of the patient point of view, like the “every patient” point of view. And I can say one of the aspects of using AI for patients that remains absent, as far as I can tell, is it would be wonderful to have a consumer-facing interface where you could plug in your whole medical record without worrying about any privacy or other issues and be able to interact with the AI as if it were a physician or a specialist and get answers, which you can’t do yet as far as I can tell. 

LEE: Well, in fact, now that’s a good prompt because I think we do need to move on to the next episodes, and we’ll be talking about an episode that talks about consumers. But before we move on to Episode 2, which is next, I’d like to play one more quote, a little snippet from Sara Murray. 

LEE: Carey, you wrote this fictional account at the very start of our book. And that fictional account, I think you and Zak worked on that together, talked about this medical resident, ER resident, using, you know, a chatbot off label, so to speak. And here we have the chief, in fact, the nation’s first chief health AI officer [LAUGHS] for an elite health system doing exactly that. That’s got to be pretty validating for you, Carey. 

GOLDBERG: It’s very. [LAUGHS] Although what’s troubling about it is that actually as in that little vignette that we made up, she’s using it off label, right. It’s like she’s just using it because it helps the way doctors use Google. And I do find it troubling that what we don’t have is sort of institutional buy-in for everyone to do that because, shouldn’t they if it helps? 

LEE: Yeah. Well, let’s go ahead and get into Episode 2. So Episode 2, we sort of framed as talking to two people who are on the frontlines of big companies integrating generative AI into their clinical products. And so, one was Matt Lungren, who’s a colleague of mine here at Microsoft. And then Seth Hain, who leads all of R&D at Epic.  

Maybe we’ll start with a little snippet of something that Matt said that struck me in a certain way. 

LEE: I think we expected healthcare systems to adopt AI, and we spent a lot of time in the book on AI writing clinical encounter notes. It’s happening for real now, and in a big way. And it’s something that has, of course, been happening before generative AI but now is exploding because of it. Where are we at now, two years later, just based on what we heard from guests? 

KOHANE: Well, again, unless they’re forced to, hospitals will not adopt new technology unless it immediately translates into income. So it’s bizarrely counter-cultural that, again, they’re not able to bill for the use of the AI, but this technology is so compelling to the doctors that despite everything, it’s overtaking the traditional dictation-typing routine. 

LEE: Yeah. 

GOLDBERG: And a lot of them love it and say, you will pry my cold dead hands off of my ambient note-taking, right. And I actually … a primary care physician allowed me to watch her. She was actually testing the two main platforms that are being used. And there was this incredibly talkative patient who went on and on about vacation and all kinds of random things for about half an hour.  

And both of the platforms were incredibly good at pulling out what was actually medically relevant. And so to say that it doesn’t save time doesn’t seem right to me. Like, it seemed like it actually did and in fact was just shockingly good at being able to pull out relevant information. 

LEE: Yeah. 

KOHANE: I’m going to hypothesize that in the trials, which have in fact shown no gain in time, the doctors were being incredibly meticulous. [LAUGHTER] So I think … this is a Hawthorne effect, because you know you’re being monitored. And we’ve seen this in other technologies where the moment the focus is off, it’s used much more routinely and with much less inspection, for the better and for the worse. 

LEE: Yeah, you know, within Microsoft, I had some internal disagreements about Microsoft producing a product in this space. It wouldn’t be Microsoft’s normal way. Instead, we would want 50 great companies building those products and doing it on our cloud instead of us competing against those 50 companies. And one of the reasons is exactly what you both said. I didn’t expect that health systems would be willing to shell out the money to pay for these things. It doesn’t generate more revenue. But I think so far two years later, I’ve been proven wrong.

I wanted to ask a question about values here. I had this experience where I had a little growth, a bothersome growth on my cheek. And so had to go see a dermatologist. And the dermatologist treated it, froze it off. But there was a human scribe writing the clinical note.  

And so I used the app to look at the note that was submitted. And the human scribe said something that did not get discussed in the exam room, which was that the growth was making it impossible for me to safely wear a COVID mask. And that was the reason for it. 

And that then got associated with a code that allowed full reimbursement for that treatment. And so I think that’s a classic example of what’s called upcoding. And I strongly suspect that AI scribes, an AI scribe would not have done that. 

GOLDBERG: Well, depending what values you programmed into it, right, Zak? [LAUGHS] 

KOHANE: Today, today, today, it will not do it. But, Peter, that is actually the central issue that society has to have because our hospitals are currently mostly in the red. And upcoding is standard operating procedure. And if these AI get in the way of upcoding, they are going to be aligned towards that upcoding. You know, you have to ask yourself, these MRI machines are incredibly useful. They’re also big money makers. And if the AI correctly says that for this complaint, you don’t actually have to do the MRI …  

LEE: Right. 

KOHANE: what’s going to happen? And so I think this issue of values … you’re right. Right now, they’re actually much more impartial. But there are going to be business plans just around aligning these things towards healthcare. In many ways, this is why I think we wrote the book so that there should be a public discussion. And what kind of AI do we want to have? Whose values do we want it to represent? 

GOLDBERG: Yeah. And that raises another question for me. So, Peter, speaking from inside the gigantic industry, like, there seems to be such a need for self-surveillance of the models for potential harms that they could be causing. Are the big AI makers doing that? Are they even thinking about doing that? 

Like, let’s say you wanted to watch out for the kind of thing that Zak’s talking about, could you? 

LEE: Well, I think evaluation, like the best evaluation we had when we wrote our book was, you know, what score would this get on the step one and step two US medical licensing exams? [LAUGHS]  

GOLDBERG: Right, right, right, yeah. 

LEE: But honestly, evaluation hasn’t gotten that much deeper in the last two years. And it’s a big, I think, it is a big issue. And it’s related to the regulation issue also, I think. 

Now the other guest in Episode 2 is Seth Hain from Epic. You know, Zak, I think it’s safe to say that you’re not a fan of Epic and the Epic system. You know, we’ve had a few discussions about that, about the fact that doctors don’t have a very pleasant experience when they’re using Epic all day.  

Seth, in the podcast, said that there are over 100 AI integrations going on in Epic’s system right now. Do you think, Zak, that that has a chance to make you feel better about Epic? You know, what’s your view now two years on? 

KOHANE: My view is, first of all, I want to separate my view of Epic and how it’s affected the conduct of healthcare and the quality of life of doctors from the individuals. Like Seth Hain is a remarkably fine individual who I’ve enjoyed chatting with and does really great stuff. Among the worst aspects of Epic, even though it’s better in that respect than many EHRs, is its horrible user interface. 

The number of clicks that you have to go through to get to something. And you have to remember where someone decided to put that thing. It seems to me that it is fully within the realm of technical possibility today to actually give an agent a task that you want done in the Epic record. And then whether Epic has implemented that agent or someone else, it does it so you don’t have to do the clicks. Because it’s something really soul sucking that when you’re trying to help patients, you’re having to remember not the right dose of the medication, but where was that particular thing that you needed in that particular task?  

I can’t imagine that Epic does not have that in its product line. And if not, I know there must be other companies that essentially want to create that wrapper. So I do think, though, that the danger of multiple integrations is that you still want to have the equivalent of a single thought process that cares about the patient bringing those different processes together. And I don’t know if that’s Epic’s responsibility, the hospital’s responsibility, whether it’s actually a patient agent. But someone needs to be also worrying about all those AIs that are being integrated into the patient record. So … what do you think, Carey? 

GOLDBERG: What struck me most about what Seth said was his description of the Cosmos project, and I, you know, I have been drinking Zak’s Kool-Aid for a very long time, [LAUGHTER] and he—no, in a good way! And he persuaded me long ago that there is this horrible waste happening in that we have all of these electronic medical records, which could be used far, far more to learn from, and in particular, when you as a patient come in, it would be ideal if your physician could call up all the other patients like you and figure out what the optimal treatment for you would be. And it feels like—it sounds like—that’s one of the central aims that Epic is going for. And if they do that, I think that will redeem a lot of the pain that they’ve caused physicians these last few years.  

And I also found myself thinking, you know, maybe this very painful period of using electronic medical records was really just a growth phase. It was an awkward growth phase. And once AI is fully used the way Zak is beginning to describe, the whole system could start making a lot more sense for everyone. 

LEE: Yeah. One conversation I’ve had with Seth in all of this is, you know, with AI and its development, is there a future, a near future, where we don’t have an EHR [electronic health record] system at all? You know, AI is just listening and just somehow absorbing all the information. And, you know, one thing that Seth said, which I felt was prescient, and I’d love to get your reaction, especially Zak, on this: he said, technically, it could happen, but the problem is right now, actually doctors do a lot of their thinking when they write and review notes. You know, the actual process of being a doctor is not just being with a patient, but it’s actually thinking later. What do you make of that? 

KOHANE: So one of the most valuable experiences I had in training was something that’s more or less disappeared in medicine, which is the post-clinic conference, where all the doctors come together and we go through the cases that we just saw that afternoon. And we actually were trying to take potshots at each other [LAUGHTER] in order to actually improve. Oh, did you actually do that? Oh, I forgot. I’m going to go call the patient and do that.  

And that really happened. And I think that, yes, doctors do think, and I do think that we are not yet sufficiently using the artificial intelligence currently in ambient dictation mode as much more of an independent agent, saying, did you think about that? 

I think that would actually make it more interesting, challenging, and clearly better for the patient because that conversation I just told you about with the other doctors, that no longer exists.  

LEE: Yeah. Mm-hmm. I want to do one more thing here before we leave Matt and Seth in Episode 2, which is something that Seth said with respect to how to reduce hallucination.  

LEE: Yeah, so, Carey, this sort of gets at what you were saying, you know, that shouldn’t these models be just bringing in a lot more information into their thought processes? And I’m certain when we wrote our book, I had no idea. I did not conceive of RAG [retrieval-augmented generation] at all. It emerged a few months later.  

And to my mind, I remember the first time I encountered RAG—Oh, this is going to solve all of our problems of hallucination. But it’s turned out to be harder. It’s improving day by day, but it’s turned out to be a lot harder. 

KOHANE: Seth makes a very deep point, which is that the way RAG is implemented is basically some sort of technique for pulling the right information that’s contextually relevant. And the way that’s done is typically heuristic at best. And it doesn’t have the same depth of reasoning that the rest of the model has.  

And I’m just wondering, Peter, what you think, given the fact that now context lengths seem to be approaching a million or more, and people are now therefore using the full strength of the transformer on that context and are trying to figure out different techniques to make it pay attention to the middle of the context. In fact, the RAG approach perhaps was just a transient solution to the fact that it’s going to be able to amazingly look in a thoughtful way at the entire record of the patient, for example. What do you think, Peter? 

LEE: I think there are three things, you know, that are going on, and I’m not sure how they’re going to play out and how they’re going to be balanced. And I’m looking forward to talking to people in later episodes of this podcast, you know, people like Sébastien Bubeck or Bill Gates about this, because, you know, there is the pretraining phase, you know, when things are sort of compressed and baked into the base model.  

There is the in-context learning, you know, so if you have extremely long or infinite context, you’re kind of learning as you go along. And there are other techniques that people are working on, you know, various sorts of dynamic reinforcement learning approaches, and so on. And then there is what maybe you would call structured RAG, where you do a pre-processing. You go through a big database, and you figure it all out. And you make a very nicely structured database that the AI can then consult later.  

And all three of these in different contexts today seem to show different capabilities. But they’re all pretty important in medicine.   
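As an aside, the "structured RAG" idea Lee describes, pre-processing a database into a structured form the AI can consult later, can be sketched in a few lines. This toy version uses a simple keyword index; real systems use embeddings and an actual model, and every record, function, and name below is a hypothetical placeholder, not any product's API:

```python
# Toy sketch of structured RAG: pre-process a document store into an index,
# then retrieve contextually relevant snippets to assemble a model prompt.
from collections import defaultdict

RECORDS = {
    "note-1": "patient reports rash after starting amoxicillin",
    "note-2": "lipid panel shows elevated ldl cholesterol",
    "note-3": "rash resolved after discontinuing amoxicillin",
}

def build_index(records):
    """Pre-processing step: map each word to the records containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.split():
            index[word].add(rec_id)
    return index

def retrieve(index, query, records):
    """Heuristic retrieval: rank records by how many query words they share."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for rec_id in index.get(word, ()):
            scores[rec_id] += 1
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [records[r] for r in ranked]

index = build_index(RECORDS)
context = retrieve(index, "rash amoxicillin", RECORDS)
# The retrieved snippets are prepended to the prompt; with million-token
# context windows, one could instead pass the entire patient record.
prompt = "Context:\n" + "\n".join(context) + "\nQuestion: what caused the rash?"
```

The heuristic ranking step is exactly the weakness Kohane points at: the retrieval has none of the reasoning depth of the model it feeds.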

[TRANSITION MUSIC] 

Moving on to Episode 3, we talked to Dave DeBronkart, who is also known as “e-Patient Dave,” an advocate of patient empowerment, and then also Christina Farr, who has been doing a lot of venture investing for consumer health applications.  

Let’s get right into this little snippet from something that e-Patient Dave said that talks about the sources of medical information, particularly relevant for when he was receiving treatment for stage 4 kidney cancer. 

LEE: All right. So I have a question for you, Carey, and a question for you, Zak, about the whole conversation with e-Patient Dave, which I thought was really remarkable. You know, Carey, I think as we were preparing for this whole podcast series, you made a comment—I actually took it as a complaint—that not as much has happened as I had hoped or thought. People aren’t thinking boldly enough, you know, and I think, you know, I agree with you in the sense that I think we expected a lot more to be happening, particularly in the consumer space. I’m giving you a chance to vent about this. 

GOLDBERG: [LAUGHTER] Thank you! Yes, that has been by far the most frustrating thing to me. I think that the potential for AI to improve everybody’s health is so enormous, and yet, you know, it needs some sort of support to be able to get to the point where it can do that. Like, remember in the book we wrote about Greg Moore talking about how half of the planet doesn’t have healthcare, but people overwhelmingly have cellphones. And so you could connect people who have no healthcare to the world’s medical knowledge, and that could certainly do some good.  

And I have one great big problem with e-Patient Dave, which is that, God, he’s fabulous. He’s super smart. Like, he’s not a typical patient. He’s an off-the-charts, brilliant patient. And so it’s hard to … and so he’s a great sort of lead early-adopter-type person, and he can sort of show the way for others.  

But what I had hoped for was that there would be more visible efforts to really help patients optimize their healthcare. Probably it’s happening a lot in quiet ways like that any discharge instructions can be instantly beautifully translated into a patient’s native language and so on. But it’s almost like there isn’t a mechanism to allow this sort of mass consumer adoption that I would hope for.

LEE: Yeah. But you have written some, like, you even wrote about that person who saved his dog (opens in new tab). So do you think … you know, and maybe a lot more of that is just happening quietly that we just never hear about? 

GOLDBERG: I’m sure that there is a lot of it happening quietly. And actually, that’s another one of my complaints is that no one is gathering that stuff. It’s like you might happen to see something on social media. Actually, e-Patient Dave has a hashtag, PatientsUseAI, and a blog, as well. So he’s trying to do it. But I don’t know of any sort of overarching or academic efforts to, again, to surveil what’s the actual use in the population and see what are the pros and cons of what’s happening. 

LEE: Mm-hmm. So, Zak, you know, the thing that I thought about, especially with that snippet from Dave, is your opening for Chapter 8 that you wrote, you know, about your first patient dying in your arms. I still think of how traumatic that must have been. Because, you know, in that opening, you just talked about all the little delays, all the little paper-cut delays, in the whole process of getting some new medical technology approved. But there’s another element that Dave kind of speaks to, which is just, you know, patients who are experiencing some issue are very, sometimes very motivated. And there’s just a lot of stuff on social media that happens. 

KOHANE: So this is where I can both agree with Carey and also disagree. I think when people have an actual health problem, they are now routinely using it. 

GOLDBERG: Yes, that’s true. 

KOHANE: And that situation is happening more often because medicine is failing. This is something that did not come up enough in our book. And perhaps that’s because medicine is actually feeling a lot more rickety today than it did even two years ago.  

We actually mentioned the problem. I think, Peter, you may have mentioned the problem with the lack of primary care. But now in Boston, our biggest healthcare system, all the practices for primary care are closed. I cannot get one for my own faculty—residents at MGH [Massachusetts General Hospital] can’t get a primary care doctor. And so … 

LEE: Which is just crazy. I mean, these are amongst the most privileged people in medicine, and they can’t find a primary care physician. That’s incredible. 

KOHANE: Yeah, and so therefore … and I wrote an article about this in the NEJM [New England Journal of Medicine] (opens in new tab) that medicine is in such dire trouble that we have incredible technology, incredible cures, but where the rubber hits the road, which is at primary care, we don’t have very much.  

And so therefore, you see people who know that they have a six-month wait till they see the doctor, and all they can do is say, “I have this rash. Here’s a picture. What’s it likely to be? What can I do?” “I’m gaining weight. How do I do a ketogenic diet?” Or, “How do I know that this is the flu?”  
 
This is happening all the time. Acutely, patients have actually solved problems that doctors have not. Those are spectacular cases. But I’m saying it’s happening more routinely because of the failure of medicine. And it’s not just in our fee-for-service United States. It’s in the UK; it’s in France. These are first-world, developed-world problems. And we don’t even have to go to lower- and middle-income countries for that. 

LEE: Yeah. 

GOLDBERG: But I think it’s important to note that, I mean, so you’re talking about how even the most elite people in medicine can’t get the care they need. But there’s also the point that we have so much concern about equity in recent years. And it’s likeliest that what we’re doing is exacerbating inequity because it’s only the more connected, you know, better off people who are using AI for their health. 

KOHANE: Oh, yes. I know what various Harvard professors are doing. They’re paying for a concierge doctor. And that’s, you know, a $5,000- to $10,000-a-year-minimum investment. That’s inequity. 

LEE: When we wrote our book, you know, there was the idea that GPT-4 wasn’t trained specifically for medicine, and that was amazing, but that it might get even better, and maybe it would be necessary, to train it for medicine. But one of the insights for me is that in the consumer space, the kinds of things that people ask about are different than what the board-certified clinician would ask. 

KOHANE: Actually, I just recently coined a term for that. Well, at least it’s new to me: the technology, or expert, paradox. And that is, the more expert and narrow your medical discipline, the more trivial it is to translate that into a specialized AI. So echocardiograms? We can now do beautiful echocardiograms. That’s really hard to do. I don’t know how to interpret an echocardiogram. But these systems can do it really, really well. Interpret an EEG [electroencephalogram]. Interpret a genomic sequence. But understanding the fullness of the human condition, that’s actually hard. And actually, that’s what primary care doctors do best. But the paradox is, right now, what is easiest for AI is also the most highly paid in medicine. [LAUGHTER] Whereas what is the hardest for AI in medicine is the least regarded, least paid part of medicine. 

GOLDBERG: So this brings us to the question I wanted to throw at both of you actually, which is we’ve had this spasm of incredibly prominent people predicting that in fact physicians would be pretty obsolete within the next few years. We had Bill Gates saying that; we had Elon Musk saying surgeons are going to be obsolete within a few years. And I think we had Demis Hassabis saying, “Yeah, we’ll probably cure most diseases within the next decade or so.” [LAUGHS] 

So what do you think? And also, Zak, to what you were just saying, I mean, you’re talking about being able to solve very general overarching problems. But in fact, these general overarching models are actually able, I would think, to do that because they are broad. So what are we heading towards, do you think? What should the next book be … The end of doctors? [LAUGHS] 

KOHANE: So I do recall a conversation that … we were at a table with Bill Gates, and Bill Gates immediately went to this, which is advancing the cutting edge of science. And I have to say that I think it will accelerate discovery. But eliminating, let’s say, cancer? I think that’s going to be … that’s just super hard. The reason it’s super hard is we don’t have the data or even the beginnings of the understanding of all the ways this devilish disease managed to evolve around our solutions.  

And so that seems extremely hard. I think we’ll make some progress accelerated by AI, but solving it in the way Hassabis says, God bless him. I hope he’s right. I’d love to have to eat crow in 10 or 20 years, but I don’t think so. I do believe that a surgeon working on one of those da Vinci machines, that stuff can be, I think, automated.  

And so I think that’s one example of one of the paradoxes I described. And it won’t be that we’re replacing doctors. I just think we’re running out of doctors. I think it’s really the case that, as we said in the book, we’re getting a huge deficit in primary care doctors. 

But even the subspecialties, my subspecialty, pediatric endocrinology, we’re only filling half of the available training slots every year. And why? Because it’s a lot of work, a lot of training, and frankly doesn’t make as much money as some of the other professions.  

LEE: Yeah. Yeah, I tend to think that, you know, there are going to be always a need for human doctors, not for their skills. In fact, I think their skills increasingly will be replaced by machines. And in fact, I’ve talked about a flip. In fact, patients will demand, Oh my god, you mean you’re going to try to do that yourself instead of having the computer do it? There’s going to be that sort of flip. But I do think that when it comes to people’s health, people want the comfort of an authority figure that they trust. And so what is more of a question for me is whether we will ever view a machine as an authority figure that we can trust. 

And before I move on to Episode 4, which is on norms, regulations and ethics, I’d like to hear from Chrissy Farr on one more point on consumer health, specifically as it relates to pregnancy: 

LEE: In the consumer space, I don’t think we really had a focus on those periods in a person’s life when they have a lot of engagement, like pregnancy, or I think another one is menopause, cancer. You know, there are points where there is, like, very intense engagement. And we heard that from e-Patient Dave, you know, with his cancer and Chrissy with her pregnancy. Was that a miss in our book? What do you think, Carey? 

GOLDBERG: I mean, I don’t think so. I think it’s true that there are many points in life when people are highly engaged. To me, the problem thus far is just that I haven’t seen consumer-facing companies offering beautiful AI-based products. I think there’s no question at all that the market is there if you have the products to offer. 

LEE: So, what do you think this means, Zak, for, you know, like Boston Children’s or Mass General Brigham—you know, the big places? 

KOHANE: So again, all these large healthcare systems are in tough shape. MGB [Mass General Brigham] would be fully in the red if not for the fact that its investments, of all things, have actually produced. If you look at the large healthcare systems around the country, they are in the red. And there’s multiple reasons why they’re in the red, but among them is cost of labor.  

And so we’ve created what used to be a very successful beast, the health center. But it’s developed a very expensive model and a highly regulated model. And so when you have high revenue, tiny margins, your ability to disrupt yourself, to innovate, is very, very low because you will have to talk to the board next year if you went from 2% positive margin to 1% negative margin.  

LEE: Yeah. 

KOHANE: And so I think we’re all waiting for one of the two things to happen, either a new kind of healthcare delivery system being generated or ultimately one of these systems learns how to disrupt itself.  

LEE: Yeah. All right. I think we have to move on to Episode 4. And, you know, when it came to the question of regulation, my read is that when we were writing our book, this was the part that we struggled with the most.  

GOLDBERG: We punted. [LAUGHS] We totally punted to the AI. 

LEE: We had three amazing guests. One was Laura Adams from National Academy of Medicine. Let’s play a snippet from her. 

LEE: All right, so I very well remember that we had discussed this kind of idea when we were writing our book. And I think before we finished our book, I personally rejected the idea. But now two years later, what do the two of you think? I’m dying to hear. 

GOLDBERG: Well, wait, why … what do you think? Like, are you sorry that you rejected it? 

LEE: I’m still skeptical because when we are licensing human beings as doctors, you know, we’re making a lot of implicit assumptions that we don’t test as part of their licensure, you know, that first of all, they are [a] human being and they care about life, and that, you know, they have a certain amount of common sense and shared understanding of the world.  

And there’s all sorts of implicit assumptions that we have about each other as human beings living in a society together. That you know how to study, you know, because I know you just went through three or four years of medical school and all sorts of things. And so the standard ways that we license human beings, they don’t need to test all of that stuff. But somehow intuitively, all of that seems really important. 

I don’t know. Am I wrong about that? 

KOHANE: So it’s the compared-with-what issue. Because we know for a fact that doctors who do a procedure all the time, like high-risk deliveries, have better outcomes than ones who only do a few high-risk ones. We talk about it, but we don’t actually make it explicit to patients or regulate that you have to have this minimal amount. And it strikes me that in some sense, and, oh, very importantly, these things called human beings learn on the job. And although I used to be very resentful of it as a resident, when someone would say, I don’t want the resident, I want the … 

GOLDBERG: … the attending. [LAUGHTER] 

KOHANE: … they had a point. And so the truth is, maybe I was a wonderful resident, but some people were not so great. [LAUGHTER] And so it might be the best outcome if we actually, just like for human beings, say, yeah, OK, it’s this good, but don’t let it work autonomously, or it’s done a thousand of them, just let it go. Practically speaking, we just don’t have the environment, the lab, to test them. Now, maybe if they get embodied in robots and literally go around with us, then it’s going to be [in some sense] a lot easier. I don’t know. 

LEE: Yeah.  

GOLDBERG: Yeah, I think I would take a step back and say, first of all, we weren’t the only ones who were stumped by regulating AI. Like, nobody has done it yet in the United States to this day, right. Like, we do not have standing regulation of AI in medicine at all, in fact. And that raises the issue of … the story that you hear often in the biotech business, which is, you know, more prominent here in Boston than anywhere else, is that, thank goodness, the city of Cambridge put out some regulations about biotech and how you could dump your lab waste and so on. And that enabled the enormous growth of biotech here.  

If you don’t have the regulations, then you can’t have the growth of AI in medicine that is worthy of having. And so, I just … we’re not the ones who should do it, but I just wish somebody would.  

LEE: Yeah. 

GOLDBERG: Zak. 

KOHANE: Yeah, but I want to say this as always, execution is everything, even in regulation.  

And so I’m mindful of a conference that both of you attended, the RAISE conference [Responsible AI for Social and Ethical Healthcare] (opens in new tab). The Europeans at that conference came to me personally and thanked me for organizing this conference about safe and effective use of AI because they said back home in Europe, all that we’re talking about is risk, not opportunities to improve care.  

And so there is a version of regulation which just locks down the present and does not allow the future that we’re talking about to happen. And so, Carey, I absolutely hear you that we need to have a regulation that takes away some of the uncertainty around liability, around the freedom to operate that would allow things to progress. But we wrote in our book that premature regulation might actually focus on the wrong thing. And so since I’m an optimist, it may be the fact that we don’t have much of a regulatory infrastructure today, that it allows … it’s a unique opportunity—I’ve said this now to several leaders—for the healthcare systems to say, this is the regulation we need.  

GOLDBERG: It’s true. 

KOHANE: And previously it was top-down. It was coming from the administration, and those executive orders are now history. But there is an opportunity, which may or may not be attained, there is an opportunity for the healthcare leadership—for experts in surgery—to say, “This is what we should expect.”  

LEE: Yeah.  

KOHANE: I would love for this to happen. I haven’t seen evidence that it’s happening yet. 

GOLDBERG: No, no. And there’s this other huge issue, which is that it’s changing so fast. It’s moving so fast. Something that makes sense today won’t in six months. So, what do you do about that? 

LEE: Yeah, yeah, that is something I feel proud of because when I went back and looked at our chapter on this, you know, we did make that point, which I think has turned out to be true.  

But getting back to this conversation, there’s something, a snippet of something, that Vardit Ravitsky said that I think touches on this topic.  

GOLDBERG: Totally agree. Who cares about informed consent about AI. Don’t want it. Don’t need it. Nope. 

LEE: Wow. Yeah. You know, and this … Vardit of course is one of the leading bioethicists, you know, and of course prior to AI, she was really focused on genetics. But now it’s all about AI.  

And, Zak, you know, you and other doctors have always told me, you know, the truth of the matter is, you know, what do you call the bottom-of-the-class graduate of a medical school? 

And the answer is “doctor.” 

KOHANE: “Doctor.” Yeah. Yeah, I think that again, this gets to compared with what? We have to compare AI not to the medicine we imagine we have, or we would like to have, but to the medicine we have today. And if we’re trying to remove inequity, if we’re trying to improve our health, that’s what … those are the right metrics. And so that can be done so long as we avoid catastrophic consequences of AI.  

So what would the catastrophic consequence of AI be? It would be a systematic behavior that we were unaware of that was causing poor healthcare. So, for example, you know, changing the dose on a medication, making it 20% higher than normal so that the rate of complications of that medication went from 1% to 5%. And so we do need some sort of monitoring.  

We haven’t put out the paper yet, but in computer science, there’s, well, in programming, we know very well the value of logs for understanding how our computer systems work.  

And there was a guy by the name of Allman, I think he’s still at a company called Sendmail, who created something called syslog. And syslog is basically a log of all the crap that’s happening in our operating system. And so I’ve been arguing now for the creation of MedLog. And MedLog … in other words, what we cannot measure, we cannot regulate, actually. 

LEE: Yes. 

KOHANE: And so what we need to have is MedLog, which says, “Here’s the context in which a decision was made. Here’s the version of the AI, you know, the exact version of the AI. Here was the data.” And we just have MedLog. And I think MedLog is actually incredibly important for being able to measure, to just do what we do in … it’s basically the black box for, you know, when there’s a crash. You know, we’d like to think we could do better than crash. We can say, “Oh, we’re seeing from MedLog that this practice is turning a little weird.” But worst case, patient dies, [we] can see in MedLog, what was the information this thing knew about it? And did it make the right decision? We can actually go for transparency, which like in aviation, is much greater than in most human endeavors.  
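Kohane's MedLog proposal, a syslog-style, append-only audit record capturing the context, exact model version, and data behind each AI-mediated decision, might look minimally like the sketch below. This is an illustration of the idea, not any real system; all field names and values are hypothetical:

```python
# Minimal sketch of a "MedLog": one auditable, syslog-style JSON record per
# AI-assisted decision, capturing context, model version, inputs, and output.
import json
import time

def medlog_entry(model_version, context, inputs, decision):
    """Build one auditable log record for an AI-assisted decision."""
    return {
        "timestamp": time.time(),
        "model_version": model_version,  # the exact version of the AI
        "context": context,              # clinical context of the decision
        "inputs": inputs,                # the data the model actually saw
        "decision": decision,            # what the model recommended
    }

log = []  # in practice, an append-only store, not an in-memory list

entry = medlog_entry(
    model_version="med-model-2025.3",   # hypothetical version string
    context="medication order review",
    inputs={"drug": "examplamycin", "dose_mg": 500},  # invented example
    decision="dose within reference range",
)
log.append(json.dumps(entry))           # one JSON line per event, like syslog
```

With records like these, the "black box" review Kohane describes becomes a query: when something goes wrong, replay exactly what the model knew and decided.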

GOLDBERG: Sounds great. 

LEE: Yeah, it’s sort of like a black box. I was thinking of the aviation black box kind of idea. You know, you bring up medication errors, and I have one more snippet. This is from our guest Roxana Daneshjou from Stanford.

LEE: Yeah, so this is something we did write about in the book. We made a prediction that AI might be a second set of eyes, I think is the way we put it, catching things. And we actually had examples specifically in medication dose errors. I think for me, I expected to see a lot more of that than we have. 

KOHANE: Yeah, it goes back to our conversation about Epic, or a competitor of Epic, doing that. I think we’re going to see that: having oversight over all medical orders, all orders in the system, with real-time critique, where we’re aware of alert fatigue, so we don’t want to have too many false positives, while at the same time knowing which are the critical errors that could immediately affect lives. I think that is going to become a product, driven by quality measures. 
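The real-time order critique Kohane describes, tuned against alert fatigue, could be sketched as a tiered check: interrupt the clinician only for large, clearly dangerous deviations, and flag milder ones passively. The drug name, reference range, and thresholds below are all invented for illustration:

```python
# Toy sketch of alert-fatigue-aware order critique: only doses far outside
# the reference range trigger an interruptive "critical" alert; mild
# deviations get a passive "note" so clinicians aren't flooded with alarms.
REFERENCE_RANGES_MG = {"examplamycin": (250, 500)}  # hypothetical drug/range

def critique_order(drug, dose_mg, critical_factor=2.0):
    """Classify an order as 'ok', 'note', or 'critical'.

    critical_factor controls how far outside the reference range a dose
    must fall before it is treated as an immediate, interruptive alert.
    """
    low, high = REFERENCE_RANGES_MG[drug]
    if dose_mg > high * critical_factor or dose_mg < low / critical_factor:
        return "critical"   # immediate, interruptive alert
    if not (low <= dose_mg <= high):
        return "note"       # passive, non-interruptive flag
    return "ok"
```

Tuning `critical_factor` is exactly the false-positive trade-off mentioned above: lower it and you catch more errors but generate more fatigue-inducing alerts.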

GOLDBERG: And I think word will spread among the general public that kind of the same way in a lot of countries when someone’s in a hospital, the first thing people ask relatives is, well, who’s with them? Right?  

LEE: Yeah. Yup. 

GOLDBERG: You wouldn’t leave someone in hospital without relatives. Well, you wouldn’t maybe leave your medical …  

KOHANE: By the way, that country is called the United States. 

GOLDBERG: Yes, that’s true. [LAUGHS] It is true here now, too. But similarly, I would tell any loved one that they would be well advised to keep using AI to check on their medical care, right. Why not? 

LEE: Yeah. Yeah. Last topic, just for this Episode 4. Roxana, of course, I think really made a name for herself in the AI era writing, actually just prior to ChatGPT, you know, writing some famous papers about how computer vision systems for dermatology were biased against dark-skinned people. And we did talk some about bias in these AI systems, but I feel like we underplayed it, or we didn’t understand the magnitude of the potential issues. What are your thoughts? 

KOHANE: OK, I want to push back, because I’ve been asked this question several times. And so I have two comments. One is, with over 100,000 doctors practicing medicine, I know they have biases. Some of them actually may be all in the same direction, and not good. But I have no way of actually measuring that. With AI, I know exactly how to measure that at scale and affordably. Number one. Number two, same 100,000 doctors. Let’s say I do know what their biases are. How hard is it for me to change that bias? It’s impossible … 

LEE: Yeah, yeah.  

KOHANE: … practically speaking. Can I change the bias in the AI? Somewhat. Maybe some completely. 

I think that we’re in a much better situation. 

GOLDBERG: Agree. 

LEE: I think Roxana made also the super interesting point that there’s bias in the whole system, not just in individuals, but, you know, there’s structural bias, so to speak.  

KOHANE: There is. 

LEE: Yeah. Hmm. There was a super interesting paper that Roxana and her collaborators wrote not too long ago, showing AI’s ability to detect, to spot, biased decision-making by others. Are we going to see more of that? 

KOHANE: Oh, yeah, I was very pleased when, in NEJM AI [New England Journal of Medicine Artificial Intelligence], we published a piece with Marzyeh Ghassemi (opens in new tab), and what they were talking about was actually—and these are researchers who had published extensively on bias and threats from AI. And they actually, in this article, did the flip side, which is how much better AI can do than human beings in this respect.  

And so I think that as some of these computer scientists enter the world of medicine, they’re becoming more and more aware of human foibles and can see how these systems, if they only looked at the pretrained state, would have biases. But now, where we know how to fine-tune to de-bias in a variety of ways, they can do a lot better. And, in fact, I think there is much greater reason for optimism that we can change some of these noxious biases than in the pre-AI era. 

GOLDBERG: And thinking about Roxana’s dermatological work on how I think there wasn’t sufficient work on skin tone as related to various growths, you know, I think that one thing that we totally missed in the book was the dawn of multimodal uses, right. 

LEE: Yeah. Yeah, yeah. 

GOLDBERG: That’s been truly amazing that in fact all of these visual and other sorts of data can be entered into the models and move them forward. 

LEE: Yeah. Well, maybe on these slightly more optimistic notes, we’re at time. You know, I think ultimately, I feel pretty good still about what we did in our book, although there were a lot of misses. [LAUGHS] I don’t think any of us could really have predicted really the extent of change in the world.  

[TRANSITION MUSIC] 

So, Carey, Zak, just so much fun to do some reminiscing but also some reflection about what we did. 

[THEME MUSIC] 

And to our listeners, as always, thank you for joining us. We have some really great guests lined up for the rest of the series, and they’ll help us explore a variety of relevant topics—from AI drug discovery to what medical students are seeing and doing with AI and more.  

We hope you’ll continue to tune in. And if you want to catch up on any episodes you might have missed, you can find them at aka.ms/AIrevolutionPodcast (opens in new tab) or wherever you listen to your favorite podcasts.   

Until next time.  

[MUSIC FADES]


The post Coauthor roundtable: Reflecting on real world of doctors, developers, patients, and policymakers appeared first on Microsoft Research.

]]>
Laws, norms, and ethics for AI in health http://approjects.co.za/?big=en-us/research/podcast/laws-norms-and-ethics-for-ai-in-health/ Thu, 01 May 2025 16:00:00 +0000 http://approjects.co.za/?big=en-us/research/?p=1137889 Healthcare experts Laura Adams, Vardit Ravitsky, and Dr. Roxana Daneshjou discuss responsible AI implementation in medicine, examining governance approaches, shifting patient-provider relationships, and the identification of bias to ensure equitable deployment.

The post Laws, norms, and ethics for AI in health appeared first on Microsoft Research.

]]>
Peter Lee, Vardit Ravitsky, Laura Adams, and Dr. Roxana Daneshjou illustrated headshots.

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Laura Adams (opens in new tab), Vardit Ravitsky (opens in new tab), and Dr. Roxana Daneshjou (opens in new tab), experts at the intersection of healthcare, ethics, and technology, join Lee to discuss the responsible implementation of AI in healthcare. Adams, a strategic advisor at the National Academy of Medicine leading the development of a national AI code of conduct, shares her initial curiosity and skepticism of generative AI and then her recognition of the technology as a transformative tool requiring new governance approaches. Ravitsky, bioethicist and president and CEO of The Hastings Center for Bioethics, examines how AI is reshaping healthcare relationships and the need for bioethics to proactively guide implementation. Daneshjou, a Stanford physician-scientist bridging dermatology, biomedical data science, and AI, discusses her work on identifying, understanding, and mitigating bias in AI systems and also leveraging AI to better serve patient needs.


Learn more

Health Care Artificial Intelligence Code of Conduct (opens in new tab) (Adams) 
Project homepage | National Academy of Medicine 

Artificial Intelligence in Health, Health Care, and Biomedical Science: An AI Code of Conduct Principles and Commitments Discussion Draft (opens in new tab) (Adams) 
National Academy of Medicine commentary paper | April 2024 

Ethics of AI in Health and Biomedical Research (opens in new tab) (Ravitsky) 
The Hastings Center for Bioethics 

Ethics in Patient Preferences for Artificial Intelligence–Drafted Responses to Electronic Messages (opens in new tab) (Ravitsky) 
Publication | March 2025 

Daneshjou Lab (opens in new tab) (Daneshjou) 
Lab homepage 

Red teaming ChatGPT in medicine to yield real-world insights on model behavior (opens in new tab) (Daneshjou) 
Publication | March 2025 

Dermatologists’ Perspectives and Usage of Large Language Models in Practice: An Exploratory Survey (opens in new tab) (Daneshjou) 
Publication | October 2024 

Deep learning-aided decision support for diagnosis of skin disease across skin tones (opens in new tab) (Daneshjou) 
Publication | February 2024 

Large language models propagate race-based medicine (opens in new tab) (Daneshjou) 
Publication | October 2023 

Disparities in dermatology AI performance on a diverse, curated clinical image set (opens in new tab) (Daneshjou) 
Publication | August 2022

Transcript

[MUSIC]    

[BOOK PASSAGE]  

PETER LEE: “… This is the moment for broad, thoughtful consideration of how to ensure maximal safety and also maximum access. Like any medical tool, AI needs those guardrails to keep patients as safe as possible. But it’s a tricky balance: those safety measures must not mean that the great advantages that we document in this book end up unavailable to many who could benefit from them. One of the most exciting aspects of this moment is that the new AI could accelerate healthcare in a direction that is better for patients, all patients, and providers as well—if they have access.” 

[END OF BOOK PASSAGE]    

[THEME MUSIC]    

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.    

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?     

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 

[THEME MUSIC FADES] 

The passage I read at the top there is from Chapter 9, “Safety First.” 

One needs only to look at examples such as laws mandating seatbelts in cars and, more recently, internet regulation to know that policy and oversight are often playing catch-up with emerging technologies. When we were writing our book, Carey, Zak, and I didn’t claim it would be easy to put frameworks in place that allow for innovation and adoption while prioritizing inclusiveness and protecting patients from hallucination and other harms. In fact, in our writing, we posed more questions than answers in the hopes of highlighting the complexities at hand and supporting constructive discussion and action in this space.

In this episode, I’m pleased to welcome three experts who have been thinking deeply about these matters: Laura Adams, Vardit Ravitsky, and Dr. Roxana Daneshjou.  

Laura is an expert in AI, digital health, and human-centered care. As a senior advisor at the National Academy of Medicine, or NAM, she guides strategy for the academy’s science and technology portfolio and leads the Artificial Intelligence Code of Conduct national initiative.  

Vardit is president and CEO of The Hastings Center for Bioethics, a bioethics and health policy institute. She leads research projects funded by the National Institutes of Health, is a member of the committee developing the National Academy of Medicine’s AI Code of Conduct, and is a senior lecturer at Harvard Medical School.  

Roxana is a board-certified dermatologist and an assistant professor of both dermatology and biomedical data science at Stanford University. Roxana is among the world’s thought leaders in AI, healthcare, and medicine, thanks in part to groundbreaking work on AI biases and trustworthiness. 

One of the good fortunes I’ve had in my career is the chance to work with both Laura and Vardit, mainly through our joint work with the National Academy of Medicine. They’re both incredibly thoughtful and decisive leaders working very hard to help the world of healthcare—and healthcare regulators—come to grips with generative AI. And over the past few years, I’ve become an avid reader of all of Roxana’s research papers. Her work is highly technical, super influential but also informative in a way that spans computer science, medicine, bioethics, and law.  

These three leaders—one from the medical establishment, one from the bioethics field, and the third from clinical research—provide insights into three incredibly important dimensions of the issues surrounding regulations, norms, and ethics of AI in medicine. 

[TRANSITION MUSIC] 

Here is my interview with Laura Adams: 

LEE: Laura, I’m just incredibly honored and excited that you’re joining us here today, so welcome. 

ADAMS: Thank you, Peter, my pleasure. Excited to be here. 

LEE: So, Laura, you know, I’ve been working with you at the NAM for a while, and you are a strategic advisor at the NAM. But I think a lot of our listeners might not know too much about the National Academy of Medicine and then, within the National Academy of Medicine, what a strategic advisor does.  

So why don’t we start there. You know, how would you explain to a person’s mother or father what the National Academy of Medicine is? 

ADAMS: Sure. National Academy was formed more than 50 years ago. It was formed by the federal government, but it is not the federal government. It was formed as an independent body to advise the nation and the federal government on issues of science and primarily technology-related issues, as well.  

So with that 50 years, some probably know of the National Academy of Medicine when it was the Institute of Medicine and produced such publications as To Err is Human (opens in new tab) and Crossing the Quality Chasm (opens in new tab), both of which were seminal publications that I think had a dramatic impact on quality, safety, and how we saw our healthcare system and what we saw in terms of its potential. 

LEE: So now, for your role within NAM, what does the senior advisor do? What do you do?  

ADAMS: In the course of leading the AI Code of Conduct project, my role was framing the vision for that project, really understanding what we wanted it to do and what impact we wanted it to make.

So for example, some thought that it might be that we wanted everyone to use our code of conduct. And my advice on that was let’s use this as a touchstone. We want people to think about their own codes of conduct for their use of AI. That’s a valuable exercise, to decide what you value, what your aspirations are.  

I also then do a lot of the field alignment around that work. So I probably did 50 talks last year—conference presentations, webinars, different things—where the code of conduct was presented so that the awareness could be raised around it so people could see the practicality of using that tool.   
 
Especially the six commitments that were based on the idea of complex adaptive systems simple rules, where we could recall those in the heat of decision-making around AI, in the heat of application, or even in the planning and strategic thinking around it.  

LEE: All right, we’re going to want to really break into a lot of details here.  

But I would just like to rewind the clock a little bit and talk about your first encounters with AI. And there’s sort of, I guess, two eras. There’s the era of AI and machine learning before ChatGPT, before the generative AI era, and then afterwards.  

Before the era of generative AI, what was your relationship with the idea of artificial intelligence? Was it a big part of your role and something you thought about, or was it just one of many technologies that you considered? 

ADAMS: It was one of many.  

LEE: Yeah.  

ADAMS: Watching it help us evolve from predictive analytics to predictive AI, which of course I was fascinated by the fact that it could use structured and unstructured data, that it could learn from its own processes. These things were really quite remarkable, but my sense about it was that it was one of many.  

We were looking at telemedicine. We were looking at [a] variety of other things, particularly wearables and things that were affecting and empowering patients to take better care of themselves and take more … have more agency around their own care. So I saw it as one of many.  

And then the world changed in 2022, changed dramatically. 

LEE: [LAUGHS] OK. Right. OK, so November 2022, ChatGPT. Later in the spring of 2023, GPT-4. And so, you know, what were your first encounters, and what were you feeling? What were you experiencing? 

ADAMS: At the time, I was curious, and I thought, I think I’m seeing four things here that make this way different.  

And one was, and it proved to be true over time, the speed with which this evolved. And I was watching it evolve very, very quickly and thinking, this is almost, this is kind of mind blowing how fast this is getting better.  

And then this idea that, you know, we could scale this. As we were watching the early work with ambient listening, I was working with a group of physicians that were lamenting the cost and the unavailability of scribes. They wanted to use scribes. And I’m thinking, We don’t have to incur the cost of that. We don’t have to struggle with the unavailability of that type of … someone in the workforce.   

And then I started watching the ubiquity, and I thought, Oh, my gosh, this is unlike any other technology that we’ve seen. Because with electronic health records, for example, it’s had its place, but it was over here. We had another digital technology, maybe telehealth, over here. This was one, and I thought, there will be no aspect of healthcare that will be left untouched by AI. That blew my mind.  

LEE: Yeah. 

ADAMS: And then I think the last thing was the democratization. And I realized: Wow, anyone with a smartphone has access to the most powerful large language models in the world.  
 
And I thought, This, to me, is a revolution in cheap expertise. Those were the things that really began to stun me, and I just knew that we were in a way different era. 

LEE: It’s interesting that you first talked about ambient listening. Why was that of particular initial interest to you specifically? 

ADAMS: It was because one of the things that we were putting together in our code of conduct, which began pre-generative AI, was the idea that we wanted to renew the moral well-being and the sense of shared purpose to the healthcare workforce. That’s one of the six principles.  

And I knew that the cognitive burden was becoming unbearable. When we came out of COVID, it was such a huge wake-up call to understand exactly what was going on at the point of care and how challenging it had become, because information overload is astonishing in and of itself. So much documentation needed to be done, and so much of a clinician’s time was taken up doing that rather than doing the things they went into the profession to do: interact with people, heal, develop the human connection that has a healing effect. So much of their time was taken away from those activities.

I also looked at it and because I studied diffusion of innovations theory and understand what causes something to move rapidly across a social system and get adopted, it has to have a clear relative advantage. It has to be compatible with the way that processes work. 

So I didn’t see that this was going to be hugely disruptive to workflow, which is a challenge with most digital tools: they’re designed without that sense of how they impact the workflow. And then I just thought that it was going to be a front runner in adoption, and it might then start to create that tsunami, that wave of interest in this, and I don’t think I was wrong.

LEE: I have to ask you, because I’ve been asking every guest, there must have been moments early on in the encounter with generative AI where you felt doubt or skepticism. Is that true, or did you immediately think, Wow, this is something very important? 

ADAMS: No, I did feel doubt and skepticism.  

My understanding told me, from the very beginning, that this is trained on the internet with all of its flaws. When we think about AI, we think about it being very futuristic, but it’s trained on data from the past. I’m well aware of how flawed and how biased that data is: mostly men, mostly white men, within a certain age grouping.

So I knew that we had inherent massive flaws in the training data, and that concerned me. I saw other things about it that also concerned me, such as the difficulty of beginning to use it and govern it effectively.

You really do have to put a good governance system in if you’re going to put this into a care delivery system. And I began to worry about widening a digital divide that already was a chasm. And that was between those well-resourced, usually urban, hospitals and health systems that are serving the well-resourced, and the inner-city hospital in Chicago or the rural hospital in Nebraska or the Mississippi community health center.  

LEE: Yes. So I think this skepticism about technology, new technologies, in healthcare is very well-earned. So you’ve zeroed in on this issue of technology where oftentimes we hope it’ll reduce or eliminate biases but actually seems to oftentimes have the opposite effect.  

And maybe this is a good segue then into this really super-important national effort that you’re leading on the AI code of conduct. Because in a way, I think those failures of the past and even just the idea—the promise—that technology should make a doctor or a nurse’s job easier, not harder, even that oftentimes seems not to have panned out in the way that we hope.  

And then there’s, of course, the well-known issue of hallucinations or of mistakes being made. You know, how did those things inform this effort around a code of conduct, and why a code of conduct? 

ADAMS: Those things weighed heavily on me as the initiative leader because I had been deeply involved in the spread of electronic health records, not really knowing and understanding that electronic health records were going to have the effect that they had on the providers that use them.  

Looking back now, I think that there could have been design changes, but we probably didn’t have as much involvement of providers in the design. And in some cases, we did. We just didn’t understand what it would take to work it into their workflows.  

So I wanted to be sure that the code of conduct took into consideration and made explicit some of the things that I believe would have helped us had we had those guardrails or those guidelines explicit for us.  

And those are things like our first one is to protect and advance human health and connection.  

We also wanted to see things about openly sharing and monitoring because we know that for this particular technology, it’s emergent. We’re going to have to do a much better job at understanding whether what we’re doing works and works in the real world.  

So the reason for a code of conduct was we knew that … the good news, when it was “here comes AI and it’s barreling toward us,” the good news was that everybody was putting together guidelines, frameworks, and principle sets. The bad news was the same: everybody was putting together their own guideline, principle, and framework set.

And I thought back to how much I struggled when I worked in the world of health information exchange and built a statewide health information exchange and then turned to try to exchange that data across the nation and realized that we had a patchwork of privacy laws and regulations across the states; it was extremely costly to try to move data.

And I thought we actually need, in addition to data interoperability, we need governance interoperability, where we can begin to agree on a core set of principles that will more easily allow us to move ahead and achieve some of the potential and the vision that we have for AI if we are not working with a patchwork of different guidelines, principles, and frameworks.  

So that was the impetus behind it. Of course, we again want it to be used as a touchstone, not everybody wholesale adopt what we’ve said.  

LEE: Right. 

ADAMS: We want people to think about this and think deeply about it. 

LEE: Yeah, Laura, I always am impressed with just how humble you are. You were indeed, you know, one of the prime instigators of the digitization of health records leading to electronic health record systems. And I don’t think you need to feel bad about that. That was a tremendous advance. I mean, moving a fifth of the US economy to be digital, I think, is significant.  

Also, our listeners might want to know that you led something called the Rhode Island Quality Institute, which was really, I think, maybe the, arguably, the most important early kind of examples that set a pattern for how and why health data might actually lead very directly to improvements in human health at a statewide level or at a population level. And so I think your struggles and frustrations on, you know, how to expand that nationwide, I think, are really, really informative.  

So let’s get into what these principles are, you know, what’s in the code of conduct.  

ADAMS: Yeah, the six simple rules were derived out of a larger set of principles that we pulled together. And the origin of all of this was we did a fairly extensive landscape review. We looked at at least 60 different sets of these principles, guidelines, and frameworks. We looked for areas of real convergence. We looked for areas where there were inconsistencies. And we looked for out-and-out gaps.

The out-and-out gaps that we saw at the time were things like a dearth of references to advancing human health as the priority, and also monitoring post-implementation. At the time, we were watching these evolve, and we thought these were very significant gaps. The impact on the environment was a significant gap as well. And so when we pulled that together, we developed a set of principles and cross-walked those with learning health system principles (opens in new tab).

And then once we got that, we again wanted to distill that down into a set of commitments which we knew that people could find accessible. And we published that draft set of principles (opens in new tab) last year. And we have a new publication that will be coming out in the coming months that will be the revised set of principles and code commitments that we got because we took this out publicly. 

So we opened it up for public comment once we did the draft last year. Again, almost all of those many times that I spoke about this came with an invitation for feedback, and the conversations that we had with people shaped it. And this set of principles and commitments is in no way, shape, or form a final code of conduct, because we see this as dynamic. But what we also knew was that we wanted to build this with a super solid foundation, a set of immutables, things that don’t shift with the vicissitudes or the whims of this or the whims of that. We wanted those things that were absolutely foundational.

LEE: Yeah, so we’ll provide a link to the documents that describe the current state of this, but can we give an example of one or two of these principles and one or two of the commitments? 

ADAMS: Sure. I’ve mentioned the “protect and advance human health and connection” as the primary aim. We also want to ensure the equitable distribution of risks and benefits, and that equitable distribution of risks and benefits is something that I was referring to earlier about when I see well-resourced organizations. And one that’s particularly important to me is engaging people as partners with agency at every stage of the AI lifecycle.  

That one matters because this one talks about and speaks to the idea that we want to begin bringing in those that are affected by AI, those on whom AI is used, into the early development and conceptualization of what we want this new tool, this new application, to do. So that includes the providers that use it, the patients. And we find that when we include them—the ethicists that come along with that—we develop much better applications, much more targeted applications that do what we intend them to do in a more precise way.  

The other thing about that engaging with agency: by agency, we mean that the person, that participant, can affect the decisions and they can affect the outcome. So it isn’t that they’re sort of a token person coming to the table and we’ll allow you to tell your story; this is an active participant.

We practiced what we preached when we developed the code of conduct, and we brought patient advocates in to work with us on the development of this, work with us on our applications, that first layer down of what the applications would look like, which is coming out in this new paper.  

We really wanted that component of this because I’m also seeing that patients are definitely not passive users of this, and they’re having an agency moment, let’s say, with generative AI because they’ve discovered a new capacity to gain information, to—in many ways—claim some autonomy in all of this.   

And I think that there is a disruption underway right now, a big one that has been in the works for many years, but it feels to me like AI may be the tipping point for that disruption of the delivery system as we know it.  

LEE: Right. I think it just exudes sort of inclusivity and thoughtfulness in the whole process. During this process, were there surprises, things that you didn’t expect? Things about AI technology itself that surprised you? 

ADAMS: The surprises that came out of this process for me, one of them was I surprised myself. We were working on the commentary paper, and Steven Lin from Stanford had strong input into that paper. And when we looked at what we thought was missing, he said, “Let’s make sure we have the environmental impact.” And I said, “Oh, really, Steven, we really want to think about things that are more directly aligned with health,” which I couldn’t believe came out of my own mouth. [LAUGHTER]

And Steven, without saying, “Do you hear yourself?” I mean, I think he could have said that. But he was more diplomatic than that. And he persisted a bit longer and said, “I think it’s actually the greatest threat to human health.” And I said, “Of course, you’re right.” [LAUGHS] 

But that was surprising and embarrassing for me. It was also eye-opening in that even when I thought I had understood the gaps and the use of this as a touchstone, there was still learning taking place, and that learning was happening rapidly among the people involved in this.

The other thing that was surprising for me was the degree to which patients became vastly facile with using it, to the extent that it helped them begin to, again, build their own capacity.

The #PatientsUseAI (opens in new tab) from Dave deBronkart—watch that one. This is more revolutionary than we think. And so I watched that, the swell of that happening, and it sort of shocked me because I was envisioning this as, again, a tool for use in the clinical setting.  

LEE: Yeah. OK, so we’re running now towards the end of our time together. And I always like to end our conversations with a more provocative topic. And I thought for you, I’d like to use the very difficult word regulation.  

And when I think about the book that Carey, Zak, and I wrote, we have a chapter on regulation, but honestly, we didn’t have ideas. We couldn’t understand how this would be regulated. And so we just defaulted to publishing a conversation about regulation with GPT-4. And in a way, I think … I don’t know that I or my coauthors were satisfied with that.  

In your mind, where do we stand two years later now when we think about the need or not to regulate AI, particularly in its application to healthcare, and where has the thinking evolved to? 

ADAMS: There are two big differences that I see in that time that has elapsed. And the first one is we have understood the insufficiency of simply making sure that AI-enabled devices are safe prior to going out into implementation settings.  

We recognize now that there’s got to be this whole other aspect of regulation and assurance that these things are functioning as intended, and that we have the capacity to do that in the point-of-care setting. So that’s been one of the major ones. The other thing is how wickedly challenging it is to regulate generative AI.

I think one of the most provocative and exciting articles (opens in new tab) that I saw written recently was by Bakul Patel and David Blumenthal, who posited, should we be regulating generative AI as we do a licensed and qualified provider?  

Should it be treated in the sense that it’s got to have a certain amount of training and a foundation that’s got to pass certain tests? It has to demonstrate that it’s improving and keeping up with current literature. Should it be responsible for mistakes that it makes in some way, shape, or form? Does it have to report its performance?

And I’m thinking, what a provocative idea …  

LEE: Right. 

ADAMS: … but it’s worth considering. I chair the Global Opportunities Group for a regulatory and innovation AI sandbox in the UK. And we’re hard at work thinking about, how do you regulate something as unfamiliar and untamed, really, as generative AI?  

So I’d like to see us think more about this idea of sandboxes, more this idea of should we be just completely rethinking the way that we regulate. To me, that’s where the new ideas will come because the danger, of course, in regulating in the old way … first of all, we haven’t kept up over time, even with predictive AI; even with pre-generative AI, we haven’t kept up.  

And what worries me about continuing on in that same vein is that we will stifle innovation … 

LEE: Yes. 

ADAMS: … and we won’t protect from potential harms. Nobody wants an AI Chernobyl, nobody.  

LEE: Right 
 
ADAMS: But I worry that if we use those old tools on the new applications, we will not only fail to regulate well, we’ll stifle innovation. And when I see all of the promise coming out of this for things that we thought were unimaginable, then that would be a tragedy.

LEE: You know, I think the other reflection I’ve had on this is the consumer aspect of it, because I think a lot of our current regulatory frameworks are geared towards experts using the technology.  

ADAMS: Yes. 

LEE: So when you have a medical device, you know you have a trained, board-certified doctor or licensed nurse using the technology. But when you’re putting things in the hands of a consumer, I think somehow the surface area of risk seems wider to me. And so I think that’s another thing that somehow our current regulatory concepts aren’t really ready for. 

ADAMS: I would agree with that. I think a few things to consider, vis-a-vis that: this revolution of patients using it is unstoppable. So it will happen. But we’re considering a project here at the National Academy about patients using AI and thinking about: let’s explore all the different facets of that. Let’s understand, what does safe usage look like? What might we do to help this new development enhance the patient-provider relationship and not erode it, as we saw with the “Don’t confuse your Google search with my medical degree” type of approach.

Thinking about: how does it change the identity of the provider? What can we do to safely build a container in which patients can use this without giving them the sense that it’s being taken away … because I just don’t see that happening. I don’t think they’re going to let it happen.

That, to me, feels extremely important for us to explore all the dimensions of that. And that is one project that I hope to be following on to the AI Code of Conduct and applying the code of conduct principles with that project.  

LEE: Well, Laura, thank you again for joining us. And thank you even more for your tremendous national, even international, leadership on really helping mobilize the greatest institutions in a diverse way to fully confront the realities of AI in healthcare. I think it’s tremendously important work. 

ADAMS: Peter, thank you for having me. This has been an absolute pleasure. 

[TRANSITION MUSIC]  

I’ve had the opportunity to watch Laura in action as she leads a national effort to define an AI code of conduct. And our conversation today has only heightened my admiration for her as a national leader.  

What impresses me is Laura’s recognition that technology adoption in healthcare has had a checkered history and, furthermore, has oftentimes not equally accommodated the huge diversity of stakeholders that are affected. 

The concept of an AI code of conduct seems straightforward in some ways, but you can tell that every word in the emerging code has been chosen carefully. And Laura’s tireless engagement traveling to virtually every corner of the United States, as well as to several other countries, shows real dedication. 

And now here’s my conversation with Vardit Ravitsky:

LEE: Vardit, thank you so much for joining. 

RAVITSKY: It’s a real pleasure. I’m honored that you invited me. 

LEE: You know, we’ve been lucky. We’ve had a few chances to interact and work together within the National Academy of Medicine and so on. But I think for many of the normal subscribers to the Microsoft Research Podcast, they might not know what The Hastings Center for Bioethics is and then what you as the leader of The Hastings Center do every day. So I’d like to start there, first off, with what is The Hastings Center? 

RAVITSKY: Mostly, we’re a research center. We’ve been around for more than 55 years. And we’re considered one of the organizations that actually founded the field known today as bioethics, which is the field that explores the policy implications, the ethical, social issues in biomedicine. So we look at how biotechnology is rolled out; we look at issues of equity, of access to care. We look at issues at the end of life, the beginning of life, how our natural environment impacts our health. Any aspect of the delivery of healthcare, the design of the healthcare system, and biomedical research leading to all this. Any aspect that has an ethical implication is something that we’re happy to explore.  

We try to have broad conversations with many, many stakeholders, people from different disciplines, in order to come up with guidelines and recommendations that would actually help patients, families, communities.  

We also have an editorial department. We publish academic journals. We publish a blog. And we do a lot of public engagement activities—webinars, in-person events. So, you know, we just try to promote the thinking of the public and of experts on the ethical aspects of health and healthcare.  

LEE: One thing I’ve been impressed with, with your work and the work of The Hastings Center is it really confronts big questions but also gets into a lot of practical detail. And so we’ll get there. But before that just a little bit about you then. The way I like to ask this question is: how do you explain to your parents what you do every day? [LAUGHS] 

RAVITSKY: Funny that you brought my parents into this, Peter, because I come from a family of philosophers. Everybody in my family is in humanities, in academia. When I was 18, I thought that that was the only profession [LAUGHTER] and that I absolutely had to become a philosopher, or else what else can you do with your life? 

I think being a bioethicist is really about, on one hand, keeping an eye constantly on the science as it evolves. When a new scientific development occurs, you have to understand what’s happening so that you can translate that outside of science. So if we can now make a gamete from a skin cell so that babies will be created differently, you have to understand how that’s done, what that means, and how to talk about it.  

The second eye you keep on the ethics literature: what ethical frameworks, theories, and principles have we developed over the last decades that are now relevant to this technology? So you’re really a bridge between science and biomedicine on one hand and humanities on the other hand.  

LEE: OK. So let’s shift to AI. And here I’d like to start with a kind of an origin story because I’m assuming before generative AI and ChatGPT became widely known and available, you must have had some contact with ideas in data science, in machine learning, and, you know, in the concept of AI before ChatGPT. Is that true? And, you know, what were some of those early encounters like for you? 

RAVITSKY: The earlier issues that I heard people talk about in the field were really around diagnostics and reading images and, Ooh, it looks like machines could perform better than radiologists. And, Oh, what if women preferred that their mammographies be read by these algorithms? And, Does that threaten us clinicians? Because it sort of highlights our limitations and weaknesses as, you know, the weakness of the human eye and the human brain.  

So there were early concerns about, will this outperform the human and potentially take away our jobs? Will it impact our authority with patients? What about de-skilling clinicians or radiologists or any type of diagnostician losing the ability … some abilities that they’ve had historically because machines take over? So those were the early-day reflections and interestingly some of them remain even now with generative AI.  

All those issues of the standing of a clinician, and what sets us apart, and will a machine ever be able to perform completely autonomously, and what about empathy, and what about relationships? Much of that translated later on into the, you know, more advanced technology. 

LEE: I find it interesting that you use words like our and we to implicitly refer to humans, homo sapiens, to human beings. And so do you see a fundamental distinction, a hard distinction that separates humans from machines? Or, you know, how … if there are replacements of some human capabilities or some things that human beings do by machines, you know, how do you think about that? 

RAVITSKY: Ooh, you’re really pushing hard on the philosopher in me here. [LAUGHTER] I’ve read books and heard lectures by those who think that the line is blurred, and I don’t buy that. I think there’s a clear line between human and machine.  

I think the issue of AGI—of artificial general intelligence—and will that amount to consciousness … again, it’s such a profound, deep philosophical challenge that I think it would take a lot of conceptual work to get there. So how do we define consciousness? How do we define morality? The way it stands now, I look into the future without being a technologist, without being an AI developer, and I think, maybe I hope, that the line will remain clear. That there’s something about humanity that is irreplaceable.  

But I’m also remembering that Immanuel Kant, the famous German philosopher, when he talked about what it means to be a part of the moral universe, what it means to be a moral agent, he talked about rationality and the ability to implement what he called the categorical imperative. And he said that would apply to any creature, not just humans. 

And that’s so interesting. It’s always fascinated me that so many centuries ago, he said such a progressive thing. 

LEE: That’s amazing, yeah. 

RAVITSKY: It is amazing because I often, as an ethicist, I don’t just ask myself, What makes us human? I ask myself, What makes us worthy of moral respect? What makes us holders of rights? What gives us special status in the universe that other creatures don’t have? And I know this has been challenged by people like Peter Singer who say [that] some animals should have the same respect. “And what about fetuses and what about people in a coma?” I know the landscape is very fraught.  

But the notion of what makes humans deserving of special moral treatment to me is the core question of ethics. And if we think that it’s some capacities that give us this respect, that make us hold that status, then maybe it goes beyond human. So it doesn’t mean that the machine is human, but maybe at [a] certain point, these machines will deserve a certain type of moral respect that … it’s hard for us right now to think of a machine as deserving that respect. That I can see. 

But completely collapsing the distinction between human and machine? I don’t think so, and I hope not. 

LEE: Yeah. Well, you know, in a way I think it’s easier to entertain this type of conversation post-ChatGPT. And so now, you know, what was your first personal encounter with what we now call generative AI, and what went through your mind as you had that first encounter? 

RAVITSKY: No one’s ever asked me this before, Peter. It almost feels exposing to share your first encounter. [LAUGHTER]  

So I just logged on, and I asked a simple question, but it was an ethical question. I framed an ethical dilemma because I thought, if I ask it to plan a trip, like all my friends already did, it’s less interesting to me.  

And within seconds, a pretty thoughtful, surprisingly nuanced analysis was kind of trickling down my screen, and I was shocked. I was really taken aback. I was almost sad because I think my whole life I was hoping that only humans can generate this kind of thinking using moral and ethical terms.  

And then I started tweaking my question, and I asked for specific philosophical approaches to this. And it just kept surprising me in how well it performed.  

So I literally had to catch my breath and, you know, sit down and go, OK, this is a new world, something very important and potentially scary is happening here. How is this going to impact my teaching? How is this going to impact my writing? How is this going to impact health? Like, it was really a moment of shock. 

LEE: I think the first time I had the privilege of meeting you, I heard you speak and share some of your initial framing of how, you know, how to think about the potential ethical implications of AI and the human impacts of AI in the future. Keeping in mind that people listening to this podcast will tend to be technologists and computer scientists as well as some medical educators and practicing clinicians, you know, what would you like them to know or understand most about your thoughts? 

RAVITSKY: I think from early on, Peter, I’ve been an advocate in favor of bioethics as a field positioning itself to be a facilitator of implementing AI. I think on one hand, if we remain the naysayers as we have been regarding other technologies, we will become irrelevant. Because it’s happening, it’s happening fast, we have to keep our eye on the ball, and not ask, “Should we do it?” but rather ask, “How should we do it?” 

And one of the reasons that bioethics is going to be such a critical player is that the stakes are so high. The risk of making a mistake in diagnostics is literally life and death; the risk of breaches of privacy that would lead to patients losing trust and refusing to use these tools; the risk of clinicians feeling overwhelmed and replaceable. The risks are just too high.  

And therefore, creating guardrails, creating frameworks with principles that sensitize us to the ethical aspects, that is critically important for AI and health to succeed. And I’m saying it as someone who wants it very badly to succeed. 

LEE: You are actually seeing a lot of healthcare organizations adopting and deploying AI. Has any aspect of that been surprising to you? Have you expected it to be happening faster or slower or unfolding in a different way? 

RAVITSKY: One thing that surprises me is how it seems to be isolated. Different systems, different institutions making their own, you know, decisions about what to acquire and how to implement. I’m not seeing consistency. And I’m not even seeing anybody at a higher level collecting all the information about who’s buying and implementing what, under what types of principles, and what their outcomes are. What are they seeing?  

It seems to be just siloed and happening everywhere. And I wish we collected all this data, even about how the decision is made at the executive level to buy a certain tool, to implement it, where, why, by whom. So that’s one thing that surprised me.  

The speed is not surprising to me because it really solves problems that healthcare systems have been struggling with. What seems to be one of the more popular uses, and again, you know this better than I do, is the help with scribing—with taking notes, ambient recording. This seems to be really desired because of the burnout that clinicians face around this whole issue of note taking.  

And it’s also seen as a way to allow clinicians to do more human interaction, you know, … 

LEE: Right.  

RAVITSKY: … look at the patient, talk to the patient, … 

LEE: Yep. 

RAVITSKY: … listen, rather than focus on the screen. We’ve all sat across the desk with a doctor that never looks at us because they only look at the screen. So there’s a real problem here, and there’s a real solution and therefore it’s hitting the ground quickly.  

But what’s surprising to me is how many places don’t think that it’s their responsibility to inform patients that this is happening. So some places do; some places don’t. And to me, this is a fundamental ethical issue of patient autonomy and empowerment. And it’s also pragmatically the fear of a crisis of trust. People don’t like being recorded without their consent. Surprise, surprise.  

LEE: Mm-hmm. Yeah, yeah. 

RAVITSKY: People worry about such a recording of a very private conversation that they consider to be confidential, such a recording ending up in the wrong hands or being shared externally or going to a commercial entity. People care; patients care.  

So what is our ethical responsibility to tell them? And what is the institutional responsibility to implement these wonderful tools? I’m not against them; I’m totally in favor—implement these great tools in a way that respects long-standing ethical principles of informed consent, transparency, accountability for, you know, change in practice? And, you know, bottom line: patients’ right to know what’s happening in their care.  

LEE: You actually recently had a paper in a medical journal that touched on an aspect of this, which I think was not with scribes, but with notes, you know, … 

RAVITSKY: Yep. 

LEE: … that doctors would send to patients. And in fact, in previous episodes of this podcast, we actually talked to both the technology developers of that type of feature as well as doctors who were using that feature. And in fact, even in those previous conversations, there was the question, “Well, what does the patient need to know about how this note was put together?” So you and your coauthors had a very interesting recent paper about this. 

RAVITSKY: Yeah, so the trigger for the paper was that patients seemed to really like being able to send electronic messages to clinicians.  

LEE: Yes. 

RAVITSKY: We email and text all day long. Why not in health, right? People are used to communicating in that way. It’s efficient; it’s fast.   

So we asked ourselves, “Wait, what if an AI tool writes the response?” Because again, this is a huge burden on clinicians, and it’s a real issue of burnout.  

We surveyed hundreds of respondents, and basically what we discovered is that there was a statistically significant difference in their level of satisfaction when they got an answer from a human clinician versus when they got an answer—again, an electronic message—from AI.  

And it turns out that they preferred the messages written by AI. They were longer, more detailed, even conveyed more empathy. You know, AI has all the time in the world [LAUGHS] to write you a text. It’s not rushing to the next one. 

But then when we disclosed who wrote the message, they were less satisfied when they were told it was AI.  

So the ethical question that that raises is the following: if your only metric is patient satisfaction, the solution is to respond using AI but not tell them that. 

Now when we compared telling them that it was AI or human or not telling them anything, their satisfaction remained high, which means that if they were not told anything, they probably assumed that it was a human clinician writing, because their satisfaction for human clinician or no disclosure was the same. 

So basically, if we say nothing and just send back an AI-generated response, they will be more satisfied because the response is nicer, but they won’t be put off by the fact that it was written by AI. And therefore, hey presto, optimal satisfaction. But we challenge that, and we say, it’s not just about satisfaction. 

It’s about long-term trust. It’s about your right to know. It’s about empowering you to make decisions about how you want to communicate.  

So we push back against this notion that we’re just there to optimize patient satisfaction, and we bring in broader ethical considerations that say, “No, patients need to know.” If it’s not the norm yet to get your message from AI, … 

LEE: Yeah. 

RAVITSKY: … they should know that this is happening. And I think, Peter, that maybe we’re in a transition period. 

It could be that in two years, maybe less than that, most of our communication will come back from AI, and we will just take it for granted …  

LEE: Right. 

RAVITSKY: … that that’s the case. And at that point, maybe disclosure is not necessary because many, many surveys will show us that patients assume that, and therefore they are informed. But at this point in time, when it’s transition and it’s not the norm yet, I firmly think that ethics requires that we inform patients. 

LEE: Let me push on this a little bit because I think this final point that you just made is, I think is so interesting. Does it matter what kind of information is coming from a human or AI? Is there a time when patients will have different expectations for different types of information from their doctors? 

RAVITSKY: I think, Peter, that you’re asking the right question because it’s more nuanced. And these are the kinds of empirical questions that we will be exploring in the coming months and years. Our recent paper showed that there was no difference regarding the content. If the message was about what we call a more “serious” matter or a less “serious” matter, the preferences were the same. But we didn’t go deep enough into that. That would require a different type of study design. And you just said, you know, there are different types of information. We need to categorize them.  

LEE: Yeah.  

RAVITSKY: What types of information and what degree of impact on your life? Is it a life-and-death piece of information? Is it a quality-of-life piece of information? How central is it to your care and to your thinking? So all of that would have to be mapped out so that we can design these studies.  

But you know, you pushed back in that way, and I want to push back in a different direction that to me is more fundamental and philosophical. How much do we know now? You know, I keep saying, oh, patients deserve a chance for informed consent, … 

LEE: Right.  

RAVITSKY: … and they need to be empowered to make decisions. And if they don’t want that tool used in their care, then they should be able to say, “No.” Really? Is that the world we live in now? [LAUGHTER] Do I have access to the black box that is my doctor’s brain? Do I know how they performed on this procedure in the last year? 

Do I know whether they’re tired? Do I know if they’re up to speed on the literature with this? We already deal with black boxes, except they’re not AI. And I think the evidence is emerging that AI outperforms humans in so many of these tasks.  

So my pushback is, are we seeing AI exceptionalism in the sense that if it’s AI, Huh, panic! We have to inform everybody about everything, and we have to give them choices, and they have to be able to reject that tool and the other tool versus, you know, the rate of human error in medicine is awful. People don’t know the numbers. The annual number of deaths attributed to medical error is outrageous.  

So why are we so focused on informed consent and empowerment regarding implementation of AI and less in other contexts? Is it just because it’s new? Is it because it is some sort of existential threat, … 

LEE: Yep, yeah. 

RAVITSKY: … not just a matter of risk? I don’t know the answer, but I don’t want us to suffer from AI exceptionalism, and I don’t want us to hold AI to such a high standard that we won’t be able to benefit from it. Whereas, again, we’re dealing with black boxes already in medicine. 

LEE: Just to stay on this topic, though, one more question, which is, maybe, almost silly in how hypothetical it is. If instead of email, it were a Teams call or a Zoom call, doctor-patient, except that the doctor is not the real doctor, but it’s a perfect replica of the doctor designed to basically fool the patient that this is the real human being and having that interaction. Does that change the bioethical considerations at all? 

RAVITSKY: I think it does because it’s always a question of, are we really misleading? Now if you get a text message in an environment that, you know, people know AI is already integrated to some extent, maybe not your grandmother, but the younger generation is aware of this implementation, then maybe you can say, “Hmm. It was implied. I didn’t mean to mislead the patient.” 

If the patient thinks they’re talking to a clinician, and they’re seeing, like—what if it’s not you now, Peter? What if I’m talking to an avatar [LAUGHS] or some representation of you? Would I feel that I was misled in recording this podcast? Yeah, I would. Because you really gave me good reasons to assume that it was you.  

So it’s something deeper about trust, I think. And it touches on the notion of empathy. A lot of literature is being developed now on the issue of: what will remain the purview of the human clinician? What are humans good for [LAUGHS] when AI is so successful and especially in medicine?  

And if we see that the text messages are read as conveying more empathy and more care and more attention, and if we then move to a visual representation, facial expressions that convey more empathy, we really need to take a hard look at what we mean by care. What about then the robots, right, that we can touch, that can hug us?  

I think we’re really pushing the frontier of what we mean by human interaction, human connectedness, care, and empathy. This will be a lot of material for philosophers to ask themselves the fundamental question you asked me at first: what does it mean to be human?  

But this time, what does it mean to be two humans together and to have a connection?  

And if we can really be replaced in the sense that patients will feel more satisfied, more heard, more cared for, do we have ethical grounds for resisting that? And if so, why?  

You’re really going deep here into the conceptual questions, but bioethics is already looking at that. 

LEE: Vardit, it’s just always amazing to talk to you. The incredible span of what you think about, from those fundamental philosophical questions all the way to the actual nitty-gritty, like, you know, what parts of an email from a doctor to a patient should be marked as AI. I think that span is just incredible and incredibly needed and useful today. So thank you for all that you do. 

RAVITSKY: Thank you so much for inviting me. 

[TRANSITION MUSIC] 

The field of bioethics, and this is my take, is all about the adoption of disruptive new technologies into biomedical research and healthcare. And Vardit is able to explain this with such clarity. I think one of the reasons that AI has been challenging for many people is that its use spans the gamut from the nuts and bolts of how and when to disclose to patients that AI is being used to craft an email, all the way to, what does it mean to be a human being caring for another human?  

What I learned from the conversation with Vardit is that bioethicists are confronting head-on the issue of AI in medicine and not with an eye towards restricting it, but recognizing that the technology is real, it’s arrived, and needs to be harnessed now for maximum benefit.  

And so now, here’s my conversation with Dr. Roxana Daneshjou: 

LEE: Roxana, I’m just thrilled to have this chance to chat with you. 

ROXANA DANESHJOU: Thank you so much for having me on today. I’m looking forward to our conversation. 

LEE: In Microsoft Research, of course, you know, we think about healthcare and biomedicine a lot, but I think there’s less of an understanding from our audience what people actually do in their day-to-day work. And of course, you have such a broad background, both on the science side with a PhD and on the medical side. So what’s your typical work week like? 

DANESHJOU: I spend basically 90% of my time working on running my research lab and doing research on how AI interacts with medicine, how we can implement it to fix the pain points in medicine, and how we can do that in a fair and responsible way. And 10% of my time, I am in clinic. So I am a practicing dermatologist at Stanford, and I see patients half a day a week. 

LEE: And your background, it’s very interesting. There’s always been these MD-PhDs in the world, but somehow, especially right now with what’s happening in AI, people like you have become suddenly extremely important because it suddenly has become so important to be able to combine these two things. Did you have any inkling about that when you started, let’s say, on your PhD work? 

DANESHJOU: So I would say that during my—[LAUGHS] because I was in training for so long—during … my PhD was in computational genomics, and I still have a significant interest in precision medicine, and I think AI is going to be central to that.  

But I think the reason I became interested in AI initially is that I was thinking about how we associate genetic factors with patient phenotypes. Patient phenotypes being, How does the disease present? What does the disease look like? And I thought, you know, AI might be a good way to standardize phenotypes from images of, say, skin disease, because I was interested in dermatology at that time. And, you know, the part about phenotyping disease was a huge bottleneck because you would have humans sort of doing the phenotyping.  

And so in my head, when I was getting into the space, I was thinking I’ll bring together, you know, computer vision and genetics to try to, you know, make new discoveries about how genetics impacts human disease. And then when I actually started my postdoc to learn computer vision, I went down this very huge rabbit hole, which I am still, I guess, falling down, where I realized, you know, about biases in computer vision and how much work needed to be done for generalizability. 

And then after that, large language models came out, and, like everybody else, I became incredibly interested in how these could help in healthcare—and now also vision language models and multimodal models. So, you know, we’re just tumbling down the rabbit hole.  

LEE: Indeed, I think you really made a name for yourself by looking at the issues of biases, for example, in training datasets. And that was well before large language models were a thing. Maybe our audience would like to hear a little bit more about that earlier work.  

DANESHJOU: So as I mentioned, my PhD was in computational genetics. In genetics, what has happened during the genetic revolution is these large-scale studies to discover how genetic variation impacts human disease and human response to medication—that’s what pharmacogenomics is: human response to medications. And as I got, you know, entrenched in that world, I came to realize that I wasn’t really represented in the data. And it was because the majority of these genetic studies were on European ancestry individuals. You weren’t represented either. 

LEE: Right, yeah. 

DANESHJOU: Many diverse global populations were completely excluded from these studies, and genetic variation is quite diverse across the globe. And so you’re leaving out a large portion of genetic variation from these research studies. Now things have improved. It still needs work in genetics. But definitely there has been many amazing researchers, you know, sounding the alarm in that space. And so during my PhD, I actually focused on doing studies of pharmacogenomics in non-European ancestry populations. So when I came to computer vision and in particular dermatology, where there were a lot of papers being published about how AI models perform at diagnosing skin disease and several papers essentially saying, oh, it’s equivalent to a dermatologist—of course, that’s not completely true because it’s a very sort of contrived, you know, setting of diagnosis— … 

LEE: Right, right. 

DANESHJOU: but my first inkling was, well, are these models going to be able to perform well across skin tones? And one of our, you know, landmark papers, which was in Science Advances, showed … we created a diverse dataset, our own diverse benchmark of skin disease, and showed that the models performed significantly worse on brown and black skin. And I think the key here is we also showed that it was an addressable problem because when we fine-tuned on diverse skin tones, you could make that bias go away. So it was really, in this case, about what data was going into the training of these computer vision models. 

LEE: Yeah, and I think if you’re listening to this conversation and you haven’t read that paper, it’s really must reading. It wasn’t only a landmark scientifically and medically, Roxana; it also sort of crossed the chasm and really became a subject of public discourse and debate, as well. And I think you really changed the public discourse around AI.  

So now I want to get into generative AI. I always like to ask, what was your first encounter with generative AI personally? And what went through your head? You know, what was that experience like for you? 

DANESHJOU: Yeah, I mean, I actually tell this story a lot because I think it’s a fun story. So I would say that I had played with, you know, GPT-3 prior and wasn’t particularly, you know, impressed … 

LEE: Yeah. 

DANESHJOU: … by how it was doing. And I was at NeurIPS [Conference on Neural Information Processing Systems] in New Orleans, and I was … we were walking back from a dinner. I was with Andrew Beam from Harvard. I was with his group. 

And we were just, sort of, you know, walking along, enjoying the sites of New Orleans, chatting. And one of his students said, “Hey, OpenAI just released this thing called ChatGPT.”  

LEE: So this would be New Orleans in December … 

DANESHJOU: 2022.  

LEE: 2022, right? Yes. Uh-huh. OK. 

DANESHJOU: So I went back to my hotel room. I was very tired. But I, you know, went to the website to see, OK, like, what is this thing? And I started to ask it medical questions, and I started all of a sudden thinking, “Uh-oh, we have made … we have made a leap here; something has changed.”  

LEE: So it must have been very intense for you from then because months later, you had another incredibly impactful, or landmark, paper basically looking at biases, race-based medicine in large language models. So can you say more about that? 

DANESHJOU: Yes. I work with a very diverse team, and we have thought about bias in medicine, not just with technology but also with humans. Humans have biases, too. And there’s this whole debate around, is the technology going to be more biased than the humans? How do we do that? But at the same time, like, the technology actually encodes the biases of humans.  

And there was a paper in the Proceedings of the National Academy of Sciences, which did not look at technology at all but actually looked at the race-based biases of medical trainees that were untrue and harmful in that they perpetuated racist beliefs.  

And so we thought, if medical trainees and humans have these biases, why don’t we see if the models carry them forward? And we added a few more questions that we, sort of, brainstormed as a team, and we started asking the models those questions. And … 

LEE: And by this time, it was GPT-4?  

DANESHJOU: We did include GPT-4 because GPT-4 came out, as well. And we also included other models, as well, such as Claude, because we wanted to look across the board. And what we found is that all of the models had instances of perpetuating race-based medicine. And actually, the GPT models had, I think, one of the most egregious responses—and, again, this is 3.5 and 4; we haven’t, you know, fully checked to see what things look like, because there have been newer models—in that they said that we should use race in calculating kidney function because there were differences in muscle mass between the races. And this is sort of a racist trope in medicine that is not true because race is not based on biology; it’s a social construct.   

So, yeah, that was that study. And that one did spur a lot of public conversation. 

LEE: Your work there even had the issue of bias overtake hallucination, you know, as really the most central and most difficult issue. So how do you think about bias in LLMs, and does that in your mind disqualify the use of large language models from particular uses in medicine?  

DANESHJOU: Yeah, I think that the hallucinations are an issue, too. And in some senses, they might even go with one another, right. Like, if it’s hallucinating information that’s not true but also, like, biased.  

So I think these are issues that we have to address with the use of LLMs in healthcare. But at the same time, things are moving very fast in this space. I mean, we have a secure instance of several large language models within our healthcare system at Stanford so that you could actually put secure patient information in there.  

So while I acknowledge that bias and hallucinations are a huge issue, I also acknowledge that the healthcare system is quite broken and needs to be improved, needs to be streamlined. Physicians are burned out; patients are not getting access to care in the appropriate ways. And I have a really great story about that, which I can share with you later.  

So in 2024, we did a study asking dermatologists, are they using large language models (opens in new tab) in their clinical practice? And I think this percentage has probably gone up since then: 65% of dermatologists reported using large language models in their practices on tasks such as, you know, writing insurance authorization letters, you know, writing information sheets for the patient, even, sort of, using them to educate themselves, which makes me a little nervous because in my mind, the best uses of large language models right now are cases where you can verify facts easily.  

So, for example, I did show and teach my nurse how to use our secure large language model in our healthcare system to write rebuttal letters to the insurance. I told her, “Hey, you put in these bullet points that you want to make, and you ask it to write the letter, and you can verify that the letter contains the facts which you want and which are true.” 

LEE: Yes. 

DANESHJOU: And we have also done a lot of work to try to stress test models because we want them to be better. And so we held this red-teaming event at Stanford where we brought together 80 physicians, computer scientists, engineers and had them write scenarios and real questions that they might ask on a day-to-day basis or tasks that they might actually ask AI to do. 

And then we had them grade the performance. And we did this with the GPT models. At the time, we were doing it with GPT-3.5, 4, and 4 with internet. But before the paper (opens in new tab) came out, we then ran the dataset on newer models.  

And we made the dataset public (opens in new tab) because I’m a big believer in public data. So we made the dataset public so that others could use this dataset, and we labeled what the issues were in the responses, whether it was bias, hallucination, like, a privacy issue, those sort of things. 

LEE: If I think about the hits or misses in our book, you know, we actually wrote a little bit, not too much, about noticing biases. I think we underestimated the magnitude of the issue in our book. And another element that we wrote about in our book is that we noticed that the language model, if presented with some biased decision-making, more often than not was able to spot that the decision-making was possibly being influenced by some bias. What do you make of that?  

DANESHJOU: So funny enough, I think we had … we had a—before I moved from Twitter to Bluesky—but we had a little back and forth on Twitter about this, which actually inspired us to look into this as a research question, and we have a preprint up on using other large language models to identify bias and then to write critiques that the original model can incorporate to improve its answer. 

I mean, we’re moving towards this sort of agentic systems framework rather than a singular large language model, and people, of course, are talking about also retrieval-augmented generation, where you sort of have this corpus of, you know, text that you trust and find trustworthy and have that incorporated into the response of the model.  

And so you could build systems essentially where you do have other models saying, “Hey, specifically look for bias.” And then it will sort of focus on that task. And you can even, you know, give it examples of what bias is with in-context learning now. So I do think that we are going to be improving in this space. And actually, my team is … most recently, we’re working on building patient-facing chatbots. That’s where my, like, patient story comes in. But we’re building patient-facing chatbots. And we’re using, you know, we’re using prompt-engineering tools. We’re using automated eval tools. We’re building all of these things to try to make it more accurate and less biased. So it’s not just like one LLM spitting out an answer. It’s a whole system. 

LEE: All right. So let’s get to your patient-facing story.  

DANESHJOU: Oh, of course. Over the summer, my 6-year-old fell off the monkey bars and broke her arm. And I picked her up from school. She’s crying so badly. And I just look at her, and I know that we’re in trouble. 

And I said, OK, you know, we’re going straight to the emergency room. And we went straight to the emergency room. She’s crying the whole time. I’m almost crying because it’s just, like, she doesn’t even want to go into the hospital. And so then my husband shows up, and we also had the baby, and the baby wasn’t allowed in the emergency room, so I had to step out. 

And thanks to the [21st Century] Cures Act (opens in new tab), I’m getting, like, all the information, you know, as it’s happening. Like, I’m getting the x-ray results, and I’m looking at it. And I can tell there’s a fracture, but I can’t, you know, tell, like, how bad it is. Like, is this something that’s going to need surgery?  

And I’m desperately texting, like, all the orthopedic folks I know, the pediatricians I know. [LAUGHTER] “Hey, what does this mean?” Like, getting real-time information. And later in the process, there was a mistake in her after-visit summary about how much Tylenol she could take. But I, as a physician, knew that this dose was a mistake.  

I actually asked ChatGPT. I gave it the whole after-visit summary, and I said, are there any mistakes here? And it clued in that the dose of the medication was wrong. So again, I—as a physician with all these resources—have difficulty kind of navigating the healthcare system; understanding what’s going on in x-ray results that are showing up on my phone; can personally identify medication dose mistakes, but you know, most people probably couldn’t. And it could be very … I actually, you know, emailed the team and let them know, to give feedback.  

So we have a healthcare system that is broken in so many ways, and it’s so difficult to navigate. So I get it. And so that’s been, you know, a big impetus for me to work in this space and try to make things better. 

LEE: That’s an incredible story. It’s also validating because, you know, one of the examples in our book was the use of an LLM to spot a medication error that a doctor or a nurse might make. You know, interestingly, we’re finding no sort of formalized use of AI right now in the field. But anecdotes like this are everywhere. So it’s very interesting.  

All right. So we’re starting to run short on time. So I want to ask you a few quick questions and a couple that might be a little provocative.  

DANESHJOU: Oh boy. [LAUGHTER] Well, I don’t run away … I don’t run away from controversy. 

LEE: So, of course, with that story you just told, I can see that you use AI yourself. When you are actually in clinic, when you are being a dermatologist … 

DANESHJOU: Yeah.  

LEE: … and seeing patients, are you using generative AI? 

DANESHJOU: So I do not use it in clinic except for the situation of the insurance authorization letters. And even, I was offered, you know, sort of an AI-based scribe, which many people are using. There have been some studies that show that they can make mistakes. I have a human scribe. To me, writing the notes is actually part of the thinking process. So when I write my notes at the end of the day, there have been times that I’ve all of a sudden had an epiphany, particularly on a complex case. But I have used it to write, you know, sort of these insurance authorization letters. I’ve also used it in grant writing. So as a scientist, I have used it a lot more.  

LEE: Right. So I don’t know of anyone who has a more nuanced and deeper understanding of the issues of biases in AI in medicine than you. Do you think [these] biases can be repaired in AI, and if not, what are the implications? 

DANESHJOU: So I think there are several things here, and I just want to be thoughtful about it. One, I think, the bias in the technology comes from bias in the system and bias in medicine, which very much exists and is incredibly problematic. And so I always tell people, like, it doesn’t matter if you have the most perfect, fair AI. If you have a biased human and you add those together, because you’re going to have this human-AI interaction, you’re still going to have a problem.  

And there is a paper that I’m on with Dr. Matt Groh (opens in new tab), which looked at dermatology diagnosis across skin tones and then with, like, AI assistance. And we found there’s a bias gap, you know, with even physicians. So it’s not just an AI problem; humans have the problem, too. And… 

LEE: Hmm. Yup. 

DANESHJOU: … we also looked at when you have the human-AI system, how that impacts the gap because you want to see the gap close. And it was kind of a mixed result in the sense that there were actually situations where, like, the accuracy increased in both groups, but the gap actually also increased because, even though they knew it was a fair AI, for some reason, they were relying upon the AI more often … or they were trusting it more often on diagnoses on white skin—maybe they’d read my papers, who knows? [LAUGHTER]—even though we had told them, you know, it was a fair model.  

So I think for me, the important thing is understanding how the AI model works with the physician at the task. And what I would like to see is it improve the overall bias and disparities with that unit.  

And at the same time, I tell human physicians, we have to work on ourselves. We have to work on our system, you know, our medical system that has systemic issues of access to care or how patients get treated based on what they might look like or other parts of their background.  

LEE: All right, final question. So we started off with your stories about imaging in dermatology. And of course, Geoff Hinton, Turing Award winner and one of the grandfathers of the AI revolution, famously had predicted many years ago that by 2018 or something like that, we wouldn’t need human radiologists because of AI. 

That hasn’t come to pass, but since you work in a field that also is very dependent on imaging technologies, do you see a future when radiologists or, for that matter, dermatologists might be largely replaced by machines? 

DANESHJOU: I think that’s a complex question. Let’s say you have the most perfect AI systems. I think there’s still a lot of nuance in how these, you know, things get done. I’m not a radiologist, so I don’t want to speak to what happens in radiology. But in dermatology, it ends up being quite complex, the process. 

LEE: Yeah. 

DANESHJOU: You know, I don’t just look at lesions and make diagnoses. Like, I do skin exams to first identify the lesions of concern. So maybe if we had total-body photography that could help, like, catch which lesions would be of concern, which people have worked on, that would be step, sort of, step one.  

And then the second thing is, you know, it’s … I have to do the biopsy. So, you know, the robot’s not going to be doing the biopsy. [LAUGHTER]  

And then the pathology for skin cancer is sometimes very clear, but there’s also, like, intermediate-type lesions where we have to make a decision bringing all that information together. For rashes, it can be quite complex. And then we have to kind of think about what other tests we’re going to order, what therapeutics we might try first, that sort of stuff.  

So, you know, there is a thought that you might have AI that could reason through all of those steps maybe, but I just don’t feel like we’re anywhere close to that at all. I think the other thing is AI does a lot better on sort of, you know, tasks that are well defined. And a lot of things in medicine, like, it would be hard to train the model on because it’s not well defined. Even human physicians would disagree on the next best step.  

LEE: Well, Roxana, for whatever it’s worth, I can’t even begin to imagine anything replacing you. I think your work has been just so—I think you used the word, and I agree with it—landmark, and multiple times. So thank you for all that you’re doing and thank you so much for joining this podcast.  

DANESHJOU: Thanks for having me. This was very fun. 

[TRANSITION MUSIC] 

The issue of bias in AI has been the subject of truly landmark work by Roxana and her collaborators. And this includes biases in large language models.  

This was something that in our writing of the book, Carey, Zak, and I recognized and wrote about. But in fairness, I don’t think Carey, Zak, or I really understood the full implications of it. And this is where Roxana’s work has been so illuminating and important. 

Roxana’s practical prescriptions around red teaming have proven to be important in practice, and equally important were Roxana’s insights into how AI might always be guilty of the same biases, not only of individuals but also of whole healthcare organizations. But at the same time, AI might also be a potentially powerful tool to detect and help mitigate against such biases. 

When I think about the book that Carey, Zak, and I wrote, I think when we talked about laws, norms, ethics, regulations, it’s the place that we struggled the most. And in fact, we actually relied on a conversation with GPT-4 in order to tease out some of the core issues. Well, moving on from that conversation with an AI to a conversation with three deep experts who have dedicated their careers to making sure that we can harness all of the goodness while mitigating against the risks of AI, it’s been fulfilling, very interesting, and a great learning experience. 

[THEME MUSIC]   

I’d like to say thank you again to Laura, Vardit, and Roxana for sharing their stories and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including an examination of the broader economic impact of AI in health and a discussion on AI drug discovery. We hope you’ll continue to tune in.  

Until next time. 

[MUSIC FADES] 


The post Laws, norms, and ethics for AI in health appeared first on Microsoft Research.

]]>
Empowering patients and healthcare consumers in the age of generative AI http://approjects.co.za/?big=en-us/research/podcast/the-ai-revolution-in-medicine-revisited-empowering-patients-and-healthcare-consumers-in-the-age-of-generative-ai/ Thu, 17 Apr 2025 23:10:09 +0000 http://approjects.co.za/?big=en-us/research/?p=1136554 Evangelist for patient empowerment Dave deBronkart and Manatt Health’s Christina Farr discuss how generative AI is redefining healthcare by empowering patients, challenging traditional care models, and creating new opportunities for innovation and collaboration.

The post Empowering patients and healthcare consumers in the age of generative AI appeared first on Microsoft Research.

]]>
AI Revolution podcast | Episode 3 - Are patients using generative AI for their own healthcare? | outline illustration of Christina Farr, Peter Lee, and Dave deBronkart


In this episode, Dave deBronkart (opens in new tab) and Christina Farr (opens in new tab), champions of patient-centered digital health, join Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. DeBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Farr, a healthcare investor and former journalist, talks about the evolving digital health–startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women’s health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare. 


Learn more:

e-Patient Dave (opens in new tab) 
Patient engagement website  

Patients Use AI (opens in new tab) 
Substack blog  

Meet e-Patient Dave (opens in new tab) 
TED Talk | April 2011 

Let Patients Help: A Patient Engagement Handbook (opens in new tab) 
Book | Dave deBronkart | April 2013  

Second Opinion (opens in new tab) 
Health and tech blog 

There’s about to be a lot of AI capital incineration (opens in new tab) 
Second Opinion blog post | Christina Farr | December 2024 

A letter to my kids about last week (opens in new tab) 
Second Opinion blog post | Christina Farr | December 2024

The AI Revolution in Medicine: GPT-4 and Beyond  
Book | Peter Lee, Carey Goldberg, Isaac Kohane | April 2023 

Transcript

[MUSIC]  

[BOOK PASSAGE]   

“In healthcare settings, keeping a human in the loop looks like the solution, at least for now, to GPT-4’s less-than 100% accuracy. But years of bitter experience with ‘Dr. Google’ and the COVID ‘misinfodemic’ show that it matters which humans are in the loop, and that leaving patients to their own electronic devices can be rife with pitfalls. Yet because GPT-4 appears to be such an extraordinary tool for mining humanity’s store of medical information, there’s no question members of the public will want to use it that way—a lot.” 

[END OF BOOK PASSAGE]   

[THEME MUSIC]  

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 

[THEME MUSIC FADES]

The passage I read at the top there is from Chapter 5, “The AI-Augmented Patient,” which Carey wrote.  

People have forever turned to the internet and sites like WebMD, Healthline, and so on to find health information and advice. So it wouldn’t be too surprising to witness a significant portion of people refocus those efforts around tools and apps powered by generative AI. Indeed, when we look at our search and advertising businesses here at Microsoft, we find that healthcare is in the top three most common categories of queries by consumers. 

When we envision AI’s potential impact on the patient experience, in our book, we suggested that it could potentially be a lifeline, especially for those without easy access to adequate healthcare; a research partner to help people make sense of existing providers and treatments; and even maybe act as a third member of a care team that has traditionally been defined by the doctor-patient relationship. This also could have a huge impact on venture capitalists in the tech sector who traditionally have focused on consumer-facing technologies.  

In this episode, I’m pleased to welcome Dave deBronkart and Christina Farr.  

Dave, known affectionately online as “e-Patient Dave,” is a world-leading advocate for empowering patients. Drawing on his experience as a survivor of stage 4 cancer, Dave gave a viral TED talk on patient engagement and wrote the highly rated book Let Patients Help! Dave was the Mayo Clinic’s visiting professor in internal medicine in 2015, has spoken at hundreds of conferences around the globe, and today runs the Patients Use AI blog on Substack. 

Chrissy puts her vast knowledge of the emerging digital and health technology landscape to use as a managing director with Manatt Health, a company that works with health systems, pharmaceutical and biotech companies, government policymakers, and other stakeholders to advise on strategy and technology adoption with the goal of improving human health. Previously, she was a health tech reporter and on-air contributor for CNBC, Fast Company, Reuters, and other renowned news organizations and publications. 

Hardly a week goes by without a news story about an ordinary person who managed to address their health problems—maybe even save their lives or the lives of their loved ones, including in some cases their pets—through the use of a generative AI system like ChatGPT. And if it’s not doing something as dramatic as getting a second opinion on a severe medical diagnosis, the empowerment that people feel when an AI can help decode an indecipherable medical bill or report or get advice on what to ask a doctor, well, those things are both meaningful and a daily reality in today’s AI world. 

And make no mistake—such consumer empowerment could mean business, really big business, and this means that investors in new ventures are smart to be taking a close look at all this.  

For these and many other reasons, I am thrilled to pair the perspectives offered by e-Patient Dave and Chrissy Farr together for this episode.

Here is my interview with Dave deBronkart: 

LEE: Dave, it’s just a thrill and honor to have you join us. 

DAVE DEBRONKART: It’s a thrill to be alive. I’m really glad that good medicine saved me, and it is just unbelievable, fun, and exciting and stimulating to be in a conversation with somebody like you. 

LEE: Likewise. Now, we’re going to want to get into both the opportunities and the challenges that patients face. But before that, I want to talk a little bit and delve a little bit more into you, yourself. I, of course, know you as this amazing speaker and advocate for patients. But you have had actually a pretty long career and history prior to all this. And so can you tell us a little bit about your background? 

DEBRONKART: I’ll go back all the way to when I first got out of college. I didn’t know what I wanted to do when I grew up. So I got a job where I … basically, I used my experience working on the school paper to get a temporary job. It was in typesetting, if you can believe that. [LAUGHTER] And, man, a few years later, that became the ultimate lesson in disruptive innovation.  

LEE: So you were actually doing movable type? Setting type?  

DEBRONKART: Oh, no, that was, I was … I’m not that old, sir! [LAUGHTER] The first place where I worked, they did have an actual Linotype machine and all that.  

LEE: Wow. 

DEBRONKART: Anyway, one thing led to another. A few years after I got that first job, I was working for the world’s biggest maker of typesetting machines. And I did product marketing, and I learned how to speak to audiences of all different sorts. And then desktop publishing came along, as I say. And it’s so funny because, now mind you, this was 10 years before Clay Christensen wrote The Innovator’s Dilemma (opens in new tab). But I had already lived through that because here we were. We were the journeymen experts in our noble craft that had centuries of tradition as a background. Is this reminding you of anything? 

[LAUGHTER] Well, seriously. And then along comes stuff that can be put in the hands of the consumers. And I’ll tell you what, people like you had no clue how to use fonts correctly. [LAUGHTER] We were like Jack Nicholson, saying “You can’t handle the Helvetica! You don’t know what you’re doing!” But what happened then, and this is really relevant, what happened then is—all of a sudden, the population of users was a hundred times bigger than the typesetting industry had ever been.  

The clueless people gained experience, and they also started expressing what they wanted the software to be. The important thing is today everybody uses fonts. It’s no longer a secret profession. Things are done differently, but there is more power in the hands of the end user. 

LEE: Yeah, I think it’s so interesting to hear that story. I didn’t know that about your background. And I think it sheds some light on hopefully what will come out later as you have become such, I would call you a fierce consumer advocate. 

DEBRONKART: Sure, energetic, however, whatever you want to call it, sure. [LAUGHTER] Seriously, Peter, what I always look to do … so this is a mixture of my having been run over by a truck during disruptive innovation, all right, but then also looking at that experience from a marketing perspective: how can I convey what’s happening in a way that people can hear? Because you really don’t get much traction as an advocate if you come in and say, you people are messed up.  

LEE: Right. So, now I know this gets into something fairly personal, but you’ve actually been remarkably public about this. You became very ill.  

DEBRONKART: Yes.  

LEE: And of course, I suspect some of the listeners to this podcast probably have followed your story, but many have not. So can we go a little bit through that … 

DEBRONKART: Sure.  

LEE: … just to give our listeners a sense of how this has formed some of your views about the healthcare system. 

DEBRONKART: So late in 2006, I went in for my annual physical with my deservedly famous primary care physician, Danny Sands at Beth Israel [Deaconess Medical Center] in Boston. And in the process—I had moved away for a few years, so I hadn’t seen him for a while—I did something unusual. I came into the visit with a preprinted letter with 13 items I wanted to go over with him.   

LEE: What made you do that? Why did you do that? 

DEBRONKART: Even before I knew the term existed, I have always been an engaged patient, and I also very deeply believe in partnership with my physicians. And I respected his time. I had all these things, because I hadn’t seen him for three years … 

LEE: Yeah. 

DEBRONKART: … all these things I wanted to go through. To me it was just if I walked into a business meeting with a bunch of people that I hadn’t seen for three years and I want to get caught up, I’d have an agenda. 

LEE: It’s so interesting to hear you say this because I’m very similar to you. I like to do my own research. I like to come in with checklists. And do you ever get a sense like I do that sometimes that makes your doctor a little uncomfortable? 

DEBRONKART: [LAUGHS] Well, you know, so sometimes it does make some doctors uncomfortable, and that touches on something that right now is excruciatingly important in the culture change that’s going on. As I’ve worked on the culture change from the patient side, I’ve spent a lot of time trying to empathize and understand what’s going on in the doctor’s head. Most doctors are not trained, in medical school or later, in how to work with a patient who behaves like you or me, you know?  

And in the hundreds of speeches that I’ve given, I’ve had quite a range of reactions from doctors afterwards. I’ve had doctors come up to me and say, “This is crap.” I mean, right to my face, right. “I’ll make the decisions. I’ll decide what we’re going to talk about.” And now my thought is, OK, and you’re not going to be my doctor. 

LEE: Yeah. 

DEBRONKART: I want to be responsible for how the time is spent, and I didn’t want be fumbling for words during the visit. 

LEE: Right. 

DEBRONKART: So I said, I’ve got among other things … one of the 13 things was I had a stiff shoulder. So he ordered a shoulder x-ray, and I went and got the shoulder x-ray.  

And I will never forget this. Nine o’clock the next morning, he called me, and I can still—this is burned into my memory—I can see the Sony desk phone with 0900 for the time. He said, “Dave, your shoulder’s going to be fine. I pulled up the x-ray on my screen at home. It’s just a rotator cuff thing, but Dave, something else showed up. There’s something in your lung that shouldn’t be there.”  

And just by total luck, what turned out to be a metastasis of kidney cancer was in my lung next to that shoulder. He immediately ordered a CAT scan. Turned out there were five tumors in both lungs, and I had stage 4 kidney cancer.  

LEE: Wow.  

DEBRONKART: And on top of that, back then—so this was like January of 2007—back then, there was much less known about that disease than there is now.  

LEE: Right. 

DEBRONKART: There were no studies—zero research on people like me—but the best available study said that for somebody with my functional status, my median survival was 24 weeks. Half the people like me would be dead in five and a half months. 

LEE: So that just, you know, I can’t imagine, you know, how I would react in this situation. And what were your memories of the interaction then between you and your doctor? You know, how did your doctor engage with you at that time? 

DEBRONKART: I have very vivid memories. [LAUGHS] Who was it? I can’t remember what famous person said, “Nothing focuses the mind like the knowledge that one is to be hanged in a fortnight,” right. But 24 weeks does a pretty good job of it.  

And I … just at the end of that phone call where he said I’m going to order a CAT scan, I said, “Is there anything I should do?” Like I was thinking, like, go home and make sure you don’t eat this, that, or the other thing.  

LEE: Right. 

DEBRONKART: And what he said was, “Go home and have a glass of wine with your wife.” 

LEE: Yeah. 

DEBRONKART: Boy, was that sobering. But then it’s like, all right, game on. What are we going to do? What are my options? And a really important thing, and this, by the way, this is one reason why I think there ought to be a special department of hell for the people who run hospitals and other organizations where they think all doctors are interchangeable parts. All right. My doctor knew me. 

LEE: Yeah. 

DEBRONKART: And he knew what was important to me. So when the biopsy came back and said, “All right, this is definitely stage 4, grade 4 renal cell carcinoma.” He knew me enough … he said, “Dave, you’re an online kind of guy. You might like to join this patient community that I know of.” This was 2007.  

LEE: Yeah. 

DEBRONKART: It’s a good quality group, this organization that barely exists. 

LEE: That’s incredibly progressive, technologically progressive for that time. 

DEBRONKART: Yeah, incredibly progressive. Now, a very important part of the story is this patient community is just a plain old ASCII listserv. You couldn’t even do boldface, right. And this was when the web was … web 2.0 was just barely being created, but what it was, was a community of people who saw the problems the way I see the problems. God bless the doctors who know all the medical stuff, you know. And they know the pathology and the morphology and whatever it is they all know.  

And I’m making a point here of illustrating that I am anything but medically trained, right. And yet I still, I want to understand as much as I can.  

I was months away from dead when I was diagnosed, but in the patient community, I learned that they had a whole bunch of information that didn’t exist in the medical literature. 

Now today we understand there’s publication delays; there’s all kinds of reasons. But there’s also a whole bunch of things, especially in an unusual condition, that will never rise to the level of deserving NIH [National Institutes of Health] funding, right … 

LEE: Yes. 

DEBRONKART: … and research. And as it happens, because of the experience in that patient community, they had firsthand experience at how to survive the often-lethal side effects of the drug that I got. And so I talked with them at length and during my treatment, while I was hospitalized, got feedback from them. And several years later my oncologist, David McDermott, said in the BMJ [British Medical Journal], he said, “You were really sick. I don’t know if you could have tolerated enough medicine if you hadn’t been so prepared.” 

Now there is a case for action, for being actively involved, and, pointing toward AI now, for doing what I could to learn what I could despite my lack of medical education. 

LEE: But as you were learning these things from the patient community, there had to be times when they came into conflict with the treatment plan you were under. That must have happened. So first off, did it? And how were those conflicts resolved? 

DEBRONKART: So, yes, it did occasionally because in any large population of people you’re going to have differences of opinion. Now, before I took any action—and this closely matches the current thought of human in the loop, right—before I took any action based on the patient community, I checked with my clinicians.  

LEE: Were there times when there were things that … advice you were getting from the patient community that you were very committed to, personally, but your official, formal caregivers disagreed with? 

DEBRONKART: No, I can’t think of a single case like that. Now, let me be clear. My priority was: save my ass, keep me alive, you know? And if I thought a stranger at the other end of an internet pipe had a different opinion from the geniuses at my hospital—who the whole patient community had said, this is maybe the best place in the world for your disease— 

LEE: Yes. 

DEBRONKART: I was not going to go off and have some philosophical debate about epistemology and all of that stuff. And remember, the clock was ticking. 

LEE: Well, in fact, there’s a reason why I keep pressing on this point. It’s a point of curiosity because in the early days of GPT-4, there was an episode that my colleague and friend Greg Moore, who’s a neuroradiologist, had with a friend of his that became very ill with cancer.  

And she went in for treatment and the treatment plan was a specific course of chemotherapy, but she disagreed with that. She wanted a different type of, more experimental immunotherapy. And that disagreement became intractable to the point that the cancer specialists that were assigned to treat her asked Greg, “Can you talk to her and explain, you know, why we think our decision is best?”  

And the thing that was remarkable is Greg decided to use that case as one of the tests in the early development days of GPT-4 and had a conversation to explain the situation. They went back and forth. GPT-4 gave some very useful advice to Greg on what to say and how to frame it.  

And then, when Greg finally said, “You know, thank you for the help,” what floored both me and Greg is that GPT-4 said, “You’re welcome. But, Greg, what about you? Are you getting all the support that you need? Here are some resources.”  

And, you know, I think we can kind of take that kind of behavior for granted today, and there have been some published studies about the seeming empathy of generative AI. 

But in those early days, it was eerie, it was awe-inspiring, it was disturbing—you know, all of these things at once. And that’s essentially why I’m so curious about your experiences along these lines. 

DEBRONKART: That’s like, that’s the flip side of the famous New York Times reporter who got into a late-night discussion …  

LEE: Oh, Kevin Roose, yes. [LAUGHTER] 

DEBRONKART: You say you’re happy in your marriage, but I think you’re not.  

LEE: Right. 

DEBRONKART: It’s like, whoa, this is creepy. But you know, it’s funny because one of the things that’s always intrigued me, partly because of my professional experience at explaining technology to people, is the early messaging around LLMs [large language models], which I still hear people … The people who say, “Well, wait a minute, these things hallucinate, so don’t trust them.” Or they say, “Look, all it’s doing is predicting the next word.”  

But there are loads of nuances, … 

LEE: Yes.  

DEBRONKART: and that’s, I mean, it takes an extraordinary amount of empathy, not just for the other person’s feelings, but for their thought process … 

LEE: Hmm, yes. Yeah. 

DEBRONKART: … to be able to express that. Honestly, that is why I’m so excited about the arriving future. One immensely important thing … as I said earlier, I really respect my doctors’ time—“doctors” plural—and it breaks my heart that the doctors who did all this work to get licensed and all that stuff are quitting the field because the economic pressures are so great. I can go home and spend as many hours as I want asking it questions. 

LEE: Yes.  

DEBRONKART: All right. I’ve recently learned a thing to do after I have one of these hours-long sessions, I’ll say to it, “All right, so if I wanted to do this in a single-shot prompt, how would you summarize this whole conversation?” So having explored with no map, I end up with a perspective that it just helps me see the whole thing … 

LEE: Yes. Yeah, that’s brilliant. 

DEBRONKART: … without spending a moment of the doctor’s time.

LEE: Yeah, yeah. So when was the first time that you used, you know, generative AI?

DEBRONKART: It had to be February or March of whatever the first year was.  

LEE: Yeah. And was it the New York Times article that piqued your interest?  

DEBRONKART: Oh absolutely. 

LEE: Yeah. And so what did you think? Were you skeptical? Were you amazed? What went through your mind? 

DEBRONKART: Oh, no, no, no. It blew my mind. And I say that as somebody who emerged from the 1960s and ’70s, one of the original people who knew what it was to have your mind blown back in the psychedelic era. [LAUGHTER] No, it blew my mind. And it wasn’t just the things it said; it was the implications of the fact that it could do that.  

I did my first programming with BASIC or Fortran. I don’t know, something in the mid-’60s, when I was still in high school. So I understand, well, you know, you got to tell it exactly what you want it to do or it’ll do the wrong thing. So, yeah, for this to be doing something indistinguishable from thinking—indistinguishable from thinking—was completely amazing. And that immediately led me to start thinking about what this would mean in the hands of a sick person. And, you know, my particular area of fascination in medicine—everything I use it for these days is mundane—but the future of a new world of medicine and healthcare is one where I can explore and not be limited to things where you can read existing answers online. 

LEE: Right. So if you had GPT-4 back in 2006, 2007, when you were first diagnosed with your renal cancer, how would things have been different for you? Would things have been different for you? 

DEBRONKART: Oh, boy, oh, boy, oh, boy. This is going to have to be just a swag because, I mean, for it to—you mean, if it had just dropped out of thin air?  

LEE: Yes. [LAUGHS] 

DEBRONKART: Ah, well, that’s … that’s even weirder. First thing we in the patient community would have to do is figure out what this thing does … 

LEE: Yeah. 

DEBRONKART: … before we can start asking it questions.  

Now, Peter, a large part of my evangelism, you know, there’s a reason why my book and my TED talk were titled “Let Patients Help.” 

I really am interested in planting a thought in people’s minds, and it’s not covert. I come right out and say it in the title of the book, right, planting a thought that, with the passage of time, will hold up as a reasonable thing to do. And same thing is true with AI. So … and I’ve been thinking about it that way from the very beginning. I never closed the loop on my cancer story. I was diagnosed in January, and I had my last drop of high-dose interleukin—experimental immunotherapy, right—in July. And that was it. By September, they said, looks like you beat it. And I was all done.  

And there’s the question: how could it be that I didn’t die? How could it be that valuable information could exist and not be in the minds of most doctors? Not be in the pages of journals?  

And if you think of it that way, along the way, I became a fan of Thomas Kuhn’s famous book, The Structure of Scientific Revolutions.  

LEE: Yes. 

DEBRONKART: When something that the paradigm says could not happen does happen, then responsible thinkers have to say, the paradigm must be wrong. That’s the stage of science that he called a crisis. So if something came along back in 2006, 2007, I would have to look at it and say, “This means we’ve got to rethink our assumptions.” 

LEE: Yes. You know, now with the passage of time, you know, over the last two years, we’ve seen so many stories like this, you know, where people have consulted AI for a second opinion, … 

DEBRONKART: Sure. 

LEE: … maybe uploaded their labs and so on and gotten a different diagnosis, a different treatment suggestion. And in several cases that have been reported, both in medical journals and in the popular press, it has saved lives. And then your point about communities: during the COVID pandemic, even doctors formed communities to share information. A very famous example is doctors turning to Facebook and Twitter to share that if they had a COVID patient in severe respiratory distress, sometimes they could avoid intubation by …  

DEBRONKART: Pronation. Yeah. 

LEE: … pronation. And things like this end up being, in a way, I think the way you’re couching it, ways to work around the restrictions in the more formal healthcare system. 

DEBRONKART: The traditional flow. Yes. And there is nothing like a forest fire, an emergency, an unprecedented threat to make people drop the usual formal pathways. 

LEE: So, I’d like to see if we can impart from your wisdom and experience some advice for specific stakeholders. So, what do you say to a patient? What do you say to a doctor? What do you say to the executive in charge of a healthcare system? And then finally, what do you say to policymakers and regulators? So, let’s start with patients. 

DEBRONKART: So if you’ve got a problem or a question where you really want to understand more than you’ve been able to, then give these things a try. Ask some questions. And it’s not just the individual question and answer. The famous, amazing patient advocate, Hugo Campos, … 

LEE: Hmm, yes. 

DEBRONKART: … said something that I call “Hugo’s Law.” He said, “Dave, I don’t ask it for answers. I use it to help me think.” 

LEE: Yes, absolutely.  

DEBRONKART: So you get an answer and you say, “Well, I don’t understand this. What about that? Well, what if I did something different instead?” And never forget, you can come back three months later and say, “By the way, I just thought of something. What about that,” right.  

LEE: Yeah, yeah, fantastic. 

DEBRONKART: So be focused on what you want to understand.  

LEE: So now let’s go to a doctor or a nurse. What’s the advice there?  

DEBRONKART: Please try to imagine a world … I know that most people today are not as activated as I am in wanting to be engaged in their health. But to a very large extent, people, a lot of people, family and friends, have said they don’t want to do this because they don’t want to offend the doctors and nurses. Now, even if the doctor or nurse is not being a paternal jerk, all right, the patients have a fear of this. Dr. Sands handles this brilliantly. I mentioned it in the book. He proactively asks, are there any websites you’ve found useful?  

And you can do the same thing with AI. Have you done anything useful with ChatGPT or something like that?  

LEE: That actually suggests some curricular changes in medical schools in order to train doctors.  

DEBRONKART: Absolutely. In November, I attended a retreat on rethinking medical education. I couldn’t believe it, Peter. They were talking about how AI can be used in doing medical education. And I was there saying, “Well, hello. As long as we’re here, let’s rethink how you teach doctors and medical students to deal with somebody like me.” ’Cause what we do not want …  

There was just a study in Israel that said 18% of adults use AI regularly for medical questions, which matches other studies in the US.  

LEE: Yep.  

DEBRONKART: But it’s 25% for people under 25. We do not want 10 years from now to be minting another crop of doctors who tell patients to stay off of the internet and AI.  

LEE: You know, it’s such an important point. Students, you know, entering into college to go on to medical school and then a residency and then finally into practice. I think you’re thinking about the year 2035 or thereabouts. And when you think of that, at least in tech industry terms, we’re going to be on Mars, we’re going to have flying cars, we’re going to have AGI [artificial general intelligence], and you really do need to think ahead. 

DEBRONKART: Well, you know, healthcare, and this speaks to the problems that health system executives are facing: y’all better watch out or you’re going to be increasingly irrelevant, all right.  

One of the key use cases, and I’m not kidding … I mean, I don’t mean that if I have stage 4 kidney cancer, I’m going to go have a talk with my robot. But one of the key use cases that makes people sit down and try to solve a problem on their own with an LLM is if they can’t get an appointment.  

LEE: Yes. 

DEBRONKART: Well, so let’s figure out, can the health system, can physicians and patients learn to work together in some modified way? Nobody I know wants to stop seeing a doctor, but they do need to have their problems solved.  

LEE: Yeah, yeah. 

DEBRONKART: And there is one vitally important thing I want to … I insist that we get into this, Peter. In order for the AI to perform to the best of its contribution, it needs to know all the data. 

LEE: Yes.  

DEBRONKART: Well, and so does the patient. Another super-patient, James Cummings, has two rare-genetic-mutation kids. He goes to four Epic-using hospitals. Those doctors can’t see each other’s data. So he compiles it, and he shows … the patient brings in the consolidated data. 

LEE: Yes. Well, and I know this is something that you’ve really been passionate about, and you’ve really testified before Congress on. But maybe then that leads to this fourth category of people who need advice, which are policymakers and regulators. What would you tell them? 

DEBRONKART: It’s funny, in our current political environment, there’s lots of debates about regulation, more regulation, less regulation. I’m heavily in favor of the regulations that say, yeah, I gotta be able to see and download my damn data, as I’m famous for calling it. But what we need to do if we were to have any more regulations is just mandate that you can’t keep the data away from people who need it. You can’t when … 

LEE: Yep. 

DEBRONKART: OK, consider one of the most famous AI-using patients is this incredible woman, Courtney Hofmann, whose son saw 17 doctors over three years, and she finally sat down one night and typed it all into GPT. She has created a startup to try to automate the process of gathering everyone’s data.  

LEE: Yes, yes. Yeah. 

DEBRONKART: And I know people who have been trying to do this and it’s just really hard. Policy people should say, look, I mean, we know that American healthcare is unsustainable economically. 

LEE: Yes. 

DEBRONKART: And one way to take the pressure off the system—because it ain’t the doctors’ fault, because they’re burned out and quitting—one way to take the pressure off is to put more data in the hands of the patients so that entrepreneurs can make better tools. 

LEE: Yeah. All right. So, we’ve run out of time, but I want to ask one last provocative question to send us off. Your life’s experience, I think, is just incredible, and your personal generosity in sharing your stories with such a wide audience is doing so much good in the world. So, based on all of that: do you see a future where AI effectively replaces human doctors? Do you think that’s a world that we’re heading towards? 

DEBRONKART: No, no, no, no. People are always asking me this. I do imagine an increasing base, an increasing if … maybe there’s some Venn diagram or something, where the number of things that I can resolve on my own will increase.  

LEE: Mm-hmm. Yes. 

DEBRONKART: And in particular, as the systems get more useful, and as I gain more savvy at using them and so on, there will be cases where I can get it resolved good enough before I can get an appointment, right. But I cannot imagine a world without human clinicians. Now, I don’t know what that’s going to look like, right.

LEE: Yes. [LAUGHS]

DEBRONKART: I mean, who knows what it’s going to be. But I keep having … Hugo blogged this incredible vision of where his agentic AI will be looking at one of these consolidated blob medical records things, and so will his doctor’s agentic AI. 

LEE: Yes. Well, I think I totally agree with you. I think there’ll always be a need and a desire for the human connection. Dave, this has been an incredible, really at times, riveting conversation. And as I said before, thank you for being so generous with your personal stories and with all the activism and advocacy that you do for patients. 

DEBRONKART: Well, thank you. I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.  
 
[LAUGHTER] No, well, no kidding. In preparing for this, I listened to a bunch of back podcast episodes, “Microsoft Research,” “NEJM AI.” They talk about things I do not comprehend and don’t get me started on quantum, right? [LAUGHTER] But I’m grateful and I hope I can contribute some guidance on how to solve the problem of the person for whom the industry exists. 

LEE: Yeah, you absolutely have done that. So thank you. 

[TRANSITION MUSIC] 

E-Patient Dave is so much fun to talk to. His words and stories are dead serious, including his openness about his struggles with cancer. But he just has a way of engaging with the world with such activism and positivity. The conversation left me at least with a lot of optimism about what AI will mean for the consumer.  

One of the key takeaways for me is Dave’s point that sometimes informal patient groups have more up-to-date knowledge than doctors. One wonders whether AI will make these sorts of communities even more effective in the near future. It sure looks like it.  

And as I listen to Dave’s personal story about his bout with cancer, it’s a reminder that it can be lifesaving to do your own research, but ideally to do so in a way that also makes it possible to work with your caregivers. Healthcare, after all, is fundamentally a collaborative activity today. 

Now, here’s my conversation with Christina Farr: 

LEE: Chrissy, welcome. I’m just thrilled that you’ve joined us here. 

CHRISTINA FARR: Peter, I’m so excited to be here. Thanks for having me on. 

LEE: One thing that our listeners should know is you have a blog called Second Opinion. And it’s something that I read religiously. And one of the things you wrote a while ago raised some questions about the fact that, as an investor or as a founder of a digital health company, if you don’t use the words AI prominently, you will struggle to gain investment. So maybe we start there. And, you know, what are you seeing right now in the kind of landscape of emerging digital health tech companies? What has been both the positive and negative impact of the AI craziness that we have in the world today on that? 

FARR: Yeah, I think the title of that was something around the great AI capital incineration [LAUGHTER] that we were about to see. But I, you know, stand by it. I do think that we’ve sort of gone really deep into this hype curve with AI, and you see these companies really just sucking up the lion’s share of venture capital investment. 

And what worries me is that, and we know this from just like decades of being in the space, tools are very hard to monetize in healthcare. Most of healthcare still today, and where the revenue really is, is still in services. It’s still in those kind of one-to-one interactions. And what concerns me is that we are investing in a lot of these AI tools that, you know, are intended to sell into the system. But the system doesn’t yet know how to buy them and then, beyond that, how to really integrate them into the workflow.  

So where I feel more enthusiastic, and this is a little bit against the grain of what a lot of VCs [venture capitalists] think, is with care delivery businesses that are fully virtual or hybrid and really using AI as part of their stack. And I think that improves really the style of medicine that they’re delivering and makes it far more efficient. And you start to see, you know, a real improvement in the metrics, like the gross margins of these businesses beyond what you would see in really traditional kind of care delivery. And because they are the ones that own the stack, they’re the ones delivering the actual care, … 

LEE: Right. 

FARR: … they can make the decision to incorporate AI, and they can bring in the teams to do that. And I feel like in the next couple of years, we’re going to see more success with that strategy than just kind of more tools that the industry doesn’t know what to do with. 

LEE: You know, one thing that I think I kind of learned, or had an inkling of but was really reinforced reading your writings: as a techie, I, and I think my colleagues, tend to be predisposed to looking for silver bullets. You know, technology that really just solves a problem completely.  

And I think in healthcare delivery in particular, there probably aren’t silver bullets. What you need to do is really look at things holistically, hence your emphasis on metrics that measure those end-to-end outcomes. So at the same time, if I could still focus on your blog, you do highlight companies that seem to be succeeding that way.  

Just in preparation for this discussion, I re-read your post about Flo being the first kind of unicorn women’s health digital tech startup. And there is actually a lot of very interesting AI technology involved there. So it can happen. How do you think about that? 

FARR: Yeah, I mean, I see a lot of AI across the board. And it’s real with some of these companies, whether it’s, you know, a consumer health app like Flo that, you know, is really focused on kind of period tracking. And AI is very useful there in helping women just predict things like their optimal fertility windows. And it’s very much kind of integrated very deeply into that solution. And they have really sophisticated technology.  

And you see that now as well with the kind of craze around these longevity companies, that there is a lot of AI kind of underlying these companies, as well, especially as they’re doing, you know, a lot of health tests and pulling in new data and providing access to that data in a way that, you know, historically patients haven’t had access to.  

And then I also see it with, you know, like I spoke about with these care delivery companies. I recently spent some time with a business called Origin, for instance, which is in, you know, really in kind of women’s health, MSK [musculoskeletal], and that beachhead is in pelvic floor PT [physical therapy].  

And for them, you know, it’s useful in the back office for … a lot of their PT providers are getting great education through AI. And then it’s also useful on the patient-facing side as they provide kind of more and more content for you to do exercises at home. A lot of that can be delivered through AI. So for some of these companies, you know, they look across the whole stack of what they’re providing, and they’re just seeing opportunities in so many different places for AI. And I think that’s really exciting, and it’s very, very real. And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications. There are definitely some really compelling AI tools, as well. 

I think companies like Nuance and like Abridge and that whole category of really kind of replacing human scribes with AI, like to me, that has been so successful because it literally is the pain point. You’re solving the pain point for health systems and physicians.  

Burnout is a huge problem. Documentation is a huge problem. So, you know, to say we’ve got this kind of AI solution, everybody’s basically on board—you know, as long as it works—[LAUGHTER] from the first meeting. And then the question becomes, which one do you choose? You know, that said, you know, to me, that’s sort of a standout area. I’m not seeing that everywhere.

LEE: So there are like a bunch of things to delve into there. You know, since you mentioned the Nuance, the Dragon Copilot, and Abridge, and they are doing extremely well. But even for them, and this is another thing that you write about extensively, health systems have a hard time justifying investing in these technologies. It’s not like they’re swimming in cash. And so on that element of things, is there advice to companies that are trying to make technologies to sell into health systems? 

FARR: Yeah, I mean, I’ll give you something really practical on that example specifically. So I spend a lot of time chatting with a lot of the health system CMIOs [chief medical informatics officers] trying to, you know, just really understand kind of their take. And they often tell me, “Look, you know, these technologies are not inexpensive, and we’ve already spent a boatload of money on our EHR [electronic health record], which continues to be expensive. And so we just don’t have a lot of budget.” And for them, I think the question becomes, you know, who within the clinical organization would benefit most from these tools?  

There are going to be progressive physicians that will jump on these on day one and start using them and really integrating them into the workflow. And there will be a subset that just wants to do things the way they always have done things. And you don’t want to pay for seats for everybody when there’s a portion that will not be using it. So I think that’s maybe something that I would kind of share with the startup crowd is just, like, don’t try to sell to every clinician within the organization. Not everybody is going to be, you know, a technology early adopter. Work with the health systems to figure out that cohort that’s likely to jump on board first and then kind of go from there. 

LEE: So now let me get back to specifically to women’s health. I think your investing strategy has, I think it’s fair to say has had some emphasis on women’s health. And I would say for me, that has always made sense because if there’s one thing the tech industry knows how to do in any direct-to-consumer business is to turn engagement into dollars.  

And when you think about healthcare, there are very few moments in a person’s life when they have a lot of engagement with their own healthcare. But women have many. You mentioned period tracking, pregnancy, menopause. There are so many areas where you could imagine that technology could be good. At least that’s way I would think about it, but does that make any sense to you, or do you have a different thought process?  

FARR: Oh, my god, I’m just nodding right now because I’ve been saying the same thing for years, [LAUGHS] that, you know, the moments of what I call naturally high engagement are most interesting to me. And I think it’s why it’s been such a struggle with some of these companies that are looking at, you know, conditions like type 2 diabetes.  

I mean, it’s just so hard to try to change somebody’s behavior, especially through technology. You know, we’ve not kind of proven out that these nudges are really changing anybody’s mind about, you know, their day-to-day lifestyles. Whereas, you know, in these moments, like you said, of just like naturally high engagement … like it’s, you know, women’s health, you’re right, there’s a lot of them. Like if you’re pregnant, you’re very engaged. If you’re going through menopause, you’re very engaged. And I think there are other examples like this, you know, such as oncology. You get a cancer diagnosis, you’re very engaged. 

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.  

And, you know, one example I’ll give you in women’s health, I’m not invested in this company, sadly. They are called Midi Health. And they’re really everywhere in the menopause area now, like, you know, the visit volume that they are seeing is just insane. You know, this is a population that is giant. It’s, like, one in two people are women. At some point, we pretty much all go through menopause, some people earlier, some later. 

And for a lot of us, it’s a really painful, disruptive thing to experience. And we tend to experience it at a moment when we actually have spending money. So it just ticks all the boxes. And yet I think because of the bias that we see, you know, in the venture land and in the startup world, we just couldn’t get on this opportunity for a really long time. So I’ve been very excited to see companies like that really have breakout success. 

LEE: First off, you know, in terms of hits and misses from our book, one hit is we did think a lot about the idea that patients directly would be empowered by AI. And, you know, we had a whole chapter on this, and it was something that I think has really turned out to be true, and I think it will become more true. But one big miss is we actually didn’t think about what we were just talking about, about who and when this would happen. And the specific focus on women, women’s health, I think is something that we missed.  

And I think one of the reasons I sought you out for this conversation is if I remember your own personal history, you essentially transitioned from journalism to venture investing at about the same time that you yourself were having a very intense period of engagement with health because of your own pregnancy. And so if you don’t mind, I’d like to get into your own experience with healthcare through pregnancy, your own experiences raising children, and how that has informed your relationship with digital health and the investing and advising that you do today. 

FARR: Yeah, it’s a great question. And I actually was somebody who, you know, wrote a lot while I was kind of on maternity leave about this experience because it was such a profound one. You know, I think the reason that pregnancy is so interesting to healthcare companies and systems is because really for a lot of women, it’s their first experience with the hospital.  

Most of us have never stayed in the hospital for any period of time until that moment. Both times I had C-sections, so I was there for a good three or four days. And, you know, I think it’s a really big opportunity for these systems, even if they lose money, many of them lose money on pregnancy, which is a whole different topic, but there is an opportunity to get a whole family on board and keep them kind of loyal. And a lot of that can come through, you know, just delivering an incredible service.  

Unfortunately, I don’t think that we are delivering incredible services today to women in this country. I see so much room for improvement. You know, you see, just look at the data. You see women, you know, still dying in childbirth in this country where in many other developed nations, that’s just no longer the case.  

LEE: Yeah. And what are, in your view, the prime opportunities or needs? What do we need to do if we have a focus on technology to improve that situation?  

FARR: Yeah, I mean, I think there’s definitely an opportunity for, you know, just digital technologies and for remote patient monitoring and just other forms of monitoring. I do think we should look at what other countries have done and really consider things like, you know, three days post-discharge, somebody comes to your home, you know, whether it’s to check on you from a healthcare perspective, both, you know, physical and mental health, but then also make sure that the environment is safe for both the mother and the baby. Simple things like that, that don’t even really require any technology.  

And then there’s certainly opportunities for new forms of, you know, diagnostic tests for things like preeclampsia, postpartum preeclampsia. We could definitely use some new therapeutics in this area. Then, you know, would love to kind of also touch on the opportunity in pediatrics because there I think is an ideal use case for AI. And that’s definitely my reality now. 

LEE: Well, yeah, in fact, I hope I’m not delving into too many personal issues here. But I do remember, I think with your first child, which you had during the height of the COVID pandemic, that your child actually had COVID and actually even lost sense of taste and smell for a period. And, in our book, we had sort of theorized that people would turn possibly to AI for advice to understand what was going on.  

When you look broadly at the kinds of queries that come into a search engine or into something like ChatGPT or Copilot, you do see things along those lines. But at the same time, I had always thought people wouldn’t just use a raw chat bot for these things. People would want an app, perhaps powered by AI, that would be really designed for this. And yet somehow that seems not to be as widespread.  

FARR: Yeah. And I think the word app is a great one that I’d love to, you know, maybe interrogate a little bit because I think that we have been overly reliant on apps. I’ll give you an example. So in a pediatric space, I am a user of an app called Summer Health, or it’s not an app. Sorry. It’s a text messaging service. [LAUGHTER] And this is the genius. So I just pick up my phone, and I text “Summer” and a pediatrician responds within a matter of minutes. And sometimes it’s a pediatric nurse, but it’s somebody who responds to me. And they say, oh, what’s going on? And I might say, OK, well, this week we had the norovirus. So these are the symptoms. And they might say, you know, I’d love to see an image or a video. And I can text that to them.  

And if a prescription is required, then that goes to a pharmacy near me through another digital application that’s really cool called Photon Health, where my script is portable, so I can move it around based on what’s open.  

So, through this, I’m getting an incredible experience that’s the most convenient … 

LEE: Wow. 

FARR: I could ever ask for, and there is no app. [LAUGHS] And you could imagine the potential for AI. You know, a company like this is probably getting so many questions about a norovirus or COVID or RSV [Respiratory Syncytial Virus], and is, I’m sure, starting to think about kind of ways in which AI could be very useful in this regard. And you don’t need a pediatrician or pediatric nurse answering every question. Perhaps there’s like sophisticated triaging to determine which questions should go to the human expert.  

But, you know, again, back to this app question, like, I think we have too many. Like, it’s just … like from a user experience perspective, just having to find the app, log into the app. Sometimes there’s just layers of authentication. Then you have to remember your password. [LAUGHTER] And it’s just, you know, it’s just too many steps. And then there’s like 50 of them for all kinds of different things. 

LEE: Yes. Well, and you have to also go to an app store, download the thing.  

FARR: Go to the app store, download. It’s just too many steps.  

LEE: Yes. 

FARR: So, like, I, you know, I recognize that HIPAA exists. If there is any kind of claim involved, then, you know, you need an app because you got privacy to think about and compliance, but like, in this wave of consumerization of healthcare, there’s a lot more that’s possible. And so I’d love to see people experimenting a bit more with the form factor. And I think once we do that, we could open up a lot more interesting applications with AI, because you’ll see so much more usage day to day than you will if you require any of this kind of gatekeeping with an app. 

LEE: It’s so interesting to hear you say this because one thing that I’ve thought—and I’ve actually even expressed publicly in some venues—is one logical endpoint for AI as we understand it today is that apps become unnecessary. We might still have machines that, you know, you hold in the palm of your hand, but it’s just a machine that does what you want it to do.  

Of course, the business model implications are pretty profound. So for that particular text messaging service, do you understand what their business model is? You know, how are they sustaining themselves? 

FARR: Consumer, it’s all cash pay. It’s cash pay. You just pay a subscription. And, you know, there are certainly kind of privacy requirements, you know, related to kind of federal and state, but you could consent to be able to do something like this. And, you know, companies like this have teams of lawyers that kind of think through how do you make something like this happen. But it’s possible because of this cash pay element that really underlies that. And I think that is a growing trend.  

You know, I was literally sitting with a benefits consultant a few weeks ago, and he was saying to me, like, “I tell all my friends and family, just don’t use your insurance at all, unless it’s for like a very high price thing, like a medical procedure that’s expensive or a surgery.” He said, for everything else, I just pay cash. I pay cash for all my primary care. I pay cash for, you know, basic generic, you know, prescription medications that, you know, it’s like a few cents to manufacture.  

And I’m sort of getting there, too, where I just kind of increasingly am relying on cash pay. And I think that sort of opens up a world of opportunity for just innovation related to user experience that could really bring us to this place that you mentioned where there is no app. You literally just text or, you know, you use your voice, and you say, “I need a restaurant reservation,” and it’s done.  

LEE: Mm-hmm. Yeah. 

FARR: And it’s that simple, right? And the sort of appification of everything, you know, was an important kind of evolution or moment in technology that is undeniable. But I totally agree with you that I think we might be moving past that. 

LEE: On this idea of cash, there is a little bit of a fatigue, on the other hand, with—for consumers; let me just speak as a consumer—I can’t keep track anymore of all the subscriptions I have. And so are we just trading one form of, you know, friction for another? 

FARR: Yeah, that’s a great point. But there are things that, you know, I think there are those moments where you continue to pay a subscription because it’s just something that’s chronic. You know, it’s just relevant to you. You know, pediatrics is a great example. At some point, like I won’t need a pediatrician on demand, which is what I have now, maybe when my kids are a little older, and we’re not just a cesspool of various kind of viruses at home. [LAUGHTER] But again, back to your point about, you know, the sort of moments of just, like, natural engagement, I think there’s also a moment there … there are areas or parts of our lives where, like primary care, where it’s just more longitudinal.  

And it makes sense to pay on a kind of subscription basis. Like our system is messed up because there’s just messed up incentives, right. And a subscription to me is very pure. [LAUGHTER] Like it’s you’re just saying, “I’m paying for a service that I want and need.” And then the company is saying, “OK, let me make this service as efficient and great and affordable for you as I possibly can.” And to me, that’s like a very, like refreshing trade. And I feel the same way, by the way, in my media business, which, you know, definitely has a subscription element. And it just means a lot when someone’s willing to say like this content’s worth paying for.  

LEE: Yes. 

FARR: It doesn’t work for everything, but I think it works for things that, you know, have that long-term payoff. 

LEE: Yeah, I really love that. And if I have one regret about the chapter on kind of the consumer experience from our book—I think all of this seems obvious in retrospect—you know, I wish we had tried to understand, you know, this aspect of the consumer experience, that people might actually have just online experiences that they would pay a monthly fee or an annual fee for. Because it also hits on another aspect of the consumer experience, which is this broad issue—it’s actually now a national issue in healthcare—of price transparency.  

And this is another thing that I think you’ve thought about and written about, both the positives and negatives of this. I remember one blog post you made that talked about the issue of churn in digital health. And if I remember correctly, you weren’t completely certain that this was a good thing for the emerging digital health ecosystem. Can you say more about this idea of churn? 

FARR: Yeah, I mean, you know, I’ve been writing for a long time and thinking for a long time about the buyers of a lot of these kind of digital health companies, like who are the customers? And there was a long period where it was, it was really the self-insured employer, like Microsoft, being a sort of customer of these solutions because they wanted to provide a great array of health benefits for their own employees.  

And that was, you know, for a long time, like 10 or 15 years, you know, big companies that have now gone public, and it seemed like a faster timeline to be able to sell relative to health systems and, you know, health plans and other groups. And I’ve now kind of been on the forefront of saying that this channel is kind of dead. And one of the big reasons is just, you know, there’s no difference, I would say to what you see kind of in the payer lane, which is that churn is a big problem. People used to stay at jobs for 20, 30, 40 years, … 

LEE: Right. 

FARR: … and then you’d retire and have great benefits. And so it kind of made sense that your company was responsible for the healthcare that you received. And now I think the last time I looked at the Bureau of Labor Statistics, it’s around four years, a little bit less than four years. So what can you do in four years? [LAUGHS] 

I just read an interesting analysis on GLP-1s, these medications now that obviously are everywhere in tackling type two diabetes, and obesity is kind of the main, seems to be the hot use case. But, you know, I’m reading analysis around ROI that it’s 15, over 15 years, to see an ROI if you are, you know, a system or a plan or employer that chooses to pay for this. So how does that equate when you don’t keep an employee around for more than four?  

LEE: Yep. 

FARR: So I think it’s just left employers in a really bad place of having to make a bunch of tradeoffs and, you know, employees are demanding, we want access to these things. And they’re saying, well, our healthcare costs just keep going up and up and up. You know, we have inflation to contend with and we’re not seeing, you know, the analysis that it necessarily makes sense for us to do so. So that’s what I have, you know, been sort of harping on about with this churn issue that I’m seeing. 

LEE: Well, I have to tell you, when I first started reading about this from you, it really had a profound impact on my thinking, my thought process. Because one of the things that we dream about is this idea that’s been present actually for decades in the healthcare world of this concept of real-world evidence, RWE. And that is this dream that now that we’ve digitized so much health experience, we should be able to turn all that digital data from people’s health experiences into new medical knowledge.  

But the issue of churn that I would credit you with introducing me to calls that into question because you’re right. Over a four-year period, you don’t get the longitudinal view of a person’s health that gives you the ability to get those medical insights. And so something needs to change there. But it’s very much tied to what consumers want to do. Consumers move around; they change jobs.  

FARR: Yes.  

LEE: If it’s cash-based, they’ll be shopping based on all sorts of things. And so it … 

FARR: And so the natural end of all this, it’s two words: single payer. [LAUGHS] But we don’t want to go there as a country. So, you know, it sort of left us in this kind of murky middle. And I think a lot about, kind of, what kind of system we’ll end up having. What I don’t think is possible is that this current one is sustainable.  

LEE: You know, I do think in terms of CMS [Centers for Medicare and Medicaid Services] as a payer, the amount of influence that it exerts on health spending in the US has been increasing steadily year by year. And in a sense, you could sort of squint and view that as a slow drift towards some element of single payer. But it’s definitely not so intentional or organized right now.  

While we’re talking about these sorts of trends, of course, another big trend is the graying of America. And we’re far from alone; in China, much of Asia, Europe, and the UK, people are getting older. And from the consumer-patient perspective, this brings up the challenge, I think, that many people have in caring for elderly loved ones.  

And this seems to me, like women’s health, to be another area where if I were starting a new digital health company, I would think very seriously about that space because that’s another space where there can be extreme intensity of engagement with the healthcare system. Do you as both a human being and consumer but also as an investor, do you think about that space at all? 

FARR: Oh, yes, all the time. And I do think there’s incredible opportunity here.  

And it’s probably because of the same kind of biases that exist that, you know, didn’t allow us to see the menopause opportunity, I think we’re just not seeing this as being as big as it is. And like you said, it’s not just an American problem. It’s being felt across the world.  

And I do think that there are some, you know, I’ve seen some really interesting stuff lately. I was recently spending some time with a company called Cherish Health out of Boston, and they’re using AI and radar-based sensing technologies to be able to stick a device, like, really anywhere in the person’s home. And it just passively is able to detect falls and also kind of monitor basic health metrics. And because it’s radar, it can operate through walls. So even if you’re in the bathroom, it still works, which has been a big problem with a lot of these devices in the past.  

And then, you have to have really advanced kind of AI and, you know, this sort of technology to be able to glean whether it’s a true fall where you really need help or, you know, just the person sitting down on the floor to play with their grandchild. So things like this are still early, but I think really exciting. And we’re going to see a lot more of that in addition to, you know, some really interesting companies that are trying to think more about sort of social needs that are not healthcare needs, but, you know, this population needs care outside of just, you know, medical treatment. They oftentimes may be experiencing homelessness, they might experience food insecurity, there might be a lack of just caregivers in their life. And so, you know, there are definitely some really interesting businesses there, as well.  

And then kind of a, you know, another trend that I think we’ll see a lot more is that, you know, countries are freaking out about the lack of babies being born, which you need to be able to … you know, I recognize climate change is a huge issue, but you also need babies to be born to support this aging population. So I think we’re going to see, you know, a lot more interest from these administrations around, you know, both like child tax credits and various policies to support parents but then also IVF [in vitro fertilization] and innovation around technology in the fertility space.  

LEE: All right. So we’re starting to run towards the end of our time together. So I’d like to get into maybe a couple more provocative or, you know, kinds of questions. So first, and there’s one that’s a little bit dark and another that’s much lighter. So let me start with the darker one so we can have a chance to end on a lighter note. I think one of the most moving pieces I’ve read from you recently was the open letter to your kids about the assassination of Brian Thompson, who was a senior executive of UnitedHealth Group. And so I wonder if you’re willing to share, first off, what you wrote there and then why you felt it was important to do that. 

FARR: Yeah. So, you know, I thought about just not saying anything. That was my original intention because it was just, you know, that moment that it happened, it was just so hot button. And a lot of people have opinions, and Twitter was honestly a scary place, just with the things that people were saying about this individual, who, you know, I think just like had a family and friends and a lot of my network knew him and felt really personally impacted by this. And I, you know, it was just a really sad moment, I think, for a lot of reasons.  

And then I just kind of sat down one evening and I wrote this letter to my kids that basically tried to put a lot of this in context. Like what … why are people feeling this way about our healthcare system? You know, why was all this sort of vitriol being really focused on this one individual? And then, you know, I think one of the things I sort of argued in this letter was that there’s lots of ways to approach innovation in the space. You can do it from the outside in, or you can do it from the inside out.  

And I’ll tell you, I got a lot of emails that week from people who were working at health plans, like UnitedHealth employees, some of them in their 20s, you know, they were recent kind of grads who’d gone to work at this company. And they said, you know, I felt like I couldn’t tell my friends, kind of, where I worked that week. And I emailed back and said, “Look, you’re learning healthcare. You are in an incredible position right now. Like whether you choose to stay at your current company or you choose to leave, you understand like the guts and the bowels of healthcare because you’re working at the largest healthcare company in the world. So you’re in an enviable position. And I think you are going to be able to effect change, like, more so than anyone else.” And that was part of what I wrote in this letter, that, you know, we should all agree that the system is broken, and we could do better. Nothing about what happened was OK. And also, like, let’s admire our peers and colleagues that are going into the trenches to learn because I genuinely believe those are the people that, you know, have the knowledge and the contacts and the network to be able to really kind of get change moving along, such desperately needed change. 

LEE: All right. So now one thing I’ve been asking every guest is about the origin story with respect to your first encounter with generative AI. How did that happen, and what were your first sort of experiences like? You know, what emotionally, intellectually, what went through your mind? 

FARR: So probably my first experience was I was really struggling with the title for my book. And I told ChatGPT what my book was about and what I wanted the title to evoke and asked it for recommendations. And then, I thought the first, like, 20 were actually pretty good. And I was able to say, can you make it a bit more witty? Can you make it more funny? And it spat back out some quite decent titles. And then what was interesting is that it just got worse and worse, like, over time and just ended up, like, deeply cheesy. [LAUGHTER] 

And so it sort of both like made me think that this could be a really useful prompt for just brainstorming. But then either it does seem to be some weird thing with AI where, like the more you push it on the same question, it just, like, it doesn’t … it seems to have sparked the most creativity in the first few tries, and then it just gets worse. And maybe you know more about this than I would. You certainly know more about this than I do. But that’s been my kind of general experience of it thus far. 

LEE: Mm-hmm. But would you say you were more skeptical or awe-inspired? What were the emotions at that moment? 

FARR: Um, you know, it was better than, like, a lot of my ideas. [LAUGHTER] So I definitely felt like it was from that perspective very impressive. But then, you know, it seemed to have the same human, like I said, we all kind of run out of ideas at some point and, you know, it turns out, so do the machines.  

So that was interesting in and of itself. And I ended up picking, I think, a title that was sort of, you know, inspired by the AI suggestions but definitely had its own twist that was my own. 

LEE: Well, Chrissy, I’ve never known you as someone who runs out of ideas, but this has been just great. As always, I always learn a lot when I have a chance to interact with you or read your writings. And so, thank you again for joining. Just really, really appreciate it. 

FARR: Of course, and next time I want to have you on my podcast because I have a million questions for you, too.   

LEE: Sure, anytime. 

FARR: Amazing. OK, I’ll hold you to that. Thanks so much for having me on. 

[TRANSITION MUSIC] 

LEE: I’ve always been impressed not only with Chrissy’s breadth and depth of experience with the emerging tech trends that affect the health industry, but she’s also a connector to key decision-makers in nearly every sector of healthcare. This experience, plus her communication abilities, make it no surprise that she’s sought out for help in a range of go-to-market, investor relations, social media, content development, and communications issues. 

Maybe it shouldn’t be a surprise, but one thing I learned from our conversation is that the business of direct-to-consumer health is still emerging. It’s far from mature. And you can see that Chrissy and her venture-investing colleagues are still trying to figure out what works. Her discussion, for example, on cash-only health delivery and the idea that consumers might not want another app on their phones were indicative of that.  

Another takeaway is that some areas, such as pre- and postnatal care, menopause, elder care, and other types of what the health industry might call subacute care are potentially areas where not only AI might find the most impact but also where there’s sufficient engagement by consumers to make it possible to sustain the business. 

When Carey, Zak, and I started writing our book, one of the things that we started off with was based on a story that Zak had written concerning his 90-year-old mother. And of course, as I had said in an earlier episode of this podcast, that was something that really touched me because I was having a similar struggle with my father, who at the time was 89 years old. 

One of the things that was so difficult about caring for my father is that he was living in Los Angeles, and I was living up in the Pacific Northwest. And my two sisters also lived far away from Los Angeles, being in Pittsburgh and in Phoenix.  

And so as the three of us, my two sisters and I, tried to navigate a fairly complex healthcare system involving a primary care physician for my father plus two specialists, I have to say over a long period of illness, a lot of things happen, including the fraying of relationships between three siblings. What was so powerful for us, and this is where this idea of patient empowerment comes in, is when we could give all of the data, all of the reports from the specialist, from the primary care physician, other information, give it to GPT-4 and then just ask the question, “We’re about to have a 15-minute phone call with one of the specialists. What are the most important two or three things we should ask about?” Doing that just brings down the temperature, eliminates a potential source of conflict between siblings who are all just wanting to take care of their father. 

And so as we think about the potential of AI in medicine, this concept of patient empowerment, while still emerging, as we’ve learned in this episode, could in the long run be the most important long-term impact of this new age of AI. 

[THEME MUSIC]  

I’d like to say thank you again to Dave and Chrissy for sharing their stories and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a discussion on regulations, norms, and ethics developing around AI and health. We hope you’ll continue to tune in.  

Until next time. 

[MUSIC FADES] 


The post Empowering patients and healthcare consumers in the age of generative AI appeared first on Microsoft Research.

]]>
Real-world healthcare AI development and deployment—at scale http://approjects.co.za/?big=en-us/research/podcast/the-ai-revolution-in-medicine-revisited-real-world-healthcare-ai-development-and-deployment-at-scale/ Thu, 03 Apr 2025 13:00:00 +0000 http://approjects.co.za/?big=en-us/research/?p=1135527 Microsoft’s Dr. Matthew Lungren and Epic’s Seth Hain discuss the challenges and opportunities of leveraging generative AI for enhanced patient care and improved clinical documentation and recordkeeping at scale—plus what’s next for the technology in the field.

The post Real-world healthcare AI development and deployment—at scale appeared first on Microsoft Research.

]]>
AI Revolution podcast | Episode 2 - Real-world healthcare AI development and deployment—at scale | outline illustration of Seth Hain, Peter Lee, Dr. Matthew Lungren

In November 2022, OpenAI’s ChatGPT kick-started a new era in AI. This was followed less than a half year later by the release of GPT-4. In the months leading up to GPT-4’s public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Dr. Matthew Lungren and Seth Hain, leaders in the implementation of healthcare AI technologies and solutions at scale, join Lee to discuss the latest developments. Lungren, the chief scientific officer at Microsoft Health and Life Sciences, explores the creation and deployment of generative AI for automating clinical documentation and administrative tasks like clinical note-taking. Hain, the senior vice president of R&D at the healthcare software company Epic, focuses on the opportunities and challenges of integrating AI into electronic health records at global scale, highlighting AI-driven workflows, decision support, and Epic’s Cosmos project, which leverages aggregated healthcare data for research and clinical insights. 


Learn more:

Meet Microsoft Dragon Copilot: Your new AI assistant for clinical workflow 
Microsoft Industry Blog | March 2025 

Unlocking next-generation AI capabilities with healthcare AI models 
Microsoft Industry Blog | October 2024 

Multimodal Generative AI: the Next Frontier in Precision Health 
Microsoft Research Forum | March 2024 

An Introduction to How Generative AI Will Transform Healthcare with Dr. Matthew Lungren 
LinkedIn Learning 

AI for Precision Health 
Video | July 2023 

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning 
Publication | December 2017 

Epic Cosmos 
Homepage

The AI Revolution in Medicine: GPT-4 and Beyond 
Book | April 2023

Transcript

[MUSIC]  

[BOOK PASSAGE]   

PETER LEE: “It’s hard to convey the huge complexity of today’s healthcare system. Processes and procedures, rules and regulations, and financial benefits and risks all interact, evolve, and grow into a giant edifice of paperwork that is well beyond the capability of any one human being to master. This is where the assistance of an AI like GPT-4 can be not only useful—but crucial.”   

[END OF BOOK PASSAGE]  

[THEME MUSIC]  

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES] 

The passage I read at the top there is from Chapter 7 of the book, “The Ultimate Paperwork Shredder.”  

Paperwork plays a particularly important role in healthcare. It helps convey treatment information that supports patient care, and it’s also used to help demonstrate that providers are meeting regulatory responsibilities, among other things. But if we’re being honest, it’s taxing—for everyone—and it’s a big contributor to the burnout our clinicians are experiencing today. Carey, Zak, and I identified this specific pain point as one of the best early avenues to pursue as far as putting generative AI to good work in the healthcare space.  

In this episode, I’m excited to welcome Dr. Matt Lungren and Seth Hain to talk about matching technological advancements in AI to clinical challenges, such as the paperwork crisis, to deliver solutions in the clinic and in the health system back office.  

Matt is the chief scientific officer for Microsoft Health and Life Sciences, where he focuses on translating cutting-edge technology, including generative AI and cloud services, into innovative healthcare applications. He’s a clinical interventional radiologist and a clinical machine learning researcher doing collaborative research and teaching as an adjunct professor at Stanford University. His scientific work has led to more than 200 publications, including work on new computer vision and natural language processing approaches for healthcare.  

Seth is senior vice president of research and development at Epic, a leading healthcare software company specializing in electronic health record systems, also known as EHR, as well as other solutions for connecting clinicians and patients. During his 19 years at Epic, Seth has worked on enhancing the core analytics and other technologies in Epic’s platforms as well as their applications across medicine, bringing together his graduate training in mathematics and his dedication to better health.  

I’ve had the pleasure of working closely with both Matt and Seth. Matt, as a colleague here at Microsoft, really focused on our health and life sciences business. And Seth, as a collaborator at Epic, as we embark on the questions of how to integrate and deploy generative AI into clinical applications at scale.   

[TRANSITION MUSIC] 

Here’s my conversation with Dr. Matt Lungren:  

LEE: Matt, welcome. It’s just great to have you here. 

MATTHEW LUNGREN: Thanks so much, Peter. Appreciate being here. 

LEE: So, I’d like to just start just talking about you. You know, I had mentioned your role as the chief scientific officer for Microsoft Health and Life Sciences. Of course, that’s just a title. So, what the heck is that? What is your job exactly? And, you know, what does a typical day at work look like for you? 

LUNGREN: So, really what you could boil my work down to is essentially cross collaboration, right. We have a very large company, lots of innovation happening all over the place, lots of partners that we work with and then obviously this sort of healthcare mission.

And so, what innovations, what kind of advancements are happening that can actually solve clinical problems, right, and sort of kind of direct that. And we can go into some examples, you know, later. But then the other direction, too, is important, right. So, identifying problems that may benefit from a technologic application or solution and kind of translating that over into the, you know, pockets of innovation saying, “Hey, if you kind of tweaked it this way, this is something that would really help, you know, the clinical world.”  

And so, it’s really a bidirectional role. So, my day to day is … every day is a little different, to be honest with you. Some days it’s very much in the science and learning about new techniques. On the other side, though, it can be very much in the clinic, right. So, what are the pain points that we’re seeing? Where are the gaps in the solutions that we’ve already rolled out? And, you know, again, what can we do to make healthcare better broadly? 

LEE: So, you know, I think of you as a technologist, and, Matt, you and I actually are colleagues working together here at Microsoft. But you also do spend time in the clinic still, as well, is that right? 

LUNGREN: You know, initially it was kind of a … very much a non-negotiable for me … in sort of taking an industry role. I think like a lot of, you know, physicians, you know, we’re torn with the idea of like, hey, I spent 20 years training. I love what I do, you know, with a lot of caveats there in terms of some of the administrative burden and some of the hassle sometimes. But for the most part, I love what I do, and there’s no greater feeling than using something that you trained years to do and actually see the impact on a human life. It’s unbelievable, right.  

So, I think part of me was just, like, I didn’t want to let that part of my identity go. And frankly, as I often say, to this day, I walk by a fax machine in our office today, like in 2025.  

So just to be extra clear, it really grounds me in, like, yes, I love the possibilities. I love thinking about what we can do. But also, I have a very stark understanding of the reality on the ground, both in terms of the technology but also the burnout, right. The challenges that we’re facing in taking care of patients has gotten, you know, much, much more difficult in the last few years, and, you know, I like to think it keeps my perspective, yeah. 

LEE: You know, I think some listeners to this podcast might be surprised that we have doctors on staff in technical roles at Microsoft. How do you explain that to people? 

LUNGREN: [LAUGHS] Yeah, no, yeah, it is interesting. I would say that, you know, from, you know, the legacy Nuance [1] world, it wasn’t so far-fetched that you have physicians that were power users and eventually sort of, you know, became, “Hey, listen, I think this is a strategic direction; you should take it” or whatever. And certainly maybe in the last, I want to say, five years or so, I’ve seen more and more physicians who have, you know, taken the time, sometimes on their own, to learn some of the AI capabilities, learn some of the principles and concepts; and frankly, some are, you know, even coding solutions and leading companies.

So, I do think that that has shifted a bit in terms of like, “Hey, doctor, this is your lane, and over here, you know, here’s a technical person.” And I think that’s fused quite a bit more.  

But yeah, it is an unusual thing, I think, in sort of how we’ve constructed what at least my group does. But again, I can’t see any other way around some of the challenges.  

I think, you know, an anecdote I’d like to tell you, when I was running the AIMI [Artificial Intelligence in Medicine and Imaging] Center, you know, we were bringing the medical school together with the computer science department, right, at Stanford. And I remember one day a student, you know, very smart, came into my office, you know, a clinical day or something, and he’s like, is there just, like, a book or something where I can just learn medicine? Because, like, I feel like there’s a lot of, like, translation you have to do for me.  

It really raised an important insight, which is that you can learn the, you know, medicine, so to speak. You know, go to med school; you know, take the test and all that. But it really … you don’t really understand the practice of medicine until you are doing that.  

And in fact, I even push it a step further to say after training those first two or three years of … you are the responsible person; you can turn around, and there’s no one there. Like, you are making a decision. Getting used to that and then having a healthy respect for that actually I think provides the most educational value of anything in healthcare.  

LEE: You know, I think what you’re saying is so important because as I reflect on my own journey. Of course, I’m a computer scientist. I don’t have medical training, although at this point, I feel confident that I could pass a Step 1 medical exam.  

LUNGREN: I have no doubt. [LAUGHS] 

LEE: But I think that the tech industry, because of people like you, have progressed tremendously in having a more sophisticated and nuanced understanding of what actually goes on in clinic and also what goes on in the boardrooms of healthcare delivery organizations. And of course, at the end of the day, I think that’s really been your role.  

So roughly speaking, your job as an executive at a big tech company has been to understand what the technology platforms need to be, particularly with respect to machine learning, AI, and cloud computing, to best support healthcare. And so maybe let’s start pre-GPT-4, pre-ChatGPT, and tell us a little bit, you know, about maybe some of your proudest moments in getting advanced technologies like AI into the clinic. 

LUNGREN: You know, when I first started, so remember, like you go all the way back to about 2013, right, my first faculty job, and, you know, we’re building a clinical program and I, you know, I had a lot of interest in public health and building large datasets for pop [population] health, etc. But I was doing a lot of that, you know, sort of labeling to get those insights manually, right. So, like, I was the person that you’d probably look at now and say, “What are you doing?” Right?  

So … but I had a complete random encounter with Andrew Ng, who I didn’t know at the time, at Stanford. And I, you know, went to one of the seminars that he was holding at the Gates building, and, you know, they were talking about their performance on ImageNet. You know, cat and dog and, you know, tree, bush, whatever. And I remember sitting in kind of the back, and I think I maybe had my scrubs on at the time and just kind of like, what? Like, why … like, this … we could use this in healthcare, you know. [LAUGHS]  

But for me, it was a big moment. And I was like, this is huge, right. And as you remember, the deep learning really kind of started to show its stuff with, you know, Fei-Fei Li’s ImageNet stuff.

So anyway, we started the collaboration that actually became [AIMI]. And one of the first things we worked on, we just said, “Listen, one of the most common medical imaging examinations in the world is the chest x-ray.” Right? Two, three billion are done every year in the world, and so is that not a great place to start?

And of course, we had a very democratizing kind of mission. As you know, Andrew has done a lot of work in that space, and I had similar ambitions. And so, we really started to focus on bringing the, you know, the sort of the clinical and the CS together and see what could be done.  

So, we did CheXNet. And this is, remember this is around the time when, like, Geoffrey Hinton was saying things like we should stop training radiologists, and all this stuff was going on. [LAUGHTER] So there’s a lot of hype, and this is the narrow AI days just to remind the audience.  

LEE: How did you feel about that since you are a radiologist? 

LUNGREN: Well, it was so funny. So, Andrew is obviously very prolific on social media, and I was, who am I, right? So, I remember he tagged me. Well, first he said, “Matt, you need to get a Twitter account.” And I said OK. And he tagged me on the very first post of our, what we call, CheXNet that was kind of like the “Hello, World!” for this work.  

And I remember it was a clinical day. I had set my phone, as you do, outside the OR. I go in. Do my procedure. You know, hour or so, come back, my phone’s dead. I’m like, oh, that’s weird. Like I had a decent charge. So, you know, I plug it in. I turn it on. I had like hundreds of thousands of notifications because Andrew had tweeted out to his millions or whatever about CheXNet.  

And so, then of course, as you point out, I go to RSNA that year, which is our large radiology conference, and that Geoffrey Hinton quote had come out. And everyone’s looking at me like, “What are you doing, Matt?” You know, like, are you coming after our specialty? I’m like, “No, no,” that’s, [LAUGHS] you know, it’s a way to interpret it, but you have to take a much longer horizon view, right.  

LEE: Well, you know, we’re going to, just as an enticement for listeners to this podcast to listen to the very end, I’m going to pin you down toward the end on your assessment of whether Geoffrey Hinton will eventually be proven right or not. [LAUGHTER] But let’s take our time to get there.  

Now let’s go ahead and enter the generative AI era. When we were first exposed to what we now know of as GPT-4—this was before it was disclosed to the world—a small number of people at Microsoft and Microsoft Research were given access in order to do some technical assessment.  

And, Matt, you and I were involved very early on in trying to assess what might this technology mean for medicine. Tell us, you know, what was the first encounter with this new technology like for you?  

LUNGREN: It was the weirdest thing, Peter. Like … I joined that summer, so the summer before, you know, the actual GPT came out. I had literally no idea what I was getting into.  

So, I started asking it questions, you know, kind of general stuff, right. Just, you know, I was like, oh, all right, it’s pretty good. And so, then I would sort of go a little deeper. And eventually I got to the point where I’m asking questions that, you know, maybe there’s three papers on it in my community, and remember I’m a sub-sub specialist, right, pediatric interventional radiology. And the things that we do in vascular malformations and, you know, rare cancers are really, really strange and not very commonly known.  

And I kind of walked away from that—first I said, can I have this thing, right? [LAUGHS]  

But then I, you know, I don’t want to sound dramatic, but I didn’t sleep that well, if I’m being honest, for the first few nights. Partially because I couldn’t tell anybody, except for the few that I knew were involved, and partially because I just couldn’t wrap my head around how we went from what I was doing in LSTMs [long short-term memory networks], right, which was state of the artish at the time for NLP [natural language processing].  

And all of a sudden, I have this thing that broadly has, you know, domain-expert representations of knowledge that there’s no way you could think would be in distribution for a normal approach to this.  

And so, I really struggled with it, honestly. Interpersonally, like, I would be like, uh, well, let’s not work on that. They’re like, why not? You were just excited about it last week. I’m like, I don’t know. I think that we could think of another approach later. [LAUGHS]  

And so yeah, when we were finally able to really look at some of the capabilities and really think clearly, it was really clear that we had a massive opportunity on our hands to impact healthcare in a way that was never possible before. 

LEE: Yeah, and at that time you were still a part of Nuance. Nuance, I think, was in the process of being acquired by Microsoft. Is that right?  

LUNGREN: That’s right.  

LEE: And so, of course, this was also a technology that would have profound and very direct implications for Nuance. How did you think about that? 

LUNGREN: Nuance, for those in the audience who don’t know, for 25 years was, sort of, the medical speech-to-text thing that all, you know, physicians used. But really the brass ring had always been … and I want to say going back to like 2013, 2014, Nuance had tried to figure out, OK, we see this pain point. Doctors are typing on their computers while they’re trying to talk to their patients, right.  

We should be able to figure out a way to get that ambient conversation turned into text that then, you know, accelerates the doctor … takes all the important information. That’s a really hard problem, right. You’re having a conversation with a patient about their knee pain, but you’re also talking about, you know, their cousin’s wedding and their next vacation and their dog is sick or whatever and all that gets recorded, right.  

And so, then you have to have the intelligence/context to be able to tease out what’s important for a note. And then it has to be at the performance level of a physician who has, again, 20 years of training and education plus a huge, huge, you know, need to get through their cases efficiently. That’s a really difficult problem.  

And so, for a long time, there was a human-in-the-loop aspect to doing this because you needed a human to say, “This transcript’s great, but here’s actually what needs to go on the note.” And that can’t scale, as you know.  

When the GPT-4, you know, model kind of, you know, showed what it was capable of, I think it was an immediate light bulb because there was no … you can ask any physician in your life, anyone in the audience, you know, what are your … what is the biggest pain point when you go to see your doctor? Like, “Oh, they don’t talk to me. They don’t look me in the eye. They’re rushing around trying to finish a note.”  

If we could get that off their plate, that’s a huge unlock, Peter. And I think that, again, as you know, it’s now led to so much more. But that was kind of the initial, I think, reaction. 

LEE: And so, maybe that gets us into our next set of questions, our next topic, which is about the book and all the predictions we made in the book. Because Carey, Zak, and I—actually we did make a prediction that this technology would have a huge impact on this problem of clinical note-taking.  

And so, you’re just right in the middle of that. You’re directly hands-on creating, I think, what is probably the most popular early product for doing exactly that. So, were we right? Were we wrong? What else do we need to understand about this? 

LUNGREN: No, you were right on. I think in the book, I think you called it like a paper shredder or something. I think you used a term like that. That’s exactly where the activity is right now and the opportunity.  

I’ve even taken that so far as to say that when folks are asking about what the technology is capable of doing, we say, well, listen, it’s going to save time before it saves lives. It’ll do both. But right now, it’s about saving time.  

It’s about peeling back the layers of the onion that if you, you know, put me in where I started medicine in 2003, and then fast-forward and showed me a day in the life of 2025, I would be shocked at what I was doing that wasn’t related to patient care, right. So, all of those layers that have been stacked up over the years, we can start finding ways to peel that back. And I think that’s exactly what we’re seeing.

And to your point, I think you mentioned this, too, which is, well, sure, we can do this transcript, and we can turn a note, but then we can do other things, right. We can summarize that in the patient’s language or education level of choice. We can pend orders. We can eventually get to a place of decision support. So, “Hey, did you think about this diagnosis, doctor?” Like those kinds of things.  

And all those things, I think you highlighted beautifully, and again, it sounds like with, you know, a lot of, right, just kind of guesswork and prediction, but those things are actually happening every single day right now.  

LEE: Well, so now, you know, in this episode, we’re really trying to understand, you know, where the technology industry is in delivering these kinds of things. And so from your perspective, you know, in the business that you’re helping to run here at Microsoft, you know, what are the things that are actually shipping as product versus things that clinicians are doing, let’s say, off label, just by using, say, ChatGPT on their personal mobile devices, and then what things aren’t happening? 

LUNGREN: Yeah. I’ll start with the shipping part because I think you, again, you know my background, right. Academic clinician, did a lot of research, hadn’t had a ton of product experience.  

In other words, like, you know, again, I’m happy to show you what benchmarks we beat or a new technique or, you know, get a grant to do all this, or even frankly, you know, talk about startups. But to actually have an audience that is accustomed to a certain level of performance for the solutions that they use, to be able to deliver something new at that same level of expectation, wow, that’s a big deal.  

And again, this is part of the learning by, you know, kind of being around this environment that we have, which is we have this, you know, incredibly focused, very experienced clinical product team, right.

And then I think on the other side, to your point about the general-purpose aspect of this, it’s no secret now, right, that, you know, this is a useful technology in a lot of different medical applications. And let’s just say that there’s a lot of knowledge that can be used, particularly by the physician community. And I think the most recent survey I saw (opens in new tab) was from the British Medical Journal, which said, hey, you know, which doctors are using … are you willing to tell us, you know, what you’re doing? And it turns out that, what, 30% or so said that they were using it regularly in clinic [2]. And again, this is the general, this is the API or whatever off the shelf.

And then frankly, when they ask what they’re using it for, tends to be things like, “Hey, differential, like, help me fill in my differential or suggest … ” and to me, I think what that created, at least—and you’re starting to see this trend really accelerate in the US especially—is, well, listen, we can’t have everybody pulling out their laptops and potentially exposing, you know, patient information by accident or something to a public API.  

We have to figure this out, and so brilliantly, I think NYU [New York University] was one of the first. Now I think there’s 30 plus institutions that said, listen, “OK, we know this is useful to the entire community in the healthcare space.” Right? We know the administrators and nurses and everybody thinks this is great.  

We can’t allow this to be a very loosey-goosey approach, right, given this sort of environment. So, what we’ll do is we’ll set up a HIPAA-compliant instance to allow anyone in the community—you know, in the health system—to use the models, and then whatever, the newest model comes, it gets hosted, as well.  

And what’s cool about that—and that’s happened now a lot of places—is that at the high level … first of all, people get to use it and experiment and learn. But at the high level, they’re actually seeing what are the common use cases. Because you could ask 15 people and you might get super long lists, and it may not help you decide what to operationalize in your health system.  

LEE: But let me ask you about that. When you observe that, are there times when you think, “Oh, some specific use cases that we’re observing in that sort of organic way need to be taken into specialized applications and made into products?” Or is it best to keep these things sort of, you know, open-chat-interface types of general-purpose platform?  

LUNGREN: Honestly, it’s both, and that’s exactly what we’re seeing. I’m most familiar with Stanford, kind of, the work that Nigam Shah leads on this. But he basically … you know, there’s a really great paper that is coming out in JAMA, but basically saying, “Here’s what our workforce is using it for. Here are the things in the literature that would suggest what would be popular.”  

And some of those line up, like helping with a clinical diagnosis or documentation, but some of them don’t. But for the most part, the stuff that flies to the top, those are opportunities to operationalize and productize, etc. And I think that’s exactly what we’re seeing. 

LEE: So, let’s get into some of the specific predictions. We’ve, I think, beaten note-taking to death here. But there’s other kinds of paperwork, like filling out prior authorization request forms or referral letters, an after-visit note or summary to give instructions to patients, and so on. And these were all things that we were making guesses in our book might be happening. What’s the reality there? 

LUNGREN: I’ve seen every single one of those. In fact, I’ve probably seen a dozen startups too, right, doing exactly those things. And, you know, we touched a little bit on translation into the actual clinic. And that’s actually another thing that I used to kind of underappreciate, which is that, listen, you can have a computer scientist and a physician or nurse or whatever, like, give the domain expertise, and you think you’re ready to build something.  

The health IT [LAUGHS] is another part of that Venn diagram that’s so incredibly critical, and then exactly how are you going to bring that into the system. That’s a whole new ballgame. 

And so I do want to do a callout because the collaboration that we have with Epic is monumental because here, you have the system of record that most physicians, at least in the US, use. And they’re going to use an interface and they’re going to have an understanding of, hey, we know these are pain points, and so I think there’s some really, really cool, you know, new innovations that are coming out of the relationship that we have with Epic. And certainly the audience may be familiar with those, that I think will start to knock off a lot of the things that you predicted in your book relatively soon. 

LEE: I think most of the listeners to this podcast will know what Epic is. But for those that are unfamiliar with the health industry, and especially the technology foundation, Epic is probably the largest provider of electronic health record systems. And, of course, in collaboration with you and your team, they’ve been integrating generative AI quite a bit. Are there specific uses that Epic is making and deploying that get you particularly excited? 

LUNGREN: First of all, the ambient note generation, by the way, is integrated into Epic now. So like, you know, it’s not another screen, another thing for physicians. So that’s a huge, huge unlock in terms of the translation.

But then Epic themselves, so they have, I guess, on the last roadmap that they talked [about], more than 60, but the one that’s kind of been used now is this inbox response. 

So again, maybe someone might not be familiar with, why is it such a big deal? Well, if you’re a physician, you already have, you know, 20 patients to see that day and you got all those notes to do, and then Jevons paradox, right. So if you give me better access to my doctor, well, maybe I won’t make an appointment. I’m just going to send him a note and this is kind of this inbox, right.  

So then at the end of my day, I got to get all my notes done. And then I got to go through all the inbox messages I’ve received from all of my patients and make sure that they’re not like having chest pain and they’re blowing it off or something.  

Now that’s a lot of work, and there’s the cold start problem of like, OK, I have to respond to them. So Epic has leveraged this system to say, “Let me just draft a note for you,” understanding the context of, you know, what’s going on with the patient, etc. And you can edit that and sign it, right. So you can accelerate some of those … so that’s probably one I’m most excited about. But there’s so many right now. 

LEE: Well, I think I need to let you actually state the name of the clinical note-taking product that you’re associated with. Would you like to do that? [LAUGHS] 

LUNGREN: [LAUGHS] Sure. Yeah, it’s called DAX Copilot [3]. And for the record, it is the fastest-growing copilot in the Microsoft ecosystem. We’re very proud of that. Five hundred institutions already are using it, and millions of notes have already been created with it. And the feedback has been tremendous.

LEE: So, you sort of referred to this a little bit, you know, this idea of AI being a second set of eyes. So, doctor makes some decisions in diagnosis or kind of working out potential treatments or medication decisions. And in the book, you know, we surmise that, well, AI might not replace the doctor doing those things. It could but might not. But AI could possibly reduce errors if doctors and nurses are making decisions by just looking at those decisions and just checking them out. Is that happening at all, and what do you see the future there? 

LUNGREN: Yeah, I would say, you know, that’s kind of the jagged edge of innovation, right, where sometimes the capability gets ahead of the ability to, you know, operationalize that. You know, part of that is just related to the systems. The evidence has been interesting on this. So, like, you know this, our colleague Eric Horvitz has been doing a lot of work in sort of looking at physician, physician with GPT-4, let’s say, and then GPT-4 alone for a whole variety of things. You know, we’ve been saying to the world for a long time, particularly in the narrow AI days, that AI plus human is better than either alone. We’re not really seeing that bear out really that well yet in some of the research.  

But it is a signal to me and to the use case you’re suggesting, which is that if we let this system, in the right way, kind of handle a lot of the safety-net aspects of what we do but then also potentially take on some of the things that maybe are not that challenging or at least somewhat simple.  

And of course, this is really an interesting use case in my world, in the vision world, which is that we know these models are multimodal, right. They can process images and text. And what does that look like for pathologists or radiologists, where we do have a certain percentage of the things we look at in a given day are normal, right? Or as close to normal as you can imagine. So is there a way to do that? And then also, by the way, have a safety net.  

And so I think that this is an extremely active area right now. I don’t think we’ve figured out exactly how to have the human and AI model interact in this space yet. But I know that there’s a lot of attempts at it right now. 

LEE: Yeah, I think, you know, this idea of a true copilot, you know, a true collaborator, you know, I think is still something that’s coming. I think we’ve had a couple of decades of people being trained to think of computers as question-answering machines. Ask a question, get an answer. Provide a document, get a summary. And so on.  

But the idea that something might actually be this second set of eyes just assisting you all day continuously, I think, is a new mode of interaction. And we haven’t quite figured that out.  

Now, in preparation for this podcast, Matt, you said that you actually used AI to assist you in getting ready. [LAUGHS] Would you like to share what you learned by doing that? 

LUNGREN: Yeah, it’s very funny. So, like, you may have heard this term coined by Ethan Mollick called the “secret cyborg,” (opens in new tab) which is sort of referring to the phenomena of folks using GPT, realizing it can actually help them a ton in all kinds of parts of their work, but not necessarily telling anybody that they’re using it, right.  

And so in a similar secret cyborgish way, I was like, “Well, listen, you know, I haven’t read your book in like a year. I recommend it to everybody. And [I need] just a refresher.” So what I did was I took your book, I put it into GPT-4, OK, and asked it to sort of talk about the predictions that you made.  

And then I took that and put it in the stronger reasoning model—in this case, the “deep research” that you may have just seen or heard of and the audience from OpenAI—and asked it to research all the current papers, you know, and blogs and whatever else and tell me like what was right, what was wrong in terms of the predictions. [LAUGHS]  

So it, actually, it was an incredible thing. It’s a, like, what, six or seven pages. It probably would have taken me two weeks, frankly, to do this amount of work.  

LEE: I’ll be looking forward to reading that in the New England Journal of Medicine shortly. 

LUNGREN: [LAUGHS] That’s right. Yeah, no, don’t worry, before this podcast comes out, I’ll submit it as an opinion piece. No. [LAUGHS] But, yeah, I think on balance, incredibly insightful views. And I think part of that was, you know, your team that got together really had a lot of different angles on this. But, you know, I think the only area, and this is something I’ve observed as well, is just, man, this can do a lot for education.  

We haven’t seen … I don’t think we’re looking at this as a tutor. To your point, we’re kind of looking at it as transactional, in and out. But as we’ve seen in all kinds of data, both in low- and middle-income countries and even at Harvard, using this as a tutor can really accelerate your knowledge and in profound ways.  

And so that is probably one area where I think your prediction was maybe slightly even further ahead of the curve because I don’t think folks have really grokked that opportunity yet. 

LEE: Yeah, and for people who haven’t read the book, you know, the guess was that you might use this as a training aid if you’re an aspiring doctor. For example, you can ask GPT-4 to pretend to be a patient that presents a certain way and that you are the doctor that this patient has come to see. And so you have an interaction. And then when you say end of encounter, you ask GPT-4 to assess how well you did. And we thought that this might be a great training aid, and to your point, it seems not to have materialized.  

LUNGREN: There’s some sparks. You know, with, like, communication, end-of-life conversations that no physician loves to have, right. It’s very, very hard to train someone in those. I’ve seen some work done, but you’re right. It’s not quite hit mainstream yet. 

LEE: On the subject of things that we missed, one thing that you’ve been very, very involved in in the last several months has been in shipping products that are multimodal. So that was something I think that we missed completely. What is the current state of affairs for multimodal, you know, healthcare AI, medical AI? 

LUNGREN: Yeah, the way I like to explain it—and first of all, no fault to you; we were all just so excited about the text use cases that I can’t fault you. But yeah, I mean, so if we look at healthcare, right, how we take care of patients today, as you know, the vast majority of the data, in terms of just data itself, is actually not in text, right. It’s going to be in pathology and genomics and radiology, etc.  

And it seems like an opportunity here to watch this huge curve just go straight up in the general reasoning and, frankly, medical competency and capabilities of the models that are coming and continue to come, but then to see that it’s not as proficient for medical-specific imaging and video and, you know, other data types. And that gap is, kind of, what I describe as the multimodal medical AI gap.  

We’re probably in GPT-2 land, right, for these other modality types versus, you know, we’re now at o3, and who knows where we’re going to go. At least in our view, we can innovate in that space.  

How do we help bring those innovations to the broader community to close that gap and see some of these use cases really start to accelerate in the multimodal world?  

And I think we’ve taken a pretty good crack at that. A lot of that is credit to the innovative work. I mean, MSR [Microsoft Research] was two or three years ahead of everyone else on a lot of this. And so how do we package that up in a way that the community can actually access and use? And so, we took a lot of what your group had done in, let’s just say, radiology or pathology in particular, and say, “OK, well, let’s put this in an ecosystem of other models.” Other groups can participate in this, but let’s put it in a platform where maybe I’m really competent in radiology or pathology. How do I connect those things together? How do I bring the general reasoner knowledge into a multimodal use case?  

And I think that’s what we’ve done pretty well so far. We have a lot of work to do still, but this is very, very exciting. We’re seeing just such a ton of interest in building with the tools that we put out there. 

LEE: Well, I think how rapidly that’s advancing has been a surprise to me. So I think we’re running short on time. So two last questions to wrap up this conversation. The first one is, as we think ahead on AI in medicine, what do you think will be the biggest changes or make the biggest differences two years from now, five years from now, 10 years from now?

LUNGREN: This is really tough. OK. I think the two-year timeframe, I think we will have some autonomous agent-based workflows for a lot of the … what I would call undifferentiated heavy lifting in healthcare.  

And this is happening in, you know, the pharmaceutical industry, the payer … every aspect is sort of looking at their operations at a macro level: where are these big bureaucratic processes that largely involve text and where can we shrink those down and really kind of unlock a lot of our workforce to do things that might be more meaningful to the business? I think that’s my safe one.  

Going five years out, you know, I have a really difficult time grappling with this seemingly shrinking timeline to AGI [artificial general intelligence] that we hear from people who I would respect and certainly know more than me. And in that world, I think there’s only been one paper that I’ve seen that has attempted to say, what does that mean in healthcare (opens in new tab) when we have this?  

And the fact is, I actually don’t know. [LAUGHS] I wonder whether there’ll still be a gap in some modalities. Maybe there’ll be the ability to do new science, and all kinds of interesting things will come of that.  

But then if you go all the way to your 10-year, I do feel like we’re going to have systems that are acting autonomously in a variety of capacities, if I’m being honest.  

What I would like to see if I have any influence on some of this is, can we start to celebrate the closing of hospitals instead of opening them? Meaning that, can we actually start to address—at a personal, individual level—care? And maybe that’s outside the home, maybe that’s, you know, in a way that doesn’t have to use so many resources and, frankly, really be very reactive instead of proactive.  

I really want to see that. That’s been the vision of precision medicine for, geez, 20-plus years. I feel like we’re getting close to that being something we can really tackle. 

LEE: So, we talked about Geoff Hinton and his famous prediction that we would soon not have human radiologists. And of course, maybe he got the date wrong. So, let’s reset the date to 2028. So, Matt, do you think Geoff is right or wrong? 

LUNGREN: [LAUGHS] Yeah, so the way … I’m not going to dodge the question, but let me just answer this a different way.  

We have a clear line of sight to go from images to draft reports. That is unmistakable. And that’s now in 2025. How it will be implemented and what the implications of that will be, I think, will be heavily dependent on the health system or the incentive structure for where it’s deployed.  

So, if I’m trying to take a step back, back to my global health days, man, that can’t come fast enough. Because, you know, you have entire health systems, in fact entire countries, that have five, you know, medical imaging experts for the whole country, but they still need this to, you know, take care of patients.  

Zooming in on today’s crisis in the US, right, we have the burnout crisis just as much as the doctors who are seeing patients and writing notes. We can’t keep up with the volume. In fact, we’re not training folks fast enough, so there is a push-pull; there may be a flip, to your point, to autonomous reads across some segments of what we do.  

By 2028, I think that’s a reasonable expectation that we’ll have some form of that. Yes. 

LEE: I tend to agree, and I think things get reshaped, but it seems very likely that even far into the future we’ll have humans wanting to take care of other humans and be taken care of by humans.  

Matt, this has been a fantastic conversation, and, you know, I feel it’s always a personal privilege to have a chance to work with someone like you so keep it up. 

[TRANSITION MUSIC] 

LUNGREN: Thank you so much, Peter. Thanks for having me. 

LEE: I’m always so impressed when I talk to Matt, and I feel lucky that we get a chance to work together here at Microsoft. You know, one of the things that always strikes me whenever I talk to him is just how disruptive generative AI has been to a business like Nuance. Nuance has had clinical note-taking as part of their product portfolio for a long, long time. And so, you know, when generative AI comes along, it’s not only an opportunity for them, but also a threat because in a sense, it opens up the possibility of almost anyone being able to make clinical note-taking capabilities into products.  

It’s really interesting how Matt’s product, DAX Copilot, which since the time that we had our conversation has expanded into a full healthcare workflow product called Dragon Copilot, has really taken off in the marketplace and how many new competing AI products have also hit the market, and all in just two years, because of generative AI.  

The other thing, you know, that I always think about is just how important it is for these kinds of systems to work together and especially how they integrate into the electronic health record systems. This is something that Carey, Zak, and I didn’t really realize fully when we wrote our book. But you know, when you talk to both Matt and Seth, of course, we see how important it is to have that integration.  

Finally, what a great example of yet another person who is both a surgeon and a tech geek. [LAUGHS] People sometimes think of healthcare as moving very slowly when it comes to new technology, but people like Matt are actually making it happen much more quickly than most people might expect.  

Well, anyway, as I mentioned, we also had a chance to talk to Seth Hain, and so here’s my conversation with Seth:

LEE: Seth, thank you so much for joining.  

SETH HAIN: Well, Peter, it’s such an exciting time to sit down and talk about this topic. So much has changed in the last two years. Thanks for inviting me.  

LEE: Yeah, in fact, I think in a way both of our lives have been upended in many ways by the emergence of AI. [LAUGHTER]  

The traditional listeners of the Microsoft Research Podcast, I think for the most part, aren’t steeped in the healthcare industry. And so maybe we can just start with two things. One is, what is Epic, really? And then two, what is your job? What does the senior vice president for R&D at Epic do every day? 

HAIN: Yeah, well, let’s start with that first question. So, what is Epic? Most people across the world experience Epic through something we call MyChart. They might use it to message their physician. They might use it to check the lab values after they’ve gotten a recent test. But it’s an app on their phone, right, for connecting in with their doctors and nurses and really making them part of the care team.  

But the software we create here at Epic goes beyond that. It’s what runs in the clinic, what runs at the bedside, in the back office to help facilitate those different pieces of care, from collecting vital information at the bedside to helping place orders if you’re coming in for an outpatient visit, maybe with a kiddo with an earache, and capturing that note and record of what happened during that encounter, all the way through back-office encounters, back-office information for interacting with payers as an example.  

And so, we provide a suite of software that health systems and increasingly a broader set of the healthcare ecosystem, like payers and specialty diagnostic groups, use to connect with that patient at the center around their care. 

And my job is to help our applications across the company take advantage of those latest pieces of technology to help improve the efficiency of folks like clinicians in the exam room when you go in for a visit. We’ll get into, I imagine, some use cases like ambient conversations, capturing that conversation in the exam room to help drive some of that documentation.  

But then providing that platform for those teams to build those and then strategize around what to create next to help both the physicians be efficient and also the health systems. But then ultimately continuing to use those tools to advance the science of medicine. 

LEE: Right. You know, one thing that I explain to fellow technologists is that I think today health records are almost entirely digital. I think the last figures I saw is well over 99% of all health records are digital.  

But in the year 2001, fewer than 15% of health records were digital. They were literally in folders on paper in storerooms, and if you’re old enough, you might even remember seeing those storerooms.  

So, it’s been quite a journey. Epic and Epic’s competitors—though I think Epic is really the most important company—have really moved the entire infrastructure of record keeping and other communications in healthcare to a digital foundation.  

And I think one thing we’ll get into, of course, one of the issues that has really become, I think, a problem for doctors and nurses is the kind of clerical or paperwork, record-keeping, burden. And for that reason, Epic and Epic systems end up being a real focus of attention. And so, we’ll get into that in a bit here.  

HAIN: And I think that hits, just to highlight it, on both sides. There is both the need to capture documentation; there’s also the challenge in reviewing it.  

LEE: Yes.  

HAIN: The average medical record these days is somewhere between the length of Fahrenheit 451 and To Kill a Mockingbird. [LAUGHTER] So there’s a fair amount of effort going in on that review side, as well. 

LEE: Yeah, indeed. So much to get into there. But I would like to talk about encounters with AI. So obviously, I think there are two eras here: before the emergence of ChatGPT and what we now call generative AI and afterwards. And so, let’s take the former.  

Of course, you’ve been thinking about machine learning and health data probably for decades. Do you have a memory of how you got into this? Why did you get an interest in data analytics and machine learning in the first place? 

HAIN: Well, my background, as you noted, is in mathematics before I came to Epic. And the sort of patterns and what could emerge were always part of what drove that. Having done development and kind of always been around computers all my life, it was a natural transition as I came here.  

And I started by really focusing on, how do we scale systems for the very largest organizations, making sure they are highly available and also highly responsive? Time is critical in these contexts in regards to rapidly getting information to doctors and nurses.  

And then really in the, say, in the 2010s, there started to be an emergence of capabilities from a storage and compute perspective where we could begin to build predictive analytics models. And these were models that were very focused, right. It predicted the likelihood somebody would show up for an appointment. It predicted the likelihood that somebody may fall during an inpatient stay, as an example.  

And I think a key learning during that time period was thinking through the full workflow. What information was available at that point in time, right? At the moment somebody walks into the ED [emergency department], you don’t have a full picture to predict the likelihood that they may deteriorate during an inpatient encounter.  

And in addition to what information was available, there was the question of what you can do about it. And a key part of that was, how do we help get the right people to the bedside at the right point in time to make an assessment, right? It was a human-in-the-loop type of workflow where, for example, you would predict deterioration in advance and have a nurse come to the bedside or a physician come to the bedside to assess.  

And I think that combination of narrowly focused predictive models with an understanding that to have them make an impact you had to think through the full workflow of where a human would make a decision was a key piece. 

LEE: Obviously there is a positive human impact. And so, for sure, part of the thought process for these kinds of capabilities comes from that.  

But Epic is also a business, and you have to worry about, you know, what are doctors and clinics and healthcare systems willing to buy. And so how do you balance those two things, and do those two things ever come into conflict as you’re imagining what kinds of new capabilities and features and products to create? 

HAIN: Two, sort of, two aspects I think really come to mind. First off, generally speaking, we see analytics and AI as a part of the application. So, in that sense, it’s not something we license separately. We think that those insights and those pieces of data are part of what makes the application meaningful and impactful.  

At the scale that many of these health systems operate and the number of patients that they care for, as well as having tens of thousands of users in the system daily, one needs to think about the compute overhead … 

LEE: Yes. 

HAIN: … that these things cause. And so, in that regard, there is always a ROI assessment that is taking place to some degree around, what happens if this runs at full scale? And in a way, that really got accelerated as we went into the generative AI era.  

LEE: Right. OK. So, you mentioned generative AI. What was the first encounter, and what was that experience for you?

HAIN: So, in the winter of ’22 and into 2023, I started experimenting alongside you with what we at that time called DV3, or Davinci 3, which eventually became GPT-4. And immediately, a few things became obvious. The tool was highly general purpose. One was able to, in putting in a prompt, have it sort of convert into the framing and context of a particular clinical circumstance and reason around that context. But I think the other thing that started to come to bear in that context was there was a fair amount of latent knowledge inside of it that was very, very different than anything we’d seen before. And, you know, there’s some examples from the Sparks of AGI paper from Microsoft Research, where a series of objects end up getting stacked together in the optimal way to build height. Just given the list of objects, it seems to have an understanding of physical space that it intuited from the training process, which we hadn’t seen anywhere before. So that was an entirely new capability that programmers now had access to.  

LEE: Well in fact, you know, I think that winter of 2022, and we’ll get into this, one of your projects that you’ve been running for quite a few years is something called Cosmos (opens in new tab), which I find exceptionally interesting. And I was motivated to understand whether this type of technology could have an impact there.  

And so, I had to receive permission from both OpenAI and Microsoft to provide you with early access.  

When I did first show this technology to you, you must have had an emotional response, either skepticism or … I can’t imagine you just trusted, you know, trusted me to the extent of believing everything I was telling you. 

HAIN: I think there’s always a question of, what is it actually, right? It’s often easy to create demos. It’s often easy to show things in a narrow circumstance. And it takes getting your hands on it and really spending your 10,000 hours digging in and probing it in different ways to see just how general purpose it was.  

And so, the skepticism was really around, how applicable can this be broadly? And I think the second question—and we’re starting to see this play out now in some of the later models—was, is this just a language thing? Is it narrowly only focused on that? Or can we start to imagine other modalities really starting to factor into this? How will it impact basic sciences? Those sorts of things.

On a personal note, I mean, I had, at that point, now they’re now 14 and 12, two kids that I wondered, what did this mean for them? What is the right thing for them to be studying? And so I remember sleepless nights on that topic, as well. 

LEE: OK, so now you get early access to this technology; you’re able to do some experimentation. I think one of the things that impressed me is just less than four months later at the major health tech industry conference, HIMSS, which also happened timing-wise to take place just after the public disclosure of GPT-4, Epic showed off some early prototype applications of generative AI. And so, describe what those were, and how did you choose what to try to do there? 

HAIN: Yeah, and at that point, we actually had the very first users live on that prototype, on that early version.  

And the key thing we’d focused on—we started this development in very, very late December, January of 2023—was a problem whose origins really were in the pandemic.  

So, during the pandemic, we started to see patients increasingly messaging their providers, nurses, and clinicians through MyChart, that patient portal I mentioned with about 190 million folks on it. And as you can imagine, that was a great opportunity in the context of COVID to limit the amount of direct contact between providers and patients while still getting their questions answered.  

But what we found as we came out of the pandemic was that folks preferred it regardless. And that messaging volume had stayed very, very high and was a time-consuming effort for folks.  

And so, the first use case we came out with was a draft message in the context of the message from the patient and an understanding of their medical history, using that medical record that we talked about.  

And the nurse or physician using the tool had two options. They could either click to start with that draft and edit it and then hit send, or they could go back to the old workflow and start with a blank text box and write it from their own memory as they preferred.

And so that was that very first use case. There were many more that we had started from a development perspective, but, yeah, we had that rolling out right in March of 2023 there with the first folks. 

LEE: So, I know from our occasional discussions that some things worked very well. In fact, this is a real product now for Epic. And it seems to be really a very, very popular feature now. I know from talking to you that a lot of things have been harder. And so, I’d like to dive into that. As a developer, tech developer, you know, what’s been easy, what’s been hard, what’s in your mind still is left to do in terms of the development of AI? 

HAIN: Yeah. You know, the first thing that comes to mind sort of starting foundationally, and we hinted at this earlier in our conversation, was at that point in time, it was kind of per a message, rather compute-intensive to run these. And so, there were always trade-offs we were making in regards to how many pieces of information we would send into the model and how much would we request back out of it.  

The result of that was that while, kind of theoretically or even from a research perspective, we could achieve certain outcomes that were quite advanced, one had to think about where you make those trade-offs from a scalability perspective as you wanted to roll that out to a lot of folks. So … 

LEE: Were you charging your customers more money for this feature? 

HAIN: Yeah, essentially the way that we handle that is there’s compute that’s required. As I mentioned, the feature is just part of our application. So, it’s just what they get with an upgrade.  

But that compute overhead is something that we needed to pass through to them. And so, it was something, particularly given both the staffing challenges, but also the margin pressures that health systems are feeling today, we wanted to be very cautious and careful about. 

LEE: And let’s put that on the stack because I do want to get into, from the selling perspective, that challenge and how you perceive health systems as a customer making those trade-offs. But let’s continue on the technical side here. 

HAIN: Yeah. On the technical side, it was a consideration, right. We needed to be thoughtful about how we used them. But going up a layer in the stack, at that time, there was a lot of conversation in the industry around something called RAG, or retrieval-augmented generation.  

And the idea was, could you pull the relevant bits, the relevant pieces of the chart, into that prompt, that information you shared with the generative AI model, to be able to increase the usefulness of the draft that was being created? And that approach ended up proving, and to some degree continues to be, somewhat brittle, right, although the techniques have greatly improved. You have a general-purpose technology that is drafting the response. 

But in many ways, you needed to, for a variety of pragmatic reasons, have somewhat brittle capability in regards to what you pulled into that approach. It tended to be pretty static. And I think this becomes one of the things that, looking forward, as these models have gotten a lot more efficient, we are and will continue to improve upon because, as you get a richer and richer amount of information into the model, it does a better job of responding.  
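[Editor’s note: the retrieval-augmented generation pattern described above can be sketched roughly as follows. Everything in this snippet, from the word-overlap scoring to the chart contents and function names, is an illustrative assumption rather than Epic’s implementation; production systems typically retrieve with embedding similarity, not word overlap.]

```python
# Minimal RAG sketch: score chart snippets against the patient's message,
# pull the most relevant ones into the prompt, and hand that to a drafting model.

def _words(text: str) -> set[str]:
    """Lowercase and strip punctuation so 'lisinopril?' matches 'Lisinopril'."""
    return {w.strip(".,?!:;()%") for w in text.lower().split()}

def score_snippet(snippet: str, question: str) -> int:
    """Crude relevance score: count overlapping words (a stand-in for embeddings)."""
    return len(_words(snippet) & _words(question))

def retrieve(chart: list[str], question: str, k: int = 2) -> list[str]:
    """Return the k chart snippets most relevant to the patient's question."""
    return sorted(chart, key=lambda s: score_snippet(s, question), reverse=True)[:k]

def build_prompt(chart: list[str], question: str) -> str:
    """Assemble the prompt: retrieved context plus the message to be answered."""
    context = "\n".join(retrieve(chart, question))
    return f"Relevant chart excerpts:\n{context}\n\nPatient message:\n{question}\n\nDraft a reply:"

chart = [
    "2023-01-10: Lisinopril 10 mg daily started for hypertension.",
    "2022-06-02: Left knee MRI shows mild meniscal degeneration.",
    "2023-03-15: A1c 6.1%, continue lifestyle management.",
]
question = "Can I take ibuprofen with my blood pressure medication lisinopril?"
prompt = build_prompt(chart, question)
```

The brittleness Hain mentions shows up exactly at the retrieval step: whatever it fails to pull into the prompt, the model never sees.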

I think the third thing, and I think this is going to be something we’re going to continue to work through as an industry, was helping users understand and adapt to these circumstances. So many folks, when they hear AI, think it will just magically do everything perfectly.  

And particularly early on with some of those challenges we’re talking about, it doesn’t. You know, if it’s helpful 85% of the time, that’s great, but it’s not going to be 100% of the time. And it’s interesting as we started, we do something we call immersion, where we always make sure that developers are right there elbow to elbow with the users of the software. 

And one of the things that I realized through that experience with some of the very early organizations like UCSD [UC San Diego] or University of Wisconsin here in Madison was that even when I’m responding to an email or a physician is responding to one of these messages from a patient, depending on the patient and depending on the person, they respond differently.  

In that context, there’s opportunity to continue to mimic that behavior as we go forward more deeply. And so, you learn a lot about, kind of, human behavior as you’re putting these use cases out into the world. 

LEE: So, you know, this increasing burden of electronic communications between doctors, nurses, and patients is centered in one part of Epic. I think that’s called your in-basket application, if I understand correctly.  

HAIN: That’s correct. 

LEE: But that also creates, I think, a reputational risk and challenge for Epic because as doctors feel overburdened by this and they’re feeling burnt out—and as we know, that’s a big issue—then they point to, you know, “Oh, I’m just stuck in this Epic system.”  

And I think a lot of the dissatisfaction about the day-to-day working lives of doctors and nurses then focuses on Epic. And so, to what extent do you see technologies like generative AI as, you know, a solution to that or contributing either positively or negatively to this? 

HAIN: You know, earlier I made the comment that in December, as we started to explore this technology, we realized there were a class of problems that now might have solutions that never did before.  

And as we’ve started to dig into those—and we now have about 150 different use cases that are under development, many of which are live across … we’ve got about 350 health systems using them—one of the things we’ve started to find is that physicians, nurses, and others start to react to saying it’s helping them move forward with their job.  

And examples of this, obviously the draft of the in-basket message response is one, but using ambient voice recognition as a kind of new input into the software so that when a patient and a physician sit down in the exam room, the physician can start a recording and that conversation then ends up getting translated or summarized, if you will, including using medical jargon, into the note in the framework that the physician would typically write.  

Another one of those circumstances where they then review it, don’t need to type it out from scratch, for example, …  

LEE: Right. 

HAIN: … and can quickly move forward.  

I think looking forward, you know, you brought up Cosmos earlier. It’s a suite of applications, but at its core is a dataset of about 300 million de-identified patients. And so using generative AI, we built research tools on top of it. And I bring that up because it’s a precursor of how that type of deep analytics can be put into context at the point of care. That’s what we see this technology more deeply enabling in the future. 

LEE: Yeah, when you are creating … so you said there are about 150 sort of integrations of generative AI going into different parts of Epic’s software products.  

When you are doing those developments and then you’re making a decision that something is going to get deployed, one thing that people might worry about is, well, these AI systems hallucinate. They have biases. There are unclear accountabilities, you know, maybe patient expectations.  

For example, if there’s a note drafted by AI that’s sent to a patient, does the patient have a right to know what was written by AI and what was written by the human doctor? So, can we run through how you have thought about those things?  

HAIN: I think one thing that is important context to set here for folks, and I think it’s often a point of confusion when I’m chatting with folks in public, is that their interaction with generative AI is typically through a chatbot, right. It’s something like ChatGPT or Bing or one of these other products where they’re essentially having a back-and-forth conversation. 

LEE: Right. 

HAIN: And that is a dramatically different experience than how we think it makes sense to embed into an enterprise set of applications.  

So, an example use case may be in the back office, where there are folks that are coding encounters. So, when a patient comes in, right, they have the conversation with the doctor, the doctor documents it, that encounter needs to be billed for, and those folks in the back office associate with that encounter a series of codes that provide information about how that billing should occur.

So, one of the things we did from a workflow perspective was add a selector pane to the screen that uses generative AI to suggest a likely code. Now, this suggestion runs the risk of hallucination. So, the question is, how do you build into the workflow additional checks that can help the user do that?  

And so in this context, we always include a citation back to the part of the medical record that justifies or supports that code. So quickly on hover, the user can see, does this make sense before selecting it? And it’s those types of workflow pieces that we think are critical to using this technology as an aid to helping people make decisions faster, right. It’s similar to drafting documentation that we talked about earlier.  
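[Editor’s note: the citation safeguard described above can be sketched as a simple grounding check: only surface a suggested code if the evidence the model cites actually appears in the record. The suggestion format, names, and example codes below are illustrative assumptions, not Epic’s implementation.]

```python
from dataclasses import dataclass

@dataclass
class CodeSuggestion:
    code: str      # e.g., a billing code proposed by the model
    evidence: str  # the chart excerpt the model claims supports it

def grounded(suggestion: CodeSuggestion, record: str) -> bool:
    """True only if the cited evidence is really present in the record."""
    return bool(suggestion.evidence.strip()) and suggestion.evidence in record

record = "Assessment: acute otitis media, right ear. Plan: amoxicillin 10 days."

suggestions = [
    CodeSuggestion("H66.91", "acute otitis media, right ear"),   # supported by the chart
    CodeSuggestion("J02.9", "acute pharyngitis noted on exam"),  # hallucinated evidence
]

# Only grounded suggestions are shown to the user, with the evidence on hover.
accepted = [s for s in suggestions if grounded(s, record)]
```

The hallucinated suggestion is filtered out because its cited text never appears in the record; the user still makes the final call on the one that survives.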

And it’s interesting because there’s a series of patterns that are … going back to the AI Revolution book you folks wrote two years ago, some of these are really highlighted there, right. This idea of things like a universal translator is a common pattern that we ended up applying across the applications. And in my mind, translation, and this may sound a little bit strange, but summarization is an example of translating a very long series of information in a medical record into the context that an ED physician might care about, where they have three or four minutes to quickly review that very long chart.  

And so, in that perspective, and back to your earlier comment, we added the summary into the workflow but always made sure that the full medical record was available to that user, as well. So, a lot of what we’ve done over the last couple of years has been to create a series of repeatable techniques in regards to both how to build the backend use cases, where to pull the information, feed it into the generative AI models.  

But then I think more importantly are the user experience design patterns to help mitigate those risks you talked about and to maintain consistency across the integrated suite of applications of how those are deployed.  

LEE: You might remember from our book, we had a whole chapter on reducing paperwork, and I think that’s been a lot of what we’ve been talking about. I want to get beyond that, but before transitioning, let’s get some numbers.  

So, you talked about messages drafted to patients, to be sent to patients. So, give a sense of the volume of what’s happening right now. 

HAIN: Oh, we are seeing across the 300 and, I think it’s, 48 health systems that are now using generative AI—and to be clear, we have about 500 health systems we have the privilege of working with, each with many, many hospitals—there are tens of thousands of physicians and nurses using the software. That includes drafting a million-plus notes a month at this point, for example, as well as helping to generate a similar number of responses to patients.  

The thing I’m increasingly excited about is the broader set of use cases that we’re seeing folks starting to deploy now. One of my favorites has been … it’s natural that as part of, for example, a radiology workflow, in studying that image, the radiologist made note that it would be worth double checking, say in six to eight months, that this area of the patient’s chest be scanned again. Something looks a little bit fishy there, but there’s not … 

LEE: There’s not a definitive finding yet. 

HAIN: … there’s not a definitive finding at that point. Part of that workflow is that the patient’s physician places an order for that in the future. And so, we’re using generative AI to surface that back to the physician. And with one click, allow them to place that order, helping that patient get better care.  

That’s one example of dozens of use cases that are now live, both to help improve the care patients are getting but also help the workforce. So going back to the translation-summarization example, a nurse at the end of their shift needs to write up a summary of that shift for the next nurse for each … 

LEE: Right. 

HAIN: … each patient that they care for. Well, they’ve been documenting information in the chart over those eight or 12 hours, right.  

LEE: Yep, yep. 

HAIN: So, we can use that information to quickly draft that end-of-shift note for the nurse. They can verify it with those citations we talked about and make any additions or edits that they need and then complete their end of day far more efficiently.  

LEE: Right. OK. So now let’s get to Cosmos, which has been one of these projects that I think has been your baby for many years and has been something that has had a profound impact on my thinking about possibilities. So first off, what is Cosmos? 

HAIN: Well, just as an aside, I appreciate the thoughtful comments. There is a whole team of folks here that are really driving these projects forward. And a large part of that has been, as you brought up, both Cosmos as a foundational capability but then beginning to integrate it into applications. And that’s what those folks spend time on.  

Cosmos is this effort across hundreds of health systems that we have the privilege of working with to build out a de-identified dataset that today—and it climbs every day—has 300 million unique patient records in it.  

And one of the interesting things about that structure is that, for example, if I end up in a hospital in Seattle and have that encounter documented at a health system in Seattle, I still—a de-identified version of me—still only shows up once in Cosmos, stitching together both my information from here in Madison, Wisconsin, where Epic is at, with that extra data from Seattle. The result is these 300 million unique longitudinal records that have a deep history associated with them.  
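The stitching Seth describes—one de-identified longitudinal record per patient, even when encounters happen at different health systems—can be illustrated with a toy linkage step. This is purely a sketch of the idea: real de-identification and record linkage are far more sophisticated than a salted hash, and `pseudonym` and `stitch` are invented names:

```python
import hashlib
from collections import defaultdict

def pseudonym(identity: str, salt: str = "shared-secret") -> str:
    """Stand-in for real de-identification: a salted hash so the same
    person maps to the same token without exposing who they are."""
    return hashlib.sha256((salt + identity).encode()).hexdigest()[:16]

def stitch(encounters: list[dict]) -> dict[str, list[dict]]:
    """Group encounters from many health systems into one longitudinal
    record per de-identified patient."""
    records = defaultdict(list)
    for e in encounters:
        token = pseudonym(e["patient_identity"])
        records[token].append({"site": e["site"], "event": e["event"]})
    return dict(records)

encounters = [
    {"patient_identity": "jane-doe-1970-01-01", "site": "Madison", "event": "annual exam"},
    {"patient_identity": "jane-doe-1970-01-01", "site": "Seattle", "event": "ED visit"},
    {"patient_identity": "john-roe-1985-05-05", "site": "Boston",  "event": "flu shot"},
]
longitudinal = stitch(encounters)
print(len(longitudinal))  # 2 unique de-identified patients, one with two sites
```

The same person appearing in Madison and Seattle collapses to a single token, which is the property that gives Cosmos its 300 million *unique* longitudinal records.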

LEE: And just to be clear, a patient record might have hundreds or even thousands of individual, I guess what you would call, clinical records or elements. 

HAIN: That’s exactly right. It’s the breadth of information from orders and allergies and blood pressures collected, for example, in an outpatient setting to cancer staging information that might have come through as part of an oncology visit. And it’s coming from a variety of sources. We exchange information about 10 million times a day between different health systems. And that full picture is available within Cosmos in that way of the patient. 

LEE: So now why? Why Cosmos? 

HAIN: Why Cosmos? Well, the real ultimate aim is to put a deeply informed in-context perspective at the point of care. So, as a patient, if I’m in the exam room, it’s helpful for the physician and me to know what have similar patients like me experienced in this context. What was the result of that line of treatment, for example? 

Or as a doctor, if I’m looking and working through a relatively rare or strange case to me, I might be able to connect with—this as an example workflow we built called Look-Alikes—with another physician who has seen similar patients or within the workflow see a list of likely diagnoses based on patients that have been in a similar context. And so, the design of Cosmos is to put those insights into the point of care in the context of the patient.  

To facilitate those steps there, the first phase was building out a set of research tooling. So, we see dozens of papers a year being published by the health systems that we work with. Those that participate in Cosmos have access to it to do research on it. And so they use both a series of analytical and data science tools to do that analysis and then publish research. So, building up trust that way.  
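A Look-Alikes-style match can be sketched as nearest-neighbor search over patient feature vectors. This is a toy illustration—the feature encoding, `cosine`, and `look_alikes` are assumptions for the example, not Epic's actual method:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def look_alikes(patient: list[float], cohort: dict[str, list[float]], k: int = 2):
    """Rank de-identified patients by similarity to the current patient;
    vectors might encode diagnoses, labs, and demographics."""
    ranked = sorted(cohort.items(), key=lambda kv: cosine(patient, kv[1]), reverse=True)
    return [pid for pid, _ in ranked[:k]]

cohort = {
    "p1": [1.0, 0.0, 1.0],   # identical profile
    "p2": [0.0, 1.0, 0.0],   # dissimilar
    "p3": [0.9, 0.1, 1.0],   # close profile
}
print(look_alikes([1.0, 0.0, 1.0], cohort))  # ['p1', 'p3']
```

The top-k neighbors then seed everything downstream: which physicians have treated similar patients, and what outcomes those similar patients experienced.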

LEE: The examples you gave are, like with Look-Alikes, it’s very easy, I think, for people outside of the healthcare world to imagine how that could be useful. So now why is GPT-4 or any generative AI relevant to this? 

HAIN: Well, so a couple of different pieces, right. Earlier we talked about—and I think this is the most important—how generative AI is able to cast things into a specific context. And so, in that way, we can use these tools to help both identify a cohort of patients similar to you when you’re in the exam room. And then also help present that information back in a way that relates to other research and understandings from medical literature to understand what are those likely outcomes.  

I think more broadly, these generative AI techniques and the transformer architecture enable a deeper understanding of sequences of events, sequences of words. And that starts to open up broader questions about what can really be understood about patterns and sequences of events in a patient’s journey.  

Which, if you didn’t know, is where the name Epic came from: just as a nation’s great long journey is told through an epic story, the medical record tells a patient’s story. 

LEE: So, we’re running up against our time together. And I always like to end with a more provocative question.  

HAIN: Certainly. 

LEE: And for you, I wanted to raise a question that I think we had asked ourselves in the very earliest days that we were sharing Davinci 3, what we now know of as GPT-4, with each other, which is, is there a world in the future because of AI where we don’t need electronic health records anymore? Is there a world in the future without EHR? 

HAIN: I think it depends on how you define EHR. I see a world coming where we need to manage a hybrid workforce, where there is a combination of humans and something folks are sometimes calling agents working in concert together to care for more and more of the country and of the world. And there is and will need to be a series of tools to help orchestrate that hybrid workforce. And I think things like EHRs will transform into helping that workforce be operationally successful.  

But as a patient, I think there’s a very different opportunity that starts to be presented. And we’ve talked about kind of understanding things deeply in context. There’s also a real acceleration happening in science right now. And there’s the possibility of bringing those second- and third-order effects of generative AI to the point of care, be that through the real-world evidence we were talking about with Cosmos or maybe personalized therapies that really are well matched to that individual. These generative AI techniques open the door for that, as well as the full lifecycle of managing that from a healthcare perspective all the way through monitoring after the fact.  

And so, I think we’ll still be recording people’s stories. Their stories are relevant to them, and they can help inform the bigger picture. But I think the real question is, how do you put those in a broader context? And these tools open the door for a lot more. 

LEE: Well, that’s really a great vision for the future.  

[TRANSITION MUSIC] 

Seth, I always really learn so much talking to you, and thank you so much for this great chat. 

HAIN: Thank you for inviting me.   

LEE: I see Seth as someone on the very leading frontier of bringing generative AI to the clinic and into the healthcare back office and at the full scale of our massive healthcare system. It’s always impressive to me how thoughtful Seth has had to be about how to deploy generative AI into a clinical setting.  

And, you know, one thing that sticks out—and he made such a point of this—is, you know, generative AI in the clinical setting isn’t just a chatbot. They’ve had to really think of other ways that will guarantee that the human stays in the loop. And that’s of course exactly what Carey, Zak, and I had predicted in our book. In fact, we even had a full chapter of our book entitled “Trust but Verify,” which really spoke to the need in medicine to always have a human being directly involved in overseeing the process of healthcare delivery. 

One technical point that Carey, Zak, and I completely missed, on the other hand, in our book, was the idea of something that Seth brought up called RAG, which is retrieval-augmented generation. That’s the idea of giving AI access to a database of information and allowing it to use that database as it constructs its answers. And we heard from Seth how fundamental RAG is to a lot of the use cases that Epic is deploying. 
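The RAG idea Lee describes can be sketched minimally: retrieve the most relevant passages from a database, then build the prompt around them so the model answers from that context. This is a hedged illustration—keyword overlap stands in for the embedding-based retrieval real systems use, and `retrieve` and `build_prompt` are invented names:

```python
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Score documents by simple word overlap with the query; production
    systems would use embeddings and a vector index instead."""
    q = set(query.lower().split())
    scored = sorted(docs.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(query: str, docs: dict[str, str]) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = {
    "allergy": "Patient allergy list includes penicillin.",
    "visit":   "Last visit noted elevated blood pressure.",
    "rx":      "Current medications: lisinopril 10 mg daily.",
}
prompt = build_prompt("What is the patient allergic to?", docs)
# the prompt now carries the allergy passage for the model to answer from
```

The value in a clinical setting is exactly what Seth described earlier: the model's answer stays tethered to retrievable, citable source material rather than to whatever its training data happened to contain.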

And finally, I continue to find Seth’s project called Cosmos to be a source of inspiration, and I’ve continued to urge every healthcare organization that has been collecting data to consider following a similar path. 

In our book, we spent a great deal of time focusing on the possibility that AI might be able to reduce or even eliminate a lot of the clerical drudgery that currently exists in the delivery of healthcare. We even had a chapter entitled “The Paperwork Shredder.” And we heard from both Matt and Seth that that has indeed been the early focus of their work.  

But we also saw in our book the possibility that AI could provide diagnoses, propose treatment options, be a second set of eyes to reduce medical errors, and in the research lab be a research assistant. And here in Epic’s Cosmos, we are seeing just the early glimpses that perhaps generative AI can actually provide new research possibilities in addition to assistance in clinical decision making and problem solving. On the other hand, that still seems to be for the most part in our future rather than something that’s happening at any scale today. 

But looking ahead to the future, we can still see the potential of AI helping connect healthcare delivery experiences to the advancement of medical knowledge. As Seth would say, the ability to connect bedside to the back office to the bench. That’s a pretty wonderful future that will take a lot of work and tech breakthroughs to make it real. But the fact that we now have a credible chance of making that dream happen for real, I think that’s pretty wonderful. 

[MUSIC TRANSITIONS TO THEME] 

I’d like to say thank you again to Matt and Seth for sharing their experiences and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a look at how patients are using generative AI for their own healthcare, as well as an episode on the laws, norms, and ethics developing around AI and health, and more. We hope you’ll continue to tune in.

Until next time.

[MUSIC FADES] 

[1] A provider of conversational, ambient, and generative AI, Nuance was acquired by Microsoft in March 2022. Nuance solutions and capabilities are now part of Microsoft Cloud for Healthcare.

[2] According to the survey, of the 20% of respondents who said they use generative AI in clinical practice, 29% reported using the technology for patient documentation and 28% said they use it for differential diagnosis.

[3] A month after the conversation was recorded, Microsoft Dragon Copilot was unveiled. Dragon Copilot combines and extends the capabilities of DAX Copilot and Dragon Medical One.


The post Real-world healthcare AI development and deployment—at scale appeared first on Microsoft Research.
