The AI Revolution in Medicine, Revisited | Microsoft Research

Real-world healthcare AI development and deployment—at scale

Microsoft’s Dr. Matthew Lungren and Epic’s Seth Hain discuss the challenges and opportunities of leveraging generative AI for enhanced patient care and improved clinical documentation and recordkeeping at scale—plus what’s next for the technology in the field.

Episode 2: Real-world healthcare AI development and deployment—at scale. Outline illustration of Seth Hain, Peter Lee, and Dr. Matthew Lungren.

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee. 

In this episode, Dr. Matthew Lungren and Seth Hain, leaders in the implementation of healthcare AI technologies and solutions at scale, join Lee to discuss the latest developments. Lungren, the chief scientific officer at Microsoft Health and Life Sciences, explores the creation and deployment of generative AI for automating clinical documentation and administrative tasks like clinical note-taking. Hain, the senior vice president of R&D at the healthcare software company Epic, focuses on the opportunities and challenges of integrating AI into electronic health records at global scale, highlighting AI-driven workflows, decision support, and Epic’s Cosmos project, which leverages aggregated healthcare data for research and clinical insights. 


Learn more:

Meet Microsoft Dragon Copilot: Your new AI assistant for clinical workflow 
Microsoft Industry Blog | March 2025 

Unlocking next-generation AI capabilities with healthcare AI models 
Microsoft Industry Blog | October 2024 

Multimodal Generative AI: the Next Frontier in Precision Health 
Microsoft Research Forum | March 2024 

An Introduction to How Generative AI Will Transform Healthcare with Dr. Matthew Lungren 
LinkedIn Learning 

AI for Precision Health 
Video | July 2023 

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning 
Publication | December 2017 

Epic Cosmos 
Homepage

The AI Revolution in Medicine: GPT-4 and Beyond 
Book | April 2023

Transcript

[MUSIC]  

[BOOK PASSAGE]   

PETER LEE: “It’s hard to convey the huge complexity of today’s healthcare system. Processes and procedures, rules and regulations, and financial benefits and risks all interact, evolve, and grow into a giant edifice of paperwork that is well beyond the capability of any one human being to master. This is where the assistance of an AI like GPT-4 can be not only useful—but crucial.”   

[END OF BOOK PASSAGE]  

[THEME MUSIC]  

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.

[THEME MUSIC FADES] 

The passage I read at the top there is from Chapter 7 of the book, “The Ultimate Paperwork Shredder.”  

Paperwork plays a particularly important role in healthcare. It helps convey treatment information that supports patient care, and it’s also used to help demonstrate that providers are meeting regulatory responsibilities, among other things. But if we’re being honest, it’s taxing—for everyone—and it’s a big contributor to the burnout our clinicians are experiencing today. Carey, Zak, and I identified this specific pain point as one of the best early avenues to pursue as far as putting generative AI to good work in the healthcare space.  

In this episode, I’m excited to welcome Dr. Matt Lungren and Seth Hain to talk about matching technological advancements in AI to clinical challenges, such as the paperwork crisis, to deliver solutions in the clinic and in the health system back office.  

Matt is the chief scientific officer for Microsoft Health and Life Sciences, where he focuses on translating cutting-edge technology, including generative AI and cloud services, into innovative healthcare applications. He’s a clinical interventional radiologist and a clinical machine learning researcher doing collaborative research and teaching as an adjunct professor at Stanford University. His scientific work has led to more than 200 publications, including work on new computer vision and natural language processing approaches for healthcare.  

Seth is senior vice president of research and development at Epic, a leading healthcare software company specializing in electronic health record systems, also known as EHR, as well as other solutions for connecting clinicians and patients. During his 19 years at Epic, Seth has worked on enhancing the core analytics and other technologies in Epic’s platforms as well as their applications across medicine, bringing together his graduate training in mathematics and his dedication to better health.  

I’ve had the pleasure of working closely with both Matt and Seth. Matt, as a colleague here at Microsoft, really focused on our health and life sciences business. And Seth, as a collaborator at Epic, as we embark on the questions of how to integrate and deploy generative AI into clinical applications at scale.   

[TRANSITION MUSIC] 

Here’s my conversation with Dr. Matt Lungren:  

LEE: Matt, welcome. It’s just great to have you here. 

MATTHEW LUNGREN: Thanks so much, Peter. Appreciate being here. 

LEE: So, I’d like to just start just talking about you. You know, I had mentioned your role as the chief scientific officer for Microsoft Health and Life Sciences. Of course, that’s just a title. So, what the heck is that? What is your job exactly? And, you know, what does a typical day at work look like for you? 

LUNGREN: So, really what you could boil my work down to is essentially cross collaboration, right. We have a very large company, lots of innovation happening all over the place, lots of partners that we work with and then obviously this sort of healthcare mission.

And so, what innovations, what kind of advancements are happening that can actually solve clinical problems, right, and sort of kind of direct that. And we can go into some examples, you know, later. But then the other direction, too, is important, right. So, identifying problems that may benefit from a technologic application or solution and kind of translating that over into the, you know, pockets of innovation saying, “Hey, if you kind of tweaked it this way, this is something that would really help, you know, the clinical world.”  

And so, it’s really a bidirectional role. So, my day to day is … every day is a little different, to be honest with you. Some days it’s very much in the science and learning about new techniques. On the other side, though, it can be very much in the clinic, right. So, what are the pain points that we’re seeing? Where are the gaps in the solutions that we’ve already rolled out? And, you know, again, what can we do to make healthcare better broadly? 

LEE: So, you know, I think of you as a technologist, and, Matt, you and I actually are colleagues working together here at Microsoft. But you also do spend time in the clinic still, as well, is that right? 

LUNGREN: You know, initially it was kind of a … very much a non-negotiable for me … in sort of taking an industry role. I think like a lot of, you know, physicians, you know, we’re torn with the idea of like, hey, I spent 20 years training. I love what I do, you know, with a lot of caveats there in terms of some of the administrative burden and some of the hassle sometimes. But for the most part, I love what I do, and there’s no greater feeling than using something that you trained years to do and actually see the impact on a human life. It’s unbelievable, right.  

So, I think part of me was just, like, I didn’t want to let that part of my identity go. And frankly, as I often say, to this day, I walk by a fax machine in our office today, like in 2025.  

So just to be extra clear, it really grounds me in, like, yes, I love the possibilities. I love thinking about what we can do. But also, I have a very stark understanding of the reality on the ground, both in terms of the technology but also the burnout, right. The challenges that we’re facing in taking care of patients has gotten, you know, much, much more difficult in the last few years, and, you know, I like to think it keeps my perspective, yeah. 

LEE: You know, I think some listeners to this podcast might be surprised that we have doctors on staff in technical roles at Microsoft. How do you explain that to people? 

LUNGREN: [LAUGHS] Yeah, no, yeah, it is interesting. I would say that, you know, from, you know, the legacy Nuance [1] world, it wasn’t so far-fetched that you have physicians that were power users and eventually sort of, you know, became, “Hey, listen, I think this is a strategic direction; you should take it” or whatever. And certainly maybe in the last, I want to say, five years or so, I’ve seen more and more physicians who have, you know, taken the time, sometimes on their own, to learn some of the AI capabilities, learn some of the principles and concepts; and frankly, some are, you know, even coding solutions and leading companies.

So, I do think that that has shifted a bit in terms of like, “Hey, doctor, this is your lane, and over here, you know, here’s a technical person.” And I think that’s fused quite a bit more.  

But yeah, it is an unusual thing, I think, in sort of how we’ve constructed what at least my group does. But again, I can’t see any other way around some of the challenges.  

I think, you know, an anecdote I’d like to tell you, when I was running the AIMI [Artificial Intelligence in Medicine and Imaging] Center, you know, we were bringing the medical school together with the computer science department, right, at Stanford. And I remember one day a student, you know, very smart, came into my office, you know, a clinical day or something, and he’s like, is there just, like, a book or something where I can just learn medicine? Because, like, I feel like there’s a lot of, like, translation you have to do for me.  

It really raised an important insight, which is that you can learn the, you know, medicine, so to speak. You know, go to med school; you know, take the test and all that. But it really … you don’t really understand the practice of medicine until you are doing that.  

And in fact, I even push it a step further to say after training those first two or three years of … you are the responsible person; you can turn around, and there’s no one there. Like, you are making a decision. Getting used to that and then having a healthy respect for that actually I think provides the most educational value of anything in healthcare.  

LEE: You know, I think what you’re saying is so important because as I reflect on my own journey … Of course, I’m a computer scientist. I don’t have medical training, although at this point, I feel confident that I could pass a Step 1 medical exam.  

LUNGREN: I have no doubt. [LAUGHS] 

LEE: But I think that the tech industry, because of people like you, have progressed tremendously in having a more sophisticated and nuanced understanding of what actually goes on in clinic and also what goes on in the boardrooms of healthcare delivery organizations. And of course, at the end of the day, I think that’s really been your role.  

So roughly speaking, your job as an executive at a big tech company has been to understand what the technology platforms need to be, particularly with respect to machine learning, AI, and cloud computing, to best support healthcare. And so maybe let’s start pre-GPT-4, pre-ChatGPT, and tell us a little bit, you know, about maybe some of your proudest moments in getting advanced technologies like AI into the clinic. 

LUNGREN: You know, when I first started, so remember, like you go all the way back to about 2013, right, my first faculty job, and, you know, we’re building a clinical program and I, you know, I had a lot of interest in public health and building large datasets for pop [population] health, etc. But I was doing a lot of that, you know, sort of labeling to get those insights manually, right. So, like, I was the person that you’d probably look at now and say, “What are you doing?” Right?  

So … but I had a complete random encounter with Andrew Ng, who I didn’t know at the time, at Stanford. And I, you know, went to one of the seminars that he was holding at the Gates building, and, you know, they were talking about their performance on ImageNet. You know, cat and dog and, you know, tree, bush, whatever. And I remember sitting in kind of the back, and I think I maybe had my scrubs on at the time and just kind of like, what? Like, why … like, this … we could use this in healthcare, you know. [LAUGHS]  

But for me, it was a big moment. And I was like, this is huge, right. And as you remember, the deep learning really kind of started to show its stuff with, you know, Fei-Fei Li’s ImageNet stuff.

So anyway, we started the collaboration that actually became a nidus. And one of the first things we worked on, we just said, “Listen, one of the most common medical imaging examinations in the world is the chest x-ray.” Right? Two, three billion are done every year in the world, and so is that not a great place to start?

And of course, we had a very democratizing kind of mission. As you know, Andrew has done a lot of work in that space, and I had similar ambitions. And so, we really started to focus on bringing the, you know, the sort of the clinical and the CS together and see what could be done.  

So, we did CheXNet. And this is, remember this is around the time when, like, Geoffrey Hinton was saying things like we should stop training radiologists, and all this stuff was going on. [LAUGHTER] So there’s a lot of hype, and this is the narrow AI days just to remind the audience.  

LEE: How did you feel about that since you are a radiologist? 

LUNGREN: Well, it was so funny. So, Andrew is obviously very prolific on social media, and I was, who am I, right? So, I remember he tagged me. Well, first he said, “Matt, you need to get a Twitter account.” And I said OK. And he tagged me on the very first post of our, what we call, CheXNet that was kind of like the “Hello, World!” for this work.  

And I remember it was a clinical day. I had set my phone, as you do, outside the OR. I go in. Do my procedure. You know, hour or so, come back, my phone’s dead. I’m like, oh, that’s weird. Like I had a decent charge. So, you know, I plug it in. I turn it on. I had like hundreds of thousands of notifications because Andrew had tweeted out to his millions or whatever about CheXNet.  

And so, then of course, as you point out, I go to RSNA that year, which is our large radiology conference, and that Geoffrey Hinton quote had come out. And everyone’s looking at me like, “What are you doing, Matt?” You know, like, are you coming after our specialty? I’m like, “No, no,” that’s, [LAUGHS] you know, it’s a way to interpret it, but you have to take a much longer horizon view, right.  

LEE: Well, you know, we’re going to, just as an enticement for listeners to this podcast to listen to the very end, I’m going to pin you down toward the end on your assessment of whether Geoffrey Hinton will eventually be proven right or not. [LAUGHTER] But let’s take our time to get there.  

Now let’s go ahead and enter the generative AI era. When we were first exposed to what we now know of as GPT-4—this was before it was disclosed to the world—a small number of people at Microsoft and Microsoft Research were given access in order to do some technical assessment.  

And, Matt, you and I were involved very early on in trying to assess what might this technology mean for medicine. Tell us, you know, what was the first encounter with this new technology like for you?  

LUNGREN: It was the weirdest thing, Peter. Like … I joined that summer, so the summer before, you know, the actual GPT came out. I had literally no idea what I was getting into.  

So, I started asking it questions, you know, kind of general stuff, right. Just, you know, I was like, oh, all right, it’s pretty good. And so, then I would sort of go a little deeper. And eventually I got to the point where I’m asking questions that, you know, maybe there’s three papers on it in my community, and remember I’m a sub-sub specialist, right, pediatric interventional radiology. And the things that we do in vascular malformations and, you know, rare cancers are really, really strange and not very commonly known.  

And I kind of walked away from that—first I said, can I have this thing, right? [LAUGHS]  

But then I, you know, I don’t want to sound dramatic, but I didn’t sleep that well, if I’m being honest, for the first few nights. Partially because I couldn’t tell anybody, except for the few that I knew were involved, and partially because I just couldn’t wrap my head around how we went from what I was doing in LSTMs [long short-term memory networks], right, which was state of the artish at the time for NLP [natural language processing].  

And all of a sudden, I have this thing that broadly has, you know, domain-expert representations of knowledge that there’s no way you could think would be in distribution for a normal approach to this.  

And so, I really struggled with it, honestly. Interpersonally, like, I would be like, uh, well, let’s not work on that. They’re like, why not? You were just excited about it last week. I’m like, I don’t know. I think that we could think of another approach later. [LAUGHS]  

And so yeah, when we were finally able to really look at some of the capabilities and really think clearly, it was really clear that we had a massive opportunity on our hands to impact healthcare in a way that was never possible before. 

LEE: Yeah, and at that time you were still a part of Nuance. Nuance, I think, was in the process of being acquired by Microsoft. Is that right?  

LUNGREN: That’s right.  

LEE: And so, of course, this was also a technology that would have profound and very direct implications for Nuance. How did you think about that? 

LUNGREN: Nuance, for those in the audience who don’t know, for 25 years was, sort of, the medical speech-to-text thing that all, you know, physicians used. But really the brass ring had always been … and I want to say going back to like 2013, 2014, Nuance had tried to figure out, OK, we see this pain point. Doctors are typing on their computers while they’re trying to talk to their patients, right.  

We should be able to figure out a way to get that ambient conversation turned into text that then, you know, accelerates the doctor … takes all the important information. That’s a really hard problem, right. You’re having a conversation with a patient about their knee pain, but you’re also talking about, you know, their cousin’s wedding and their next vacation and their dog is sick or whatever and all that gets recorded, right.  

And so, then you have to have the intelligence/context to be able to tease out what’s important for a note. And then it has to be at the performance level that a physician, who, again, has 20 years of training and education plus a huge, huge, you know, need to get through cases efficiently, will accept. That’s a really difficult problem.  

And so, for a long time, there was a human-in-the-loop aspect to doing this because you needed a human to say, “This transcript’s great, but here’s actually what needs to go on the note.” And that can’t scale, as you know.  

When the GPT-4, you know, model kind of, you know, showed what it was capable of, I think it was an immediate light bulb because there was no … you can ask any physician in your life, anyone in the audience, you know, what are your … what is the biggest pain point when you go to see your doctor? Like, “Oh, they don’t talk to me. They don’t look me in the eye. They’re rushing around trying to finish a note.”  

If we could get that off their plate, that’s a huge unlock, Peter. And I think that, again, as you know, it’s now led to so much more. But that was kind of the initial, I think, reaction. 
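To make the shape of that problem concrete, here is a minimal sketch of the core summarization step, written with the OpenAI Python client. The model name, prompt wording, and transcript are illustrative assumptions, not a description of how any shipping product works; a real system adds speech-to-text, speaker diarization, safety checks, and clinician review.

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

visit_transcript = (
    "Doctor: What brings you in today?\n"
    "Patient: My right knee has been hurting for two weeks, worse on stairs.\n"
    "Patient: Also, my cousin's wedding is next month, and I want to be able to dance.\n"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": (
                "You are a clinical scribe. From the visit transcript, draft a "
                "SOAP note. Keep only clinically relevant details and omit "
                "small talk. A clinician will review and edit before signing."
            ),
        },
        {"role": "user", "content": visit_transcript},
    ],
)
print(response.choices[0].message.content)  # draft note, pending clinician review

The filtering instruction is the point of the example: the model is asked to keep the knee pain and drop the wedding, which is exactly the “tease out what’s important” problem Lungren describes.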

LEE: And so, maybe that gets us into our next set of questions, our next topic, which is about the book and all the predictions we made in the book. Because Carey, Zak, and I—actually we did make a prediction that this technology would have a huge impact on this problem of clinical note-taking.  

And so, you’re just right in the middle of that. You’re directly hands-on creating, I think, what is probably the most popular early product for doing exactly that. So, were we right? Were we wrong? What else do we need to understand about this? 

LUNGREN: No, you were right on. I think in the book, I think you called it like a paper shredder or something. I think you used a term like that. That’s exactly where the activity is right now and the opportunity.  

I’ve even taken that so far as to say that when folks are asking about what the technology is capable of doing, we say, well, listen, it’s going to save time before it saves lives. It’ll do both. But right now, it’s about saving time.  

It’s about peeling back the layers of the onion that if you, you know, put me in where I started medicine in 2003, and then fast-forward and showed me a day in the life of 2025, I would be shocked at what I was doing that wasn’t related to patient care, right. So, all of those layers that have been stacked up over the years, we can start finding ways to peel that back. And I think that’s exactly what we’re seeing.

And to your point, I think you mentioned this, too, which is, well, sure, we can do this transcript, and we can turn a note, but then we can do other things, right. We can summarize that in the patient’s language or education level of choice. We can pend orders. We can eventually get to a place of decision support. So, “Hey, did you think about this diagnosis, doctor?” Like those kinds of things.  

And all those things, I think you highlighted beautifully, and again, it sounds like with, you know, a lot of, right, just kind of guesswork and prediction, but those things are actually happening every single day right now.  

LEE: Well, so now, you know, in this episode, we’re really trying to understand, you know, where the technology industry is in delivering these kinds of things. And so from your perspective, you know, in the business that you’re helping to run here at Microsoft, you know, what are the things that are actually shipping as product versus things that clinicians are doing, let’s say, off label, just by using, say, ChatGPT on their personal mobile devices, and then what things aren’t happening? 

LUNGREN: Yeah. I’ll start with the shipping part because I think you, again, you know my background, right. Academic clinician, did a lot of research, hadn’t had a ton of product experience.  

In other words, like, you know, again, I’m happy to show you what benchmarks we beat or a new technique or, you know, get a grant to do all this, or even frankly, you know, talk about startups. But to actually have an audience that is accustomed to a certain level of performance for the solutions that they use, to be able to deliver something new at that same level of expectation, wow, that’s a big deal.  

And again, this is part of the learning by, you know, kind of being around this environment that we have, which is we have this, you know, incredibly focused, very experienced clinical product team, right.

And then I think on the other side, to your point about the general-purpose aspect of this, it’s no secret now, right, that, you know, this is a useful technology in a lot of different medical applications. And let’s just say that there’s a lot of knowledge that can be used, particularly by the physician community. And I think the most recent survey I saw was from the British Medical Journal, which said, hey, you know, which doctors are using … are you willing to tell us, you know, what you’re doing? And it turns out that, what, 30% or so said that they were using it regularly in clinic [2]. And again, this is the general, this is the API or whatever off the shelf.

And then frankly, when they ask what they’re using it for, tends to be things like, “Hey, differential, like, help me fill in my differential or suggest … ” and to me, I think what that created, at least—and you’re starting to see this trend really accelerate in the US especially—is, well, listen, we can’t have everybody pulling out their laptops and potentially exposing, you know, patient information by accident or something to a public API.  

We have to figure this out, and so brilliantly, I think NYU [New York University] was one of the first. Now I think there’s 30 plus institutions that said, listen, “OK, we know this is useful to the entire community in the healthcare space.” Right? We know the administrators and nurses and everybody thinks this is great.  

We can’t allow this to be a very loosey-goosey approach, right, given this sort of environment. So, what we’ll do is we’ll set up a HIPAA-compliant instance to allow anyone in the community—you know, in the health system—to use the models, and then whenever the newest model comes, it gets hosted, as well.  

And what’s cool about that—and that’s happened now a lot of places—is that at the high level … first of all, people get to use it and experiment and learn. But at the high level, they’re actually seeing what are the common use cases. Because you could ask 15 people and you might get super long lists, and it may not help you decide what to operationalize in your health system.  
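For readers wondering what such a “HIPAA-compliant instance” looks like from a developer’s seat, here is one hedged sketch using the Azure OpenAI client, where the model runs as a deployment inside the organization’s own tenant rather than behind a public consumer API. The endpoint, deployment name, and API version shown are hypothetical placeholders.

from openai import AzureOpenAI

# Requests go to the health system's own Azure deployment, covered by its
# agreements, rather than to a public consumer endpoint.
client = AzureOpenAI(
    azure_endpoint="https://contoso-health.openai.azure.com",  # hypothetical
    api_key="<retrieved-from-a-secrets-vault>",  # never hard-coded in practice
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o-clinical",  # the organization's own deployment name (hypothetical)
    messages=[{"role": "user", "content": "Draft patient-friendly discharge instructions for ..."}],
)
print(response.choices[0].message.content)

Centralizing access this way is also what enables the observation Lungren mentions next: because every call flows through one governed endpoint, the health system can see which use cases actually dominate and decide what to operationalize.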

LEE: But let me ask you about that. When you observe that, are there times when you think, “Oh, some specific use cases that we’re observing in that sort of organic way need to be taken into specialized applications and made into products?” Or is it best to keep these things sort of, you know, open-chat-interface types of general-purpose platform?  

LUNGREN: Honestly, it’s both, and that’s exactly what we’re seeing. I’m most familiar with Stanford, kind of, the work that Nigam Shah leads on this. But he basically … you know, there’s a really great paper coming out in JAMA basically saying, “Here’s what our workforce is using it for. Here are the things in the literature that would suggest what would be popular.”  

And some of those line up, like helping with a clinical diagnosis or documentation, but some of them don’t. But for the most part, the stuff that flies to the top, those are opportunities to operationalize and productize, etc. And I think that’s exactly what we’re seeing. 

LEE: So, let’s get into some of the specific predictions. We’ve, I think, beaten note-taking to death here. But there’s other kinds of paperwork, like filling out prior authorization request forms or referral letters, an after-visit note or summary to give instructions to patients, and so on. And these were all things that we were making guesses in our book might be happening. What’s the reality there? 

LUNGREN: I’ve seen every single one of those. In fact, I’ve probably seen a dozen startups too, right, doing exactly those things. And, you know, we touched a little bit on translation into the actual clinic. And that’s actually another thing that I used to kind of underappreciate, which is that, listen, you can have a computer scientist and a physician or nurse or whatever, like, give the domain expertise, and you think you’re ready to build something.  

The health IT [LAUGHS] is another part of that Venn diagram that’s so incredibly critical, and then exactly how are you going to bring that into the system. That’s a whole new ballgame. 

And so I do want to do a callout because the collaboration that we have with Epic is monumental because here, you have the system of record that most physicians, at least in the US, use. And they’re going to use an interface and they’re going to have an understanding of, hey, we know these are pain points, and so I think there’s some really, really cool, you know, new innovations that are coming out of the relationship that we have with Epic. And certainly the audience may be familiar with those, that I think will start to knock off a lot of the things that you predicted in your book relatively soon. 

LEE: I think most of the listeners to this podcast will know what Epic is. But for those that are unfamiliar with the health industry, and especially the technology foundation, Epic is probably the largest provider of electronic health record systems. And, of course, in collaboration with you and your team, they’ve been integrating generative AI quite a bit. Are there specific uses that Epic is making and deploying that get you particularly excited? 

LUNGREN: First of all, the ambient note generation, by the way, is integrated into Epic now. So like, you know, it’s not another screen, another thing for physicians. So that’s a huge, huge unlock in terms of the translation.

But then Epic themselves, so they have, I guess, on the last roadmap that they talked [about], more than 60 [use cases], but the one that’s kind of been used now is this inbox response. 

So again, maybe someone might not be familiar with, why is it such a big deal? Well, if you’re a physician, you already have, you know, 20 patients to see that day and you got all those notes to do, and then Jevons paradox, right. So if you give me better access to my doctor, well, maybe I won’t make an appointment. I’m just going to send him a note and this is kind of this inbox, right.  

So then at the end of my day, I got to get all my notes done. And then I got to go through all the inbox messages I’ve received from all of my patients and make sure that they’re not like having chest pain and they’re blowing it off or something.  

Now that’s a lot of work, and there’s the cold-start problem of, like, OK, I need to respond to them. So Epic has leveraged this system to say, “Let me just draft a note for you,” understanding the context of, you know, what’s going on with the patient, etc. And you can edit that and sign it, right. So you can accelerate some of those … so that’s probably one I’m most excited about. But there’s so many right now. 

LEE: Well, I think I need to let you actually state the name of the clinical note-taking product that you’re associated with. Would you like to do that? [LAUGHS] 

LUNGREN: [LAUGHS] Sure. Yeah, it’s called DAX Copilot [3]. And for the record, it is the fastest-growing copilot in the Microsoft ecosystem. We’re very proud of that. Five hundred institutions already are using it, and millions of notes have already been created with it. And the feedback has been tremendous.

LEE: So, you sort of referred to this a little bit, you know, this idea of AI being a second set of eyes. So, doctor makes some decisions in diagnosis or kind of working out potential treatments or medication decisions. And in the book, you know, we surmised that, well, AI might not replace the doctor doing those things. It could but might not. But AI could possibly reduce errors if doctors and nurses are making decisions by just looking at those decisions and just checking them out. Is that happening at all, and what do you see for the future there? 

LUNGREN: Yeah, I would say, you know, that’s kind of the jagged edge of innovation, right, where sometimes the capability gets ahead of the ability to, you know, operationalize that. You know, part of that is just related to the systems. The evidence has been interesting on this. So, like, you know this, our colleague Eric Horvitz has been doing a lot of work in sort of looking at physician, physician with GPT-4, let’s say, and then GPT-4 alone for a whole variety of things. You know, we’ve been saying to the world for a long time, particularly in the narrow AI days, that AI plus human is better than either alone. We’re not really seeing that bear out really that well yet in some of the research.  

But it is a signal to me and to the use case you’re suggesting, which is that if we let this system, in the right way, kind of handle a lot of the safety-net aspects of what we do but then also potentially take on some of the things that maybe are not that challenging or at least somewhat simple.  

And of course, this is really an interesting use case in my world, in the vision world, which is that we know these models are multimodal, right. They can process images and text. And what does that look like for pathologists or radiologists, where a certain percentage of the things we look at in a given day are normal, right? Or as close to normal as you can imagine. So is there a way to do that? And then also, by the way, have a safety net.  

And so I think that this is an extremely active area right now. I don’t think we’ve figured out exactly how to have the human and AI model interact in this space yet. But I know that there’s a lot of attempts at it right now. 

LEE: Yeah, I think, you know, this idea of a true copilot, you know, a true collaborator, you know, I think is still something that’s coming. I think we’ve had a couple of decades of people being trained to think of computers as question-answering machines. Ask a question, get an answer. Provide a document, get a summary. And so on.  

But the idea that something might actually be this second set of eyes just assisting you all day continuously, I think, is a new mode of interaction. And we haven’t quite figured that out.  

Now, in preparation for this podcast, Matt, you said that you actually used AI to assist you in getting ready. [LAUGHS] Would you like to share what you learned by doing that? 

LUNGREN: Yeah, it’s very funny. So, like, you may have heard this term coined by Ethan Mollick called the “secret cyborg,” which is sort of referring to the phenomenon of folks using GPT, realizing it can actually help them a ton in all kinds of parts of their work, but not necessarily telling anybody that they’re using it, right.  

And so in a similar secret cyborgish way, I was like, “Well, listen, you know, I haven’t read your book in like a year. I recommend it to everybody. And [I need] just a refresher.” So what I did was I took your book, I put it into GPT-4, OK, and asked it to sort of talk about the predictions that you made.  

And then I took that and put it in the stronger reasoning model—in this case, the “deep research” capability from OpenAI that you and the audience may have just seen or heard of—and asked it to research all the current papers, you know, and blogs and whatever else and tell me, like, what was right, what was wrong in terms of the predictions. [LAUGHS]  

So it, actually, it was an incredible thing. It’s a, like, what, six or seven pages. It probably would have taken me two weeks, frankly, to do this amount of work.  
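Lungren’s two-stage workflow, extract first and evaluate second, generalizes beyond book reviews. A rough sketch follows; the model names are assumptions, and the web browsing that OpenAI’s actual deep research feature performs is approximated here by a plain reasoning-model call.

from openai import OpenAI

client = OpenAI()

# Hypothetical local copy of the book; a full book would need chunking in
# practice, elided here to keep the sketch short.
book_text = open("ai_revolution_in_medicine.txt").read()

# Stage 1: a general-purpose model extracts the predictions.
extraction = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{
        "role": "user",
        "content": "List every concrete prediction this book makes about AI "
                   "in healthcare, one per line:\n\n" + book_text,
    }],
)
predictions = extraction.choices[0].message.content

# Stage 2: a reasoning model assesses each prediction. (Deep research also
# searches current literature, which this plain call does not.)
assessment = client.chat.completions.create(
    model="o3-mini",  # illustrative reasoning model
    messages=[{
        "role": "user",
        "content": "For each prediction below, judge whether it has proven "
                   "right, wrong, or too early to tell, and explain why:\n\n"
                   + predictions,
    }],
)
print(assessment.choices[0].message.content)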

LEE: I’ll be looking forward to reading that in the New England Journal of Medicine shortly. 

LUNGREN: [LAUGHS] That’s right. Yeah, no, don’t, before this podcast comes out, I’ll submit it as an opinion piece. No. [LAUGHS] But, yeah, but I think on balance, incredibly insightful views. And I think part of that was, you know, your team that got together really had a lot of different angles on this. But, you know, and I think the only area that was, like, which I’ve observed as well, it’s just, man, this can do a lot for education.  

We haven’t seen … I don’t think we’re looking at this as a tutor. To your point, we’re kind of looking at it as transactional, in and out. But as we’ve seen in all kinds of data, both in low- and middle-income countries and even at Harvard, using this as a tutor can really accelerate your knowledge in profound ways.  

And so that is probably one area where I think your prediction was maybe slightly even further ahead of the curve because I don’t think folks have really grokked that opportunity yet. 

LEE: Yeah, and for people who haven’t read the book, you know, the guess was that you might use this as a training aid if you’re an aspiring doctor. For example, you can ask GPT-4 to pretend to be a patient that presents a certain way and that you are the doctor that this patient has come to see. And so you have an interaction. And then when you say end of encounter, you ask GPT-4 to assess how well you did. And we thought that this might be a great training aid, and to your point, it seems not to have materialized.  

LUNGREN: There’s some sparks. You know, with, like, communication, end-of-life conversations that no physician loves to have, right. It’s very, very hard to train someone in those. I’ve seen some work done, but you’re right. It’s not quite hit mainstream yet. 

LEE: On the subject of things that we missed, one thing that you’ve been very, very involved in in the last several months has been in shipping products that are multimodal. So that was something I think that we missed completely. What is the current state of affairs for multimodal, you know, healthcare AI, medical AI? 

LUNGREN: Yeah, the way I like to explain it—and first of all, no fault to you; we were all just so excited about the text use cases. But yeah, I mean, so if we look at healthcare, right, how we take care of patients today, as you know, the vast majority of the data, in terms of just data itself, is actually not in text, right. It’s going to be in pathology and genomics and radiology, etc.  

And it seems like an opportunity here to watch this huge curve just go straight up in the general reasoning and frankly medical competency and capabilities of the models that are coming and continue to come, but then to see that it’s not as proficient for medical-specific imaging and video and, you know, other data types. And that gap is, kind of, what I describe as the multimodal medical AI gap.  

We’re probably in GPT-2 land, right, for these other modality types versus, you know, we’re now at o3; who knows where we’re going to go. At least in our view, we can innovate in that space.  

How do we help bring those innovations to the broader community to close that gap and see some of these use cases really start to accelerate in the multimodal world?  

And I think we’ve taken a pretty good crack at that. A lot of that is credit to the innovative work. I mean, MSR [Microsoft Research] was two or three years ahead of everyone else on a lot of this. And so how do we package that up in a way that the community can actually access and use? And so, we took a lot of what your group had done in, let’s just say, radiology or pathology in particular, and say, “OK, well, let’s put this in an ecosystem of other models.” Other groups can participate in this, but let’s put it in a platform where maybe I’m really competent in radiology or pathology. How do I connect those things together? How do I bring the general reasoner knowledge into a multimodal use case?  

And I think that’s what we’ve done pretty well so far. We have a lot of work to do still, but this is very, very exciting. We’re seeing just such a ton of interest in building with the tools that we put out there. 
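One way to picture the “ecosystem of models” Lungren describes is a thin orchestration layer: a narrow, domain-trained model handles the pixels, and a general reasoner composes the result into language. Every name below is hypothetical and the bodies are stubbed; this shows the shape of the hand-off, not any shipping platform.

# Hypothetical orchestration of a specialist imaging model and a general reasoner.

def radiology_findings(image_bytes: bytes) -> str:
    # Stand-in for a call to a domain-specific chest x-ray model that
    # returns structured findings (hypothetical).
    return "left lower lobe opacity; no pneumothorax"  # stubbed for illustration

def compose_report(findings: str, clinical_history: str) -> str:
    # Stand-in for a hand-off to a general-purpose LLM that drafts a report
    # in the patient's clinical context (hypothetical).
    return f"History: {clinical_history}. Findings: {findings}. Impression: ..."

image = b""  # placeholder for DICOM bytes
print(compose_report(radiology_findings(image), "62M, fever and cough"))

The division of labor mirrors the gap he names: the general model supplies reasoning and fluent language, while the specialist model supplies the medical-imaging competence the general model does not yet have.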

LEE: Well, I think how rapidly that’s advancing has been a surprise to me. So I think we’re running short on time. So two last questions to wrap up this conversation. The first one is, as we think ahead on AI in medicine, what do you think will be the biggest changes or make the biggest differences two years from now, five years from now, 10 years from now?

LUNGREN: This is really tough. OK. I think the two-year timeframe, I think we will have some autonomous agent-based workflows for a lot of the … what I would call undifferentiated heavy lifting in healthcare.  

And this is happening in, you know, the pharmaceutical industry, the payer … every aspect is sort of looking at their operations at a macro level: where are these big bureaucratic processes that largely involve text and where can we shrink those down and really kind of unlock a lot of our workforce to do things that might be more meaningful to the business? I think that’s my safe one.  

Going five years out, you know, I have a really difficult time grappling with this seemingly shrinking timeline to AGI [artificial general intelligence] that we hear from people who I would respect and certainly know more than me. And in that world, I think there’s only been one paper that I’ve seen that has attempted to say, what does that mean in healthcare when we have this?  

And the fact is, I actually don’t know. [LAUGHS] I wonder whether there’ll still be a gap in some modalities. Maybe there’ll be the ability to do new science, and all kinds of interesting things will come of that.  

But then if you go all the way to your 10-year, I do feel like we’re going to have systems that are acting autonomously in a variety of capacities, if I’m being honest.  

What I would like to see if I have any influence on some of this is, can we start to celebrate the closing of hospitals instead of opening them? Meaning that, can we actually start to address—at a personal, individual level—care? And maybe that’s outside the home, maybe that’s, you know, in a way that doesn’t have to use so many resources and, frankly, really be very reactive instead of proactive.  

I really want to see that. That’s been the vision of precision medicine for, geez, 20-plus years. I feel like we’re getting close to that being something we can really tackle. 

LEE: So, we talked about Geoff Hinton and his famous prediction that we would soon not have human radiologists. And of course, maybe he got the date wrong. So, let’s reset the date to 2028. So, Matt, do you think Geoff is right or wrong? 

LUNGREN: [LAUGHS] Yeah, so the way … I’m not going to dodge the question, but let me just answer this a different way.  

We have a clear line of sight to go from images to draft reports. That is unmistakable. And that’s now in 2025. How it will be implemented and what the implications of that will be, I think, will be heavily dependent on the health system or the incentive structure for where it’s deployed.  

So, if I’m trying to take a step back, back to my global health days, man, that can’t come fast enough. Because, you know, you have entire health systems, in fact entire countries, that have five, you know, medical imaging experts for the whole country, but they still need this to, you know, take care of patients.  

Zooming in on today’s crisis in the US, right, we have the burnout crisis just as much as the doctors who are seeing patients and writing notes. We can’t keep up with the volume. In fact, we’re not training folks fast enough, so there is a push-pull; there may be a flip, to your point, to autonomous reads across some segments of what we do.  

By 2028, I think that’s a reasonable expectation that we’ll have some form of that. Yes. 

LEE: I tend to agree, and I think things get reshaped, but it seems very likely that even far into the future we’ll have humans wanting to take care of other humans and be taken care of by humans.  

Matt, this has been a fantastic conversation, and, you know, I feel it’s always a personal privilege to have a chance to work with someone like you so keep it up. 

[TRANSITION MUSIC] 

LUNGREN: Thank you so much, Peter. Thanks for having me. 

LEE: I’m always so impressed when I talk to Matt, and I feel lucky that we get a chance to work together here at Microsoft. You know, one of the things that always strikes me whenever I talk to him is just how disruptive generative AI has been to a business like Nuance. Nuance has had clinical note-taking as part of their product portfolio for a long, long time. And so, you know, when generative AI comes along, it’s not only an opportunity for them, but also a threat because in a sense, it opens up the possibility of almost anyone being able to make clinical note-taking capabilities into products.  

It’s really interesting how Matt’s product, DAX Copilot, which since the time that we had our conversation has expanded into a full healthcare workflow product called Dragon Copilot, has really taken off in the marketplace and how many new competing AI products have also hit the market, and all in just two years, because of generative AI.  

The other thing, you know, that I always think about is just how important it is for these kinds of systems to work together and especially how they integrate into the electronic health record systems. This is something that Carey, Zak, and I didn’t really realize fully when we wrote our book. But you know, when you talk to both Matt and Seth, of course, we see how important it is to have that integration.  

Finally, what a great example of yet another person who is both a physician and a tech geek. [LAUGHS] People sometimes think of healthcare as moving very slowly when it comes to new technology, but people like Matt are actually making it happen much more quickly than most people might expect.  

Well, anyway, as I mentioned, we also had a chance to talk to Seth Hain, and so here’s my conversation with Seth:

LEE: Seth, thank you so much for joining.  

SETH HAIN: Well, Peter, it’s such an exciting time to sit down and talk about this topic. So much has changed in the last two years. Thanks for inviting me.  

LEE: Yeah, in fact, I think in a way both of our lives have been upended in many ways by the emergence of AI. [LAUGHTER]  

The traditional listeners of the Microsoft Research Podcast, I think for the most part, aren’t steeped in the healthcare industry. And so maybe we can just start with two things. One is, what is Epic, really? And then two, what is your job? What does the senior vice president for R&D at Epic do every day? 

HAIN: Yeah, well, let’s start with that first question. So, what is Epic? Most people across the world experience Epic through something we call MyChart. They might use it to message their physician. They might use it to check the lab values after they’ve gotten a recent test. But it’s an app on their phone, right, for connecting in with their doctors and nurses and really making them part of the care team.  

But the software we create here at Epic goes beyond that. It’s what runs in the clinic, what runs at the bedside, in the back office to help facilitate those different pieces of care, from collecting vital information at the bedside to helping place orders if you’re coming in for an outpatient visit, maybe with a kiddo with an earache, and capturing that note and record of what happened during that encounter, all the way through back-office encounters, back-office information for interacting with payers as an example.  

And so, we provide a suite of software that health systems and increasingly a broader set of the healthcare ecosystem, like payers and specialty diagnostic groups, use to connect with that patient at the center around their care. 

And my job is to help our applications across the company take advantage of those latest pieces of technology to help improve the efficiency of folks like clinicians in the exam room when you go in for a visit. We’ll get into, I imagine, some use cases like ambient conversations, capturing that conversation in the exam room to help drive some of that documentation.  

But then providing that platform for those teams to build those and then strategize around what to create next to help both the physicians be efficient and also the health systems. But then ultimately continuing to use those tools to advance the science of medicine. 

LEE: Right. You know, one thing that I explain to fellow technologists is that I think today health records are almost entirely digital. I think the last figures I saw show that well over 99% of all health records are digital.  

But in the year 2001, fewer than 15% of health records were digital. They were literally in folders on paper in storerooms, and if you’re old enough, you might even remember seeing those storerooms.  

So, it’s been quite a journey. Epic and Epic’s competitors—though I think Epic is really the most important company—have really moved the entire infrastructure of record keeping and other communications in healthcare to a digital foundation.  

And I think one thing we’ll get into, of course, is one of the issues that has really become, I think, a problem for doctors and nurses: the kind of clerical paperwork and record-keeping burden. And for that reason, Epic and Epic’s systems end up being a real focus of attention. And so, we’ll get into that in a bit here.  

HAIN: And I think that hits, just to highlight it, on both sides. There is both the need to capture documentation; there’s also the challenge in reviewing it.  

LEE: Yes.  

HAIN: The average medical record these days is somewhere between the length of Fahrenheit 451 and To Kill a Mockingbird. [LAUGHTER] So there’s a fair amount of effort going in on that review side, as well. 

LEE: Yeah, indeed. So much to get into there. But I would like to talk about encounters with AI. So obviously, I think there are two eras here: before the emergence of ChatGPT and what we now call generative AI and afterwards. And so, let’s take the former.  

Of course, you’ve been thinking about machine learning and health data probably for decades. Do you have a memory of how you got into this? Why did you get an interest in data analytics and machine learning in the first place? 

HAIN: Well, my background, as you noted, is in mathematics before I came to Epic. And the sort of patterns and what could emerge were always part of what drove that. Having done development and kind of always been around computers all my life, it was a natural transition as I came here.  

And I started by really focusing on, how do we scale systems for the very largest organizations, making sure they are highly available and also highly responsive? Time is critical in these contexts in regards to rapidly getting information to doctors and nurses.  

And then really in the, say, in the 2010s, there started to be an emergence of capabilities from a storage and compute perspective where we could begin to build predictive analytics models. And these were models that were very focused, right. It predicted the likelihood somebody would show up for an appointment. It predicted the likelihood that somebody may fall during an inpatient stay, as an example.  

And I think a key learning during that time period was thinking through the full workflow. What information was available at that point in time, right? At the moment somebody walks into the ED [emergency department], you don’t have a full picture to predict the likelihood that they may deteriorate during an inpatient encounter.  

And in addition to what information was available, there was the question of, what can you do about it? And a key part of that was how do we help get the right people to the bedside at the right point in time to make an assessment, right? It was a human-in-the-loop type of workflow where, for example, you would predict deterioration in advance and have a nurse or a physician come to the bedside to assess.  

And I think that combination of narrowly focused predictive models with an understanding that to have them make an impact you had to think through the full workflow of where a human would make a decision was a key piece. 
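As a toy illustration of the kind of narrow model Hain describes, consider a no-show predictor whose output only matters because a workflow hangs off it, here a reminder call above some risk threshold. The features, data, and threshold are all invented for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented history: [days_since_booking, prior_no_shows, distance_miles]
X = np.array([[30, 2, 25], [2, 0, 3], [45, 3, 40], [7, 0, 5], [60, 1, 30], [1, 0, 2]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = missed the appointment

model = LogisticRegression().fit(X, y)

# Score tomorrow's schedule; the prediction is only useful because an
# action (human outreach) is attached to the threshold.
risk = model.predict_proba([[21, 1, 18]])[0, 1]
if risk > 0.7:  # threshold tuned to staff capacity, not a universal constant
    print(f"Risk {risk:.2f}: add to outreach list for a reminder call")

The same pattern, score, then threshold, then route to a human, is what Hain describes for deterioration: the model’s job is to get a nurse or physician to the bedside at the right moment, not to act on its own.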

LEE: Obviously there is a positive human impact. And so, for sure, part of the thought process for these kinds of capabilities comes from that.  

But Epic is also a business, and you have to worry about, you know, what are doctors and clinics and healthcare systems willing to buy. And so how do you balance those two things, and do those two things ever come into conflict as you’re imagining what kinds of new capabilities and features and products to create? 

HAIN: Two, sort of, two aspects I think really come to mind. First off, generally speaking, we see analytics and AI as a part of the application. So, in that sense, it’s not something we license separately. We think that those insights and those pieces of data are part of what makes the application meaningful and impactful.  

At the scale that many of these health systems operate and the number of patients that they care for, as well as having tens of thousands of users in the system daily, one needs to think about the compute overhead … 

LEE: Yes. 

HAIN: … that these things cause. And so, in that regard, there is always a ROI assessment that is taking place to some degree around, what happens if this runs at full scale? And in a way, that really got accelerated as we went into the generative AI era.  

LEE: Right. OK. So, you mentioned generative AI. What was the first encounter, and what was that experience for you?

HAIN: So, in the winter of ’22 and into 2023, I started experimenting alongside you with what we at that time called DV3, or Davinci 3, which eventually became GPT-4. And immediately, a few things became obvious. The tool was highly general purpose. One was able to, in putting in a prompt, have it sort of convert into the framing and context of a particular clinical circumstance and reason around that context. But I think the other thing that started to come to bear in that context was there was a fair amount of latent knowledge inside of it that was very, very different than anything we’d seen before. And, you know, there’s some examples from the Sparks of AGI paper from Microsoft Research, where a series of objects end up getting stacked together in the optimal way to build height. Just given the list of objects, it seems to have an understanding of physical space that it intuited from the training process, which we hadn’t seen anywhere. So that was an entirely new capability that programmers now had access to.  

LEE: Well in fact, you know, I think that winter of 2022, and we’ll get into this, one of your projects that you’ve been running for quite a few years is something called Cosmos, which I find exceptionally interesting. And I was motivated to understand whether this type of technology could have an impact there.  

And so, I had to receive permission from both OpenAI and Microsoft to provide you with early access.  

When I did first show this technology to you, you must have had an emotional response, either skepticism or … I can’t imagine you just trusted, you know, trusted me to the extent of believing everything I was telling you. 

HAIN: I think there’s always a question of, what is it actually, right? It’s often easy to create demos. It’s often easy to show things in a narrow circumstance. And it takes getting your hands on it and really spending your 10,000 hours digging in and probing it in different ways to see just how general purpose it was.  

And so, the skepticism was really around, how applicable can this be broadly? And I think the second question—and we’re starting to see this play out now in some of the later models—was, is this just a language thing? Is it narrowly only focused on that? Or can we start to imagine other modalities really starting to factor into this? How will it impact basic sciences? Those sorts of things.

On a personal note, I mean, I had two kids at that point (they’re now 14 and 12), and I wondered, what did this mean for them? What is the right thing for them to be studying? And so I remember sleepless nights on that topic, as well. 

LEE: OK, so now you get early access to this technology; you’re able to do some experimentation. I think one of the things that impressed me is just less than four months later at the major health tech industry conference, HIMSS, which also happened timing-wise to take place just after the public disclosure of GPT-4, Epic showed off some early prototype applications of generative AI. And so, describe what those were, and how did you choose what to try to do there? 

HAIN: Yeah, at that point we actually had the very first users live on that prototype, on that early version.  

And the key thing we’d focused on—we started this development in very, very late December, January of 2023—was a problem whose origins really were in the pandemic.  

So, during the pandemic, we started to see patients increasingly messaging their providers, nurses, and clinicians through MyChart, that patient portal I mentioned with about 190 million folks on it. And as you can imagine, that was a great opportunity in the context of COVID to limit the amount of direct contact between providers and patients while still getting their questions answered.  

But what we found as we came out of the pandemic was that folks preferred it regardless. And that messaging volume had stayed very, very high and was a time-consuming effort for folks.  

And so, the first use case we came out with was a draft message, generated in the context of the message from the patient and an understanding of their medical history, using that medical record that we talked about.  

And the nurse or physician using the tool had two options. They could either click to start with that draft and edit it and then hit send, or they could go back to the old workflow and start with a blank text box and write it from their own memory as they preferred.

And so that was that very first use case. There were many more that we had started from a development perspective, but, yeah, we had that rolling out right in March of 2023 there with the first folks. 

LEE: So, I know from our occasional discussions that some things worked very well. In fact, this is a real product now for Epic. And it seems to be really a very, very popular feature now. I know from talking to you that a lot of things have been harder. And so, I’d like to dive into that. As a developer, a tech developer, you know, what’s been easy, what’s been hard, and what in your mind is still left to do in terms of the development of AI? 

HAIN: Yeah. You know, the first thing that comes to mind, sort of starting foundationally, and we hinted at this earlier in our conversation, was that at that point in time, it was rather compute-intensive to run these on a per-message basis. And so, there were always trade-offs we were making in regards to how many pieces of information we would send into the model and how much we would request back out of it.  

The result of that was that while, kind of, theoretically or even from a research perspective, we could achieve certain outcomes that were quite advanced, one had to think about where you make those trade-offs from a scalability perspective as you wanted to roll that out to a lot of folks. So … 

LEE: Were you charging your customers more money for this feature? 

HAIN: Yeah, essentially the way that we handle that is there’s compute that’s required. As I mentioned, the feature is just part of our application. So, it’s just what they get with an upgrade.  

But that compute overhead is something that we needed to pass through to them. And so, it was something, particularly given both the staffing challenges, but also the margin pressures that health systems are feeling today, we wanted to be very cautious and careful about. 

LEE: And let’s put that on the stack because I do want to get into, from the selling perspective, that challenge and how you perceive health systems as a customer making those trade-offs. But let’s continue on the technical side here. 

HAIN: Yeah. On the technical side, it was a consideration, right. We needed to be thoughtful about how we used them. But going up a layer in the stack, at that time, there was a lot of conversation in the industry around something called RAG, or retrieval-augmented generation.  

And the idea was, could you pull the relevant bits, the relevant pieces of the chart, into that prompt, that information you shared with the generative AI model, to be able to increase the usefulness of the draft that was being created? And that approach ended up proving, and to some degree continues to be (although the techniques have greatly improved), somewhat brittle, right. You have a general-purpose technology that is drafting the response. 

But in many ways, you needed to, for a variety of pragmatic reasons, have somewhat brittle capability in regards to what you pulled into that approach. It tended to be pretty static. And I think this becomes one of the things that, looking forward, as these models have gotten a lot more efficient, we are and will continue to improve upon because, as you get a richer and richer amount of information into the model, it does a better job of responding.  
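To make the retrieval pattern concrete, here is a minimal sketch in Python of the static RAG approach described above: score chart fragments against the patient’s message, pack the highest-scoring ones into a prompt under a fixed size budget (the compute trade-off mentioned earlier), and hand the result to a completion model. The `llm_complete` call and all of the data here are hypothetical placeholders, not Epic’s implementation.

```python
def score(fragment: str, message: str) -> int:
    """Crude relevance score: count of fragment words that appear in the message."""
    msg_words = set(message.lower().split())
    return sum(1 for w in fragment.lower().split() if w in msg_words)

def build_prompt(patient_message: str, chart_fragments: list[str],
                 budget_chars: int = 2000) -> str:
    # Rank fragments by relevance, then pack them under a size budget,
    # which is where the scalability trade-off shows up in practice.
    ranked = sorted(chart_fragments,
                    key=lambda f: score(f, patient_message), reverse=True)
    context, used = [], 0
    for frag in ranked:
        if used + len(frag) > budget_chars:
            break
        context.append(frag)
        used += len(frag)
    return ("You are drafting a reply for a clinician to review.\n"
            "Relevant chart excerpts:\n" + "\n".join(context) +
            "\n\nPatient message:\n" + patient_message + "\n\nDraft reply:")

# draft = llm_complete(build_prompt(message, fragments))  # hypothetical API call
```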

I think the third thing, and I think this is going to be something we’re going to continue to work through as an industry, was helping users understand and adapt to these circumstances. So many folks, when they hear AI, think it will just magically do everything perfectly.  

And particularly early on with some of those challenges we’re talking about, it doesn’t. You know, if it’s helpful 85% of the time, that’s great, but it’s not going to be 100% of the time. And it’s interesting: as we started, we did something we call immersion, where we always make sure that developers are right there elbow to elbow with the users of the software. 

And one of the things that I realized through that experience with some of the very early organizations like UCSD [UC San Diego] or University of Wisconsin here in Madison was that even when I’m responding to an email or a physician is responding to one of these messages from a patient, depending on the patient and depending on the person, they respond differently.  

In that context, there’s opportunity to continue to mimic that behavior as we go forward more deeply. And so, you learn a lot about, kind of, human behavior as you’re putting these use cases out into the world. 

LEE: So, you know, this increasing burden of electronic communications between doctors, nurses, and patients is centered in one part of Epic. I think that’s called your in-basket application, if I understand correctly.  

HAIN: That’s correct. 

LEE: But that also creates, I think, a reputational risk and challenge for Epic because as doctors feel overburdened by this and they’re feeling burnt out—and as we know, that’s a big issue—then they point to, you know, “Oh, I’m just stuck in this Epic system.”  

And I think a lot of the dissatisfaction about the day-to-day working lives of doctors and nurses then focuses on Epic. And so, to what extent do you see technologies like generative AI as, you know, a solution to that or contributing either positively or negatively to this? 

HAIN: You know, earlier I made the comment that in December, as we started to explore this technology, we realized there were a class of problems that now might have solutions that never did before.  

And as we’ve started to dig into those—and we now have about 150 different use cases under development, many of which are live across the roughly 350 health systems now using them—one of the things we’ve started to find is that physicians, nurses, and others are telling us it’s helping them move forward with their jobs.  

And examples of this, obviously the draft of the in-basket message response is one, but using ambient voice recognition as a kind of new input into the software so that when a patient and a physician sit down in the exam room, the physician can start a recording and that conversation then ends up getting translated or summarized, if you will, including using medical jargon, into the note in the framework that the physician would typically write.  

Another one of those circumstances where they then review it, don’t need to type it out from scratch, for example, …  

LEE: Right. 

HAIN: … and can quickly move forward.  

I think looking forward, you know, you brought up Cosmos earlier. It’s a suite of applications, but at its core is a dataset of about 300 million de-identified patients. And so using generative AI, we built research tools on top of it. And I bring that up because it’s a precursor of how that type of deep analytics can be put into context at the point of care. That’s what we see this technology more deeply enabling in the future. 

LEE: Yeah, when you are creating … so you said there are about 150 sort of integrations of generative AI going into different parts of Epic’s software products.  

When you are doing those developments and then you’re making a decision that something is going to get deployed, one thing that people might worry about is, well, these AI systems hallucinate. They have biases. There are unclear accountabilities, you know, maybe patient expectations.  

For example, if there’s a note drafted by AI that’s sent to a patient, does the patient have a right to know what was written by AI and what was written by the human doctor? So, can we run through how you have thought about those things?  

HAIN: I think one thing that is important context to set here for folks, and I think it’s often a point of confusion when I’m chatting with folks in public, is that their interaction with generative AI is typically through a chatbot, right. It’s something like ChatGPT or Bing or one of these other products where they’re essentially having a back-and-forth conversation. 

LEE: Right. 

HAIN: And that is a dramatically different experience than how we think it makes sense to embed into an enterprise set of applications.  

So, an example use case may be in the back office, where there are folks that are coding encounters. So, when a patient comes in, right, they have the conversation with the doctor, the doctor documents it, that encounter needs to be billed for, and those folks in the back office associate with that encounter a series of codes that provide information about how that billing should occur.

So, one of the things we did from a workflow perspective was add a selector pane to the screen that uses generative AI to suggest a likely code. Now, this suggestion runs the risk of hallucination. So, the question is, how do you build into the workflow additional checks that help the user verify the suggestion?  

And so in this context, we always include a citation back to the part of the medical record that justifies or supports that code. So quickly on hover, the user can see, does this make sense before selecting it? And it’s those types of workflow pieces that we think are critical to using this technology as an aid to helping people make decisions faster, right. It’s similar to drafting documentation that we talked about earlier.  
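As a rough illustration of that pattern, each AI-suggested code can carry a citation back to the chart text that supports it, and nothing is applied without explicit confirmation. The structure below is a hedged sketch; the type and field names are invented for illustration and are not Epic’s data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CodeSuggestion:
    code: str             # e.g., an ICD-10 code string
    description: str
    source_note_id: str   # which note the supporting evidence came from
    evidence_span: str    # the quoted chart text shown on hover

def accept_if_verified(s: CodeSuggestion, user_confirmed: bool) -> Optional[str]:
    # The human stays in the loop: nothing is applied without confirmation.
    return s.code if user_confirmed else None

suggestion = CodeSuggestion("E11.9",
                            "Type 2 diabetes mellitus without complications",
                            "note-123", "A1c 7.2%, continues metformin")
assert accept_if_verified(suggestion, user_confirmed=True) == "E11.9"
```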

And it’s interesting because there’s a series of patterns that are … going back to the AI Revolution book you folks wrote two years ago. Some of these are really highlighted there, right. This idea of things like a universal translator is a common pattern that we ended up applying across the applications. And in my mind, translation, this may sound a little bit strange, but summarization is an example of translating a very long series of information in a medical record into the context that an ED physician might care about, where they have three or four minutes to quickly review that very long chart.  

And so, in that perspective, and back to your earlier comment, we added the summary into the workflow but always made sure that the full medical record was available to that user, as well. So, a lot of what we’ve done over the last couple of years has been to create a series of repeatable techniques in regards to both how to build the backend use cases, where to pull the information, feed it into the generative AI models.  

But then I think more importantly are the user experience design patterns to help mitigate those risks you talked about and to maintain consistency across the integrated suite of applications of how those are deployed.  
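One of those repeatable backend patterns, summarization into a role-specific context, can be sketched as a prompt template parameterized by the reader. The prompt wording and the `llm_complete` helper below are assumptions for illustration only.

```python
def summarize_for(role_context: str, chart_text: str) -> str:
    """Build a role-scoped summarization prompt for a long record."""
    return (f"Summarize the following record strictly for {role_context}. "
            "Use a few bullet points and cite the source line for each.\n\n"
            + chart_text)

ed_prompt = summarize_for(
    "an ED physician with three or four minutes to review", "...long chart...")
# summary = llm_complete(ed_prompt)  # hypothetical completion call
```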

LEE: You might remember from our book, we had a whole chapter on reducing paperwork, and I think that’s been a lot of what we’ve been talking about. I want to get beyond that, but before transitioning, let’s get some numbers.  

So, you talked about messages drafted to patients, to be sent to patients. So, give a sense of the volume of what’s happening right now. 

HAIN: Oh, across the 300 and, I think it’s, 48 health systems that are now using generative AI—and to be clear, we have about 500 health systems we have the privilege of working with, each with many, many hospitals—we are seeing tens of thousands of physicians and nurses using the software. That includes drafting a million-plus notes a month at this point, as well as helping to generate a similar ballpark number of responses to patients.  

The thing I’m increasingly excited about is the broader set of use cases that we’re seeing folks starting to deploy now. One of my favorites has been … it’s natural that as part of, for example, a radiology workflow, in studying that image, the radiologist made note that it would be worth double checking, say in six to eight months, that area of the patient’s chest with another scan. Something looks a little bit fishy there, but there’s not … 

LEE: There’s not a definitive finding yet. 

HAIN: … there’s not a definitive finding at that point. Part of that workflow is that the patient’s physician places an order for that future scan. And so, we’re using generative AI to surface that note back to the physician and, with one click, allow them to place that order, helping that patient get better care.  

That’s one example of dozens of use cases that are now live, both to help improve the care patients are getting but also help the workforce. So going back to the translation-summarization example, a nurse at the end of their shift needs to write up a summary of that shift for the next nurse for each … 

LEE: Right. 

HAIN: … each patient that they care for. Well, they’ve been documenting information in the chart over those eight or 12 hours, right.  

LEE: Yep, yep. 

HAIN: So, we can use that information to quickly draft that end-of-shift note for the nurse. They can verify it with those citations we talked about and make any additions or edits that they need and then complete their end of day far more efficiently.  

LEE: Right. OK. So now let’s get to Cosmos, which has been one of these projects that I think has been your baby for many years and has been something that has had a profound impact on my thinking about possibilities. So first off, what is Cosmos? 

HAIN: Well, just as an aside, I appreciate the thoughtful comments. There is a whole team of folks here that are really driving these projects forward. And a large part of that has been, as you brought up, both Cosmos as a foundational capability but then beginning to integrate it into applications. And that’s what those folks spend time on.  

Cosmos is this effort across hundreds of health systems that we have the privilege of working with to build out a de-identified dataset that today—and it climbs every day—has 300 million unique patient records in it.  

And one of the interesting things about that structure is that, for example, if I end up in a hospital in Seattle and have that encounter documented at a health system in Seattle, I still—a de-identified version of me—still only shows up once in Cosmos, stitching together both my information from here in Madison, Wisconsin, where Epic is at, with that extra data from Seattle. The result is these 300 million unique longitudinal records that have a deep history associated with them.  
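To make that appears-only-once property concrete, here is a toy sketch in which encounters arriving from two systems are keyed by a one-way token so that they land in a single longitudinal record. Real privacy-preserving record linkage is far more sophisticated than an unsalted hash; this is purely illustrative and is not how Cosmos actually links records.

```python
import hashlib
from collections import defaultdict

def link_token(name: str, dob: str) -> str:
    # A one-way hash stands in for a proper privacy-preserving token.
    return hashlib.sha256(f"{name.lower()}|{dob}".encode()).hexdigest()[:16]

longitudinal = defaultdict(list)  # token -> ordered list of encounters

for system, name, dob, encounter in [
    ("madison", "Pat Example", "1980-01-01", "outpatient visit"),
    ("seattle", "Pat Example", "1980-01-01", "hospital encounter"),
]:
    longitudinal[link_token(name, dob)].append((system, encounter))

assert len(longitudinal) == 1  # one de-identified patient, two sources
```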

LEE: And just to be clear, a patient record might have hundreds or even thousands of individual, I guess what you would call, clinical records or elements. 

HAIN: That’s exactly right. It’s the breadth of information from orders and allergies and blood pressures collected, for example, in an outpatient setting to cancer staging information that might have come through as part of an oncology visit. And it’s coming from a variety of sources. We exchange information about 10 million times a day between different health systems. And in that way, that full picture of the patient is available within Cosmos. 

LEE: So now why? Why Cosmos? 

HAIN: Why Cosmos? Well, the real ultimate aim is to put a deeply informed, in-context perspective at the point of care. So, as a patient, if I’m in the exam room, it’s helpful for the physician and me to know what similar patients like me have experienced in this context. What was the result of that line of treatment, for example? 

Or as a doctor, if I’m looking and working through a relatively rare or strange case to me, I might be able to connect with—this as an example workflow we built called Look-Alikes—with another physician who has seen similar patients or within the workflow see a list of likely diagnoses based on patients that have been in a similar context. And so, the design of Cosmos is to put those insights into the point of care in the context of the patient.  

To facilitate those steps there, the first phase was building out a set of research tooling. So, we see dozens of papers a year being published by the health systems that we work with. Those that participate in Cosmos have access to it to do research on it. And so they use both a series of analytical and data science tools to do that analysis and then publish research. So, building up trust that way.  

LEE: The examples you gave are, like with Look-Alikes, it’s very easy, I think, for people outside of the healthcare world to imagine how that could be useful. So now why is GPT-4 or any generative AI relevant to this? 

HAIN: Well, so a couple of different pieces, right. Earlier we talked about—and I think this is the most important—how generative AI is able to cast things into a specific context. And so, in that way, we can use these tools to help both identify a cohort of patients similar to you when you’re in the exam room and then also help present that information back in a way that relates to other research and understandings from medical literature, to understand what those likely outcomes are.  

I think more broadly, these tools and the generative AI techniques in the transformer architecture enable a deeper understanding of sequences of events, sequences of words. And that starts to open up broader questions about what can really be understood about patterns and sequences of events in a patient’s journey.  

Which, if you didn’t know, is where the name Epic came from: just as a nation’s great, long journey is told through an epic story, the medical record tells a patient’s story. 

LEE: So, we’re running up against our time together. And I always like to end with a more provocative question.  

HAIN: Certainly. 

LEE: And for you, I wanted to raise a question that I think we had asked ourselves in the very earliest days that we were sharing Davinci 3, what we now know of as GPT-4, with each other, which is, is there a world in the future because of AI where we don’t need electronic health records anymore? Is there a world in the future without EHR? 

HAIN: I think it depends on how you define EHR. I see a world coming where we need to manage a hybrid workforce, where there is a combination of humans and something folks are sometimes calling agents working in concert together to care for more and more of the country and of the world. And there is and will need to be a series of tools to help orchestrate that hybrid workforce. And I think things like EHRs will transform into helping that workforce be operationally successful.  

But as a patient, I think there’s a very different opportunity that starts to be presented. And we’ve talked about kind of understanding things deeply in context. There’s also a real acceleration happening in science right now, and the possibility of bringing the second- and third-order effects of generative AI to the point of care, be that through the real-world evidence we were talking about with Cosmos or maybe personalized therapies that really are well matched to that individual. These generative AI techniques open the door for that, as well as for the full lifecycle of managing it from a healthcare perspective, all the way through monitoring after the fact.  

And so, I think we’ll still be recording people’s stories. Their stories are relevant to them, and they can help inform the bigger picture. But I think the real question is, how do you put those in a broader context? And these tools open the door for a lot more. 

LEE: Well, that’s really a great vision for the future.  

[TRANSITION MUSIC] 

Seth, I always really learn so much talking to you, and thank you so much for this great chat. 

HAIN: Thank you for inviting me.   

LEE: I see Seth as someone on the very leading frontier of bringing generative AI to the clinic and into the healthcare back office and at the full scale of our massive healthcare system. It’s always impressive to me how thoughtful Seth has had to be about how to deploy generative AI into a clinical setting.  

And, you know, one thing that sticks out—and he made such a point of this—is, you know, generative AI in the clinical setting isn’t just a chatbot. They’ve had to really think of other ways that will guarantee that the human stays in the loop. And that’s of course exactly what Carey, Zak, and I had predicted in our book. In fact, we even had a full chapter of our book entitled “Trust but Verify,” which really spoke to the need in medicine to always have a human being directly involved in overseeing the process of healthcare delivery. 

One technical point that Carey, Zak, and I completely missed, on the other hand, in our book, was the idea of something that Seth brought up called RAG, which is retrieval-augmented generation. That’s the idea of giving AI access to a database of information and allowing it to use that database as it constructs its answers. And we heard from Seth how fundamental RAG is to a lot of the use cases that Epic is deploying. 

And finally, I continue to find Seth’s project called Cosmos to be a source of inspiration, and I’ve continued to urge every healthcare organization that has been collecting data to consider following a similar path. 

In our book, we spent a great deal of time focusing on the possibility that AI might be able to reduce or even eliminate a lot of the clerical drudgery that currently exists in the delivery of healthcare. We even had a chapter entitled “The Paperwork Shredder.” And we heard from both Matt and Seth that that has indeed been the early focus of their work.  

But we also saw in our book the possibility that AI could provide diagnoses, propose treatment options, be a second set of eyes to reduce medical errors, and in the research lab be a research assistant. And here in Epic’s Cosmos, we are seeing just the early glimpses that perhaps generative AI can actually provide new research possibilities in addition to assistance in clinical decision making and problem solving. On the other hand, that still seems to be for the most part in our future rather than something that’s happening at any scale today. 

But looking ahead to the future, we can still see the potential of AI helping connect healthcare delivery experiences to the advancement of medical knowledge. As Seth would say, the ability to connect bedside to the back office to the bench. That’s a pretty wonderful future that will take a lot of work and tech breakthroughs to make it real. But the fact that we now have a credible chance of making that dream happen for real, I think that’s pretty wonderful. 

[MUSIC TRANSITIONS TO THEME] 

I’d like to say thank you again to Matt and Seth for sharing their experiences and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a look at how patients are using generative AI for their own healthcare, as well as an episode on the laws, norms, and ethics developing around AI and health, and more. We hope you’ll continue to tune in.

Until next time.

[MUSIC FADES] 

[1] A provider of conversational, ambient, and generative AI, Nuance was acquired by Microsoft in March 2022 (opens in new tab). Nuance solutions and capabilities are now part of Microsoft Cloud for Healthcare.

[2] According to the survey (opens in new tab), of the 20% of respondents who said they use generative AI in clinical practice, 29% reported using the technology for patient documentation and 28% said they use it for differential diagnosis.

[3] A month after the conversation was recorded, Microsoft Dragon Copilot was unveiled. Dragon Copilot combines and extends the capabilities of DAX Copilot and Dragon Medical One.


The post Real-world healthcare AI development and deployment—at scale appeared first on Microsoft Research.

The reality of generative AI in the clinic http://approjects.co.za/?big=en-us/research/podcast/the-ai-revolution-in-medicine-revisited-the-reality-of-generative-ai-in-the-clinic/ Thu, 20 Mar 2025 21:50:50 +0000 http://approjects.co.za/?big=en-us/research/?p=1134402 UC San Diego Health’s Dr. Christopher Longhurst and UC San Francisco Health’s Dr. Sara Murray explore how generative AI is changing patient care, clinical workflows, and decision-making and how they envision the technology impacting the future of healthcare.

The post The reality of generative AI in the clinic appeared first on Microsoft Research.

AI Revolution podcast | Episode 1 - The reality of generative AI in the clinic | outline illustration of Dr. Sara Murray, Peter Lee, Dr. Christopher Longhurst

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series—The AI Revolution in Medicine, Revisited—Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Dr. Christopher Longhurst (opens in new tab) and Dr. Sara Murray (opens in new tab), leading experts in healthcare AI implementation, join Lee to discuss the current state and future of AI in clinical settings. Longhurst, chief clinical and innovation officer at UC San Diego Health and executive director of the Jacobs Center for Health Innovation, details his healthcare system’s collaboration with Epic and Microsoft to integrate GPT into their electronic health record system, offering clinicians support in responding to patient messages. Dr. Murray, chief health AI officer at UC San Francisco Health, discusses AI’s integration into clinical workflows, the promise and risks of AI-driven decision-making, and how generative AI is reshaping patient care and physician workload.


Learn more:

Large Language Models for More Efficient Reporting of Hospital Quality Measures (opens in new tab)
Publication | October 2024 

Generative artificial intelligence responses to patient messages in the electronic health record: early lessons learned (opens in new tab)
Publication | July 2024 

The Chief Health AI Officer — An Emerging Role for an Emerging Technology (opens in new tab)
Publication | June 2024 

AI-Generated Draft Replies Integrated Into Health Records and Physicians’ Electronic Communication (opens in new tab) 
Publication | April 2024 

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum (opens in new tab)
Publication | April 2023

The AI Revolution in Medicine: GPT-4 and Beyond
Book | April 2023

Transcript 

[MUSIC] 

[BOOK PASSAGE]  

PETER LEE: “The workload on healthcare workers in the United States has increased dramatically over the past 20 years, and in the worst way possible. … Far too much of the practical, day-to-day work of healthcare has evolved into a crushing slog of filling out and handling paperwork. … GPT-4 indeed looks very promising as a foundational technology for relieving doctors of many of the most taxing and burdensome aspects of their daily jobs.” 

[END OF BOOK PASSAGE] 

[THEME MUSIC] 

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee. 

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?  

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 

[THEME MUSIC FADES]

What I read there at the top is a passage from Chapter 2 of the book, which captures part of what we’re going to cover in this episode. 

In our book, we predicted how AI would be leveraged in the clinic. Some of those predictions, I felt, were slam dunks, for example, AI being used to listen to doctor-patient conversations and write clinical notes. There were already early products coming out in the world not using generative AI that were doing just that. But other predictions we made were bolder, for instance, on the use of generative AI as a second set of eyes, to look over the shoulder of a doctor or a nurse or a patient and spot mistakes.

In this episode, I’m pleased to welcome Dr. Chris Longhurst and Dr. Sara Murray to talk about how clinicians in their respective systems are using AI, their reactions to it, and what’s ahead. Chris is the chief clinical and innovation officer at UC San Diego Health, and he is also the executive director of the Joan & Irwin Jacobs Center for Health Innovation. He’s in charge of UCSD Health’s digital strategy, including the integration of new technologies from bedside to bench and reaching across UCSD Health, the School of Medicine, and the Jacobs School of Engineering. Chris is a board-certified pediatrician and clinical informaticist.

Sara is vice president and chief health AI officer at UC San Francisco Health. Sara is an internal medicine specialist and associate professor of clinical medicine. A doctor, a professor of medicine, and a strategic health system leader, she builds infrastructure and governance processes to ensure that UCSF’s deployment of AI, including both AI procured from companies and AI-powered tools developed in-house, is trustworthy and ethical.

I’ve known Chris and Sara for years, and what’s really impressed me about their work—and frankly, the work of all the guests we’ll have on the show—is that they’ve all done something significant to advance the use of AI in healthcare.

[TRANSITION MUSIC]

Here’s my conversation with Dr. Chris Longhurst:  

LEE: Chris, thank you so much for joining us today. 

CHRISTOPHER LONGHURST: Peter, it’s a pleasure to be here. Really appreciate it. 

LEE: We’re going to get into, you know, what’s happening in the clinic with AI. But I think we need to find out a little bit more about you first. I introduced you as a person with a fancy title, chief clinical and innovation officer. What is that exactly, and how do you spend a typical day at work? 

LONGHURST: Well, I have a little bit of a unicorn job because my portfolio includes information technology, and I’m a recovering CIO after spending seven years in that role. It also includes quality, patient safety, case management, and the office of our chief medical officer.  

And so I’m really trying to unify our mission to deliver highly reliable care with these new tools in a way that allows us to transform that care. One good analogy, I think, is it’s about the game, right. Our job is not only to play the game and win the game using the existing tools but also to change the game by leveraging these new tools and showing the rest of the country how that can be done. 

LEE: And so as you’re doing that, I can understand, of course, you’re working at a very, kind of, senior executive level. But, you know, when I’ve visited you at UCSD Health, you’re also working with clinicians, doctors, and nurses all the time. In a way, I viewed you as, sort of, connective tissue between these things. Is that accurate? 

LONGHURST: Well, sure. And we’ve got, you know, several physicians who are part of the executive team who are also continuing to practice, and I think that’s one of the ways in which doctors on the executive team can bring value, is being that connective tissue, being the ears on the ground and a little dose of reality. 

LEE: [LAUGHS] Well, in fact, that reality is really what I want to delve into. But I just want to, before getting into that, talk a little bit about AI and your encounters with AI. And I think we have to do it in two stages because there is AI and machine learning and data analytics prior to the rise of generative AI and then, of course, after. And so tell us a little bit about, you know, what got you into health informatics and AI to begin with. 

LONGHURST: Well, Peter, I know that you play video games, and I did too for many years. So I was an early John Carmack id Software, Castle Wolfenstein, and Doom fan.  

LEE: Love it.  

LONGHURST: And that kept me occupied because I lived out in the country on 50 acres of almond trees. And so it was computer gaming that first got me into computers.  

But during medical school, I decided to pursue graduate work in this field called health informatics. And actually my master’s thesis was using machine learning to help identify and distinguish innocent from pathologic heart murmurs in children. And I worked with Dr. Nancy Reed at UC Davis, who had programmed using Lisp, a really fancy tool to do exactly that.  

And I will tell you that if I never see another parentheses in Lisp code again, it’ll be too soon. So I spent a solid year on that. 

LEE: [LAUGHS] No, no, but you should wear that as a badge of honor. And I will guess that no other guest on this podcast series will have programmed in Lisp. So kudos to you. 

LONGHURST: [LAUGHS] Well, it was a lot of work, and I learned a lot, but as you can imagine, it wasn’t highly successful at the time. And fast forward, we’ve had lots of traditional machine learning kind of activities using discrete data for predictive analytics to help predict flow in the hospital and even sepsis, which we can talk about. But as you said, the advent of generative AI in the fall of 2022 was a real game-changer. 

LEE: Well, you have this interest in technology, and, in fact, I do know you as a fairly intensely geeky person. Really, I think maybe that’s one reason why we’ve been attracted to each other. But you also got drawn into medicine. Where did that come from? 

LONGHURST: So my father was a practicing cardiologist and scientist. He was MD, PhD trained, and he really shared with me both a love of medicine but also science. I worked in his lab for three summers, and it was during college I decided I wanted to apply to medical school because the human side of the science really drew me in.  

But my father was the one who really identified it was important to cross-train. And that’s why I decided to take time off to do that master’s degree in health informatics and see if I could figure out how to take two disparate fields and really combine them into one.  

I actually went down to Stanford to become a pediatrician because they have a standalone children’s hospital that’s one of the best in the country. And I still practice pediatrics and see newborns, and it’s a passion for me and part of my identity.

LEE: Well, I’m just endlessly fascinated and impressed with people who can span these two worlds in the way that you’ve done. So now, you know, 2022, in November, ChatGPT gets released to the world, and then, you know, a few months later, GPT-4, and then, of course, in the last two years, so much has happened. But what was your first encounter with what we now know of as generative AI? 

LONGHURST: So I remember when ChatGPT was released, and, you know, some of my computer science-type of nerd friends, we were on text threads, you know, with a lot of mind-blowing emojis. But when it really hit medicine was when I got a call right after Thanksgiving in 2022 from my colleague. He was playing with ChatGPT, and he said to me, Chris, I’ve been feeding it patient questions and you wouldn’t believe the responses. And he emailed some of the examples to me, and my mind was blown.

And so that’s when I became one of the reviewers on the paper that was published in April of 2023 that showed not only could ChatGPT help answer questions from patients in a high-quality way, but it also expressed a tremendous amount of empathy.[1] And in fact, in our review, the clickbait headlines that came out of the paper were that the chatbot was both higher quality and more empathetic than doctors.

But that wasn’t my takeaway at all. In fact, I’ll take my doctors any day and put them against your chatbot if you give them an hour to Google and construct a really long, thoughtful response. To me, part of the takeaway was that this was really an opportunity to improve efficiency and save time. And so I called up our colleagues at Epic. I think it was right around December of 2022. And I said, Sumit, have you seen this? I’d like to share some results with you. And I showed him the data from our paper before we had actually had it published. And he said, “Well, that’s great because we’re working with Peter Lee and the team at Microsoft to integrate GPT into Epic.”  

And so, of course, that’s how we became one of the first two sites in the country to roll out GPT inside our electronic health record to help draft answers to patient questions.  

LEE: And, you know, one thing that’s worth emphasizing in the story that you’ve just told is that there is no other major health system that has been confronting the reality of generative AI longer than UC San Diego Health—and I think largely because of your drive and early adoption.  

And many listeners of this podcast will know what Epic is, but many will not. And so it’s worth saying that Epic is a very important creator of an electronic health records system. And of course, UC San Diego Health uses Epic to store all of the clinical data for its patients.  

And then Sumit is, of course, Sumit Rana, who is president at Epic.  

LONGHURST: So in partnership with Epic, we decided to tackle a really important challenge in healthcare today, which is, particularly since the pandemic and the increase in virtual and telehealth care, our clinicians get more messages than ever from patients. But answering those asynchronous messages is an unreimbursed, noncompensated activity that can often take time after hours—what we call “pajama time”—for our doctors.  

And in truth, you know, health systems that have thought through this, most of the answers are not actually generated by the doctors themselves. Many times, it’s mid-level providers, protocol schedulers, other things, because the questions can be about anything from rescheduling an appointment to a medication refill. They don’t all require doctors.  

When they do, it’s a more complicated question, and sometimes can require a more complicated answer. And in many cases, the clinicians will see a long complex question, and rather than typing an answer, they’ll say, “You know, this is complicated. Why don’t you schedule a visit with me so we can talk about it more?” 

LEE: Yeah, so now you’ve made a decision to contact people at Epic to what … posit the idea that AI might be able to make responding to patient queries easier? Is that the story here?

LONGHURST: That’s exactly right. And Sumit knew well that this is a challenge across many organizations. This is not unique to UC San Diego or Stanford. And there’s been a lot of publications about it. It’s even been in the lay press. So our hypothesis was that using GPT to help draft responses for doctors would save them time, make it easier, and potentially result in higher-quality, more empathetic answers to patients. 

LEE: And so now the thing that I was so impressed with is you actually did a carefully controlled study to try to understand how well does that work. So tell us a little bit first about the results of that study but then how you set it up. 

LONGHURST: Sure. Well, first, I want to acknowledge something you said at the beginning, which is one of my hats is the executive director of the Joan & Irwin Jacobs Center for Health Innovation. And we’re incredibly grateful to the Jacobs for their gift, which has allowed us to not only implement AI as part of hospital operations but also to have resources that other health systems may not have to be able to study outcomes. And so that really enabled what we’re going to talk about here. 

LEE: Right. By the way, one of the things I was personally so fascinated by is, of course, in our book, we speculated that things like after-visit notes to patients, responding to patient queries might be something that happens. And you, at the same time we were writing the book, were actually actively trying to make that real, which is just incredible and for me, and I think my coauthors, pretty affirming. 

LONGHURST: I think you guys were really prescient in your vision. The book is tremendous. I have a signed copy of Peter’s book, and I recommend it for all your listeners. [LAUGHTER]  

LEE: All right, so now what have you found about … 

LONGHURST: Yeah. 

LEE: … generative AI?  

LONGHURST: Yeah. Well, first, to understand what we found, you have to understand how we built [the AI inbox response tool]. And so Stanford and UC San Diego really collaborated with Epic on designing what this would look like. So the doctor gets that patient message. We feed some information to GPT that’s not only the message but also some information about the patient—their problems and medications and past medical and surgical history and that sort of thing. 

LEE: Is there a privacy concern that patients should be worried about when that happens? 

LONGHURST: Yeah, it’s a really good question. There’s not because we’re operating in partnership with Epic and Microsoft in a HIPAA-compliant cloud. And so that data is not only secure and private, but that’s our top priority, is keeping it that way. 

LEE: Great. 

LONGHURST: So once we feed that into GPT, of course, we very quickly get a draft message that we could send to a patient. But we chose not to just send that message to a patient. So part of our AI governance is keeping a human in the loop. And there’s two buttons that allow that clinician to review the message. One button says Edit draft message, and the other button says Start new blank message. So there’s no button that says just Send now. And that really is illustrative of the approach that we took.

The second thing, though, that we chose to do I think is really interesting from a conversation standpoint is that our AI governance, as they were looking at this, said, “You know, AI is new and novel. It can be scary to patients. And if we want to maximize trust with our patients, we should maximize transparency.” And so anytime a clinician uses the button that says Edit draft response, we automatically append something in the message that says, “This message was automatically generated and reviewed and edited by your doctor.” We felt strongly that was the right approach, and we’ve had a lot of positive feedback. 
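A minimal sketch of that flow, with invented names: the only two paths are editing the draft or starting from a blank message, and choosing the draft automatically appends the transparency notice quoted above. This is illustrative only, not UC San Diego’s actual implementation.

```python
DISCLOSURE = ("This message was automatically generated and reviewed "
              "and edited by your doctor.")

def finalize_reply(clinician_text: str, used_draft: bool) -> str:
    """Return the message to send; there is no path that sends a raw draft."""
    if used_draft:
        # Clinician chose "Edit draft message": the disclosure is appended.
        return clinician_text + "\n\n" + DISCLOSURE
    # Clinician chose "Start new blank message": no disclosure is appended.
    return clinician_text
```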

LEE: And so we’ll want to get into, you know, how good these messages are, whether there are issues with bias or hallucination, but before doing that, you know, on this human in loop, this was another theme in our book. And in fact, we recommended this. But there were other health systems around the country that were also later experimenting with similar ideas. And some have taken different approaches. In fact, as time has gone on, if anything, it seems like it’s become a little bit less clear, this sort of labeling idea. Has your view on this evolved at all over the last two years? 

LONGHURST: First of all, I’m glad that we did it. I think it was the right choice for University of California, and in fact, the other four UC sites are all doing this, as well. There is variability across the organizations that are using this functionality, and as you suggest, there’s tens of thousands of physicians and hundreds of thousands if not millions of patients receiving these messages. And it’s been highlighted a bit in the press.  

I can tell you that, talking about our approach to transparency, one of our lawmakers in the state of California heard about this and actually proposed a bill that was signed into legislation by our governor so that, effective Jan. 1, any communication with patients that uses AI has to be disclosed to those patients. And so there is some thought that this is perhaps the right approach.  

I don’t think that it’s a perfect approach, though. We’re using AI in more and more ways, and it’s not as if we’re going to be able to disclose every single time that we’re doing it to prioritize, you know, scheduling for the sickest patients or to help operationally on billing or something else. And so I think that there are other ways we need to figure it out. But we have called on national societies and others to try to create some guidelines around this because we should be as transparent as we can with our patients. 

LEE: Obviously, one of the issues—and we highlighted this a lot in our book—is the problem of hallucination. And surely this must be an issue when you’re having AI draft these notes to patients. What have you found? 

LONGHURST: We were worried about that when we rolled it out. And what we found is not only were there very few hallucinations, in some cases, our doctors were learning from the GPT. And I can give you an example. A patient who had had a visit wrote their doctor afterwards and said, “Doc, I’ve been thinking a lot about what we discussed in quitting smoking marijuana.” And the GPT draft reply said something to the effect of, “That’s great news. Here’s a bunch of evidence on how smoking marijuana can harm your lungs and cause other effects. And by the way, since you live in the state of California, here’s the marijuana quitters helpline.” And the doctor who was sending this called me up to tell me about it. And I said, well, is there a marijuana quitters helpline in the state of California? And he said, “I didn’t know, so I Googled it. And yeah, there is.” And so that’s an example of the GPT actually having more information than, you know, a primary care clinician might have. And so there are cases clearly where the GPT can help us increase the quality.

In addition, some of the feedback that we’ve been getting both anecdotally and now measuring is that these draft responses do carry that tone of empathy that Dr. [John] Ayers [2] and I saw in the original manuscript. And we’ve heard from our clinicians that it’s reminding them to be empathetic because you don’t always have that time when you’re hammering out a quick short message, right?  

LEE: You know, I think the thing that we’ve observed, and we’ve discussed this also, is exactly that reminding thing. There might be in the encounter between a doctor and patient, maybe a conversation about, you know, going to a football game for the first time. That could be part of the conversation. But in a busy doctor’s life, when writing a note, you might forget about that. And, of course, an AI has the endless ability to remember that it might be friendly to send well wishes. 

LONGHURST: Exactly right, Peter. In fact, one of the findings in Dr. Ayers’s manuscript that didn’t get as much attention but I think is really important was the difference in length between the responses. So I was one of the putatively blinded reviewers, but as I was looking at the questions and answers, it was really obvious which ones were the chatbot and which ones were the doctors because the chatbot was always, you know, three or four paragraphs and the doctor was three or four sentences, right. It’s about time. And so we saw that in the results of our study.  

LEE: All right, so now let’s get into those results.

LONGHURST: OK. Well, first of all, my hypothesis was that this would help us save time, and I was wrong. It turns out a busy primary care clinician might get about 30 messages a day from patients, and each one of those messages might take about 30 seconds to type a quick response, a two-sentence response, a dot phrase, a macro. Your labs are normal. No need to worry. I’ll call you if anything comes up.

After we implemented the AI tool, it still took about 30 seconds per message to respond. But we saw that the responses were two to three times longer on average, and they carried a more empathetic tone. [3] And our physicians told us it decreased cognitive burden, which is not surprising because, as any of you who have written know, it’s much easier to edit somebody else’s copy than it is to face a blank screen, right. That’s why I like to be senior author, not lead author.

And so the tool actually helped quite a bit, but it didn’t help in the ways that we had expected necessarily. There are some other sites that have now found a little bit of time savings, but it’s really nominal overall. The Stanford study (opens in new tab) that was done at the same time—and we actually had some shared coauthors—measured physician burnout using a validated survey, and they saw a decrease in measured physician burnout. And so there are clear advantages to this, and we’re still learning more.

In fact, we’ve now rolled this out not only to all of our physicians, but to all of our nurses who help answer those messages in many different clinics. And one of the things that we’re finding—and Dr. CT Lin at University of Colorado recently published (opens in new tab)—is that this tool might actually help those mid-level providers even more because it’s really good at protocolized responses. I mentioned at the beginning, some of the questions that come to the physicians may be more the edge cases that require a little bit less protocolized kind of answers. And so as we get into academic subspecialties like gynecology oncology, the GPT might not be dishing up a draft message that’s quite as useful. But if you’re a nurse in obstetrics and you’re getting very routine pregnancy questions, it could save a ton of time. And so we’ve rolled this out broadly.  

I want to acknowledge the partnership with Seth Hain and the team at Epic, who’ve just been fantastic. And we’re finding all sorts of new ways to integrate the GPT tools into our electronic health record, as well. 

LEE: Yeah. Certainly the doctors and nurses that I’ve encountered that have access to this feature, they just don’t want to give it up. But it’s so interesting that it actually doesn’t really save time. Is that a problem? Because, of course, you know, there seems to be a workforce shortage in healthcare, a need to lower costs and have greater efficiencies. You know, how do you think about that?

LONGHURST: Great question. There are so many opportunities, as you’ve kind of mentioned. I mean, healthcare is full of waste and inefficiency, and I am super bullish on how these generative AI tools are going to help us reduce some of that inefficiency.  

So everything from revenue cycle to our call centers to operations efficiency, I think, can be positively impacted, and those things make more resources available for clinicians and others. When we think about, you know, saving clinicians time, I don’t think it’s necessarily, sort of, the communicating with patients where you want to save that time actually. I think what we want to do is we want to offload some of those administrative tasks that, you know, take a lot of time for our physicians.  

So we’ve measured “pajama time” in our doctors, and on average, a busy primary care clinician can spend one to two hours after clinic doing things. But only about 15 minutes of that is answering messages from patients. Actually, the bulk of the time after hours is documenting the notes that are required from those visits, right. And those notes are used for a number of different purposes, not only communicating to the next doctor who sees the patient but also for billing purposes and compliance purposes and medical legal purposes. So another really exciting area is AI scribes. 

LEE: Yeah. And so, you know, we’ll get into scribes and actually other possibilities. I wonder, though, about this empathy issue. Because as computer scientists, we know that you can fall into traps if you anthropomorphize these AI systems or any machine. So in this study, how was that measured, and how real do think that is? 

LONGHURST: So in the study, you’ll see anecdotal or qualitative evidence about empathy. We have a follow-up study that will be published soon where we’ve actually measured empathy using some more quantitative tools, and there is no doubt that the chatbot-generated drafts are coming through with more empathy. And we’ve heard this from a number of our doctors, so it’s not surprising. Here’s one of the more surprising things though. I published a paper last year with Dr. Sally Baxter (opens in new tab), one of our ophthalmologists, and she actually looked at messages with a negative tone. It turns out, not surprisingly, healthcare can be frustrating. And stressed patients can send some pretty nasty messages to their care teams. [LAUGHTER] And you can imagine being a busy, …  

LEE: I’ve done it. [LAUGHS]

LONGHURST: … tired, exhausted clinician, and receiving a bit of a nasty gram from one of your patients can be pretty frustrating. And the GPT is actually really helpful in those instances in helping draft a pretty empathetic response when I think the human instinct would be a pretty nasty one. [LAUGHTER] I should probably use it in my email, Peter. 

LEE: And is the patient experience, the actually lived experience of patients when they receive these notes, are you absolutely convinced and certain that they are also benefiting from this empathetic tone? 

LONGHURST: I am. In fact, in our paper, we also found that the messages going to patients that had been drafted with the AI tool were two to three times longer (opens in new tab) than those drafted without it. And so it’s clear there’s more content going and that content is either contributing to a greater sense of empathy and relationship among the patients as well as the clinicians, and/or in some cases, that content may be educating the patients or even reducing the need for follow-up visits.  

LEE: Yeah, so now I think an important thing to share with the audience here is, you know, healthcare, of course, is a very highly regulated industry for good reasons. There are issues of safety and privacy that have to be guarded very, very carefully and thoroughly. And for that reason, clinical studies oftentimes have very carefully developed controls and randomization setups. And so to what extent was that done in this case? Because here, it’s not like you’re testing a new drug. It’s something that’s a little fuzzier, isn’t it?

LONGHURST: Yeah, that’s right, Peter. And credit to the lead author, Dr. Ming Tai-Seale, we actually did randomize. And so that’s unusual in these type of studies. We actually got IRB [institutional review board] exemption to do this as a randomized QI study. And it was a crossover study because all the doctors wanted the functionality. So what we tested was the early adopters versus the late adopters. And we compared at the same time the early adopters to those who weren’t using the functionality and then later the late adopters to the folks that weren’t using the functionality. 

LEE: And in that type of study, you might also, depending on how the randomization is set up, have to have doctors some days using it and some days not having access. Did that also happen? 

LONGHURST: We did, but it wasn’t on a day-to-day basis. It was more a month-to-month basis. 
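For intuition, the design can be sketched as random assignment into early- and late-adopter arms, with the late arm serving as the contemporaneous comparison until it gains access. This is a toy illustration of the month-granularity crossover, not the study’s actual code, and the three-month switch point is an assumption.

```python
import random

def assign_arms(clinicians: list[str], seed: int = 0) -> dict[str, str]:
    """Randomize clinicians into early- and late-adopter arms."""
    rng = random.Random(seed)
    shuffled = clinicians[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {c: ("early" if i < half else "late")
            for i, c in enumerate(shuffled)}

def has_tool(arm: str, month: int) -> bool:
    # Early adopters start with the tool; late adopters gain it at month 3,
    # serving as the comparison group before then.
    return True if arm == "early" else month >= 3

arms = assign_arms(["dr_a", "dr_b", "dr_c", "dr_d"])
```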

LEE: Uh-huh. And what kind of conversation do you have with a doctor that might be attached to a technology and then be told for the next month you don’t get to use it?  

LONGHURST: [LAUGHS] The good news is because of a doctor’s medical training, they all understood the need for it. And the conversation was sort of, hey, we’re going to need you to stop using that for a month so that we can compare it, but we’ll give it back to you afterwards. 
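
[To make the design concrete: below is a minimal Python sketch of the month-to-month crossover Longhurst describes. The clinician IDs, group sizes, and alternating-access schedule are all made up for illustration; the study’s actual assignment procedure isn’t detailed in this conversation.]

```python
import random

# Simplified model of a month-to-month crossover QI study: every
# clinician eventually gets the draft-reply tool, but access alternates
# by month so each group serves as the other's contemporaneous control.
clinicians = [f"dr_{i:03d}" for i in range(1, 21)]  # made-up IDs

random.seed(42)  # reproducible assignment for this sketch
shuffled = random.sample(clinicians, k=len(clinicians))
early_adopters = set(shuffled[: len(shuffled) // 2])

def has_tool(clinician: str, month: int) -> bool:
    """Early adopters have the tool in odd months, late adopters in
    even months, mirroring a month-to-month crossover."""
    in_early_group = clinician in early_adopters
    return in_early_group if month % 2 == 1 else not in_early_group

# Month 1 comparison: early adopters (with tool) vs. late adopters (without).
with_tool = [c for c in clinicians if has_tool(c, 1)]
without_tool = [c for c in clinicians if not has_tool(c, 1)]
print(len(with_tool), "clinicians with the tool;", len(without_tool), "without")
```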

LEE: [LAUGHS] OK, great. All right. So now we made some other predictions. So we talked about, you know, responding to patients. You briefly mentioned clinical note-taking. We also made guesses about other types of paperwork, you know, filling out prior authorization requests or referral letters, maybe for a doctor to refer to a specialist. We even made some guesses about a second set of eyes on medications, on various treatment options, diagnoses. What of these things have happened and what hasn’t happened, at least in your clinical experience?

LONGHURST: Your guesses were spot on. And I would say almost all of them have already happened and are happening today at UC San Diego and many other health systems. We have a HIPAA-compliant GPT instance that can be used for things like generating patient letters, generating referral letters, even generating patient education with patient-friendly language. And that’s a common use case. The second set of eyes on medications is something that we’re exploring but have not yet rolled out. One of the areas I’m really excited about is reporting. So Johns Hopkins did a study a couple of years ago that showed that an average academic medical center our size spends about $5 million annually (opens in new tab) just reporting on quality measures that are regulatory requirements. And that’s about accurate for us. 

We published a paper just last fall showing that large language models could help to pre-populate quality data (opens in new tab) for things like sepsis reporting in a really effective way. It was like 91% accurate. And so that’s a huge time savings and efficiency opportunity. Again, it allows us to redeploy those quality staff. We’re now looking at things like how do we use large language models to review charts for peer review to help ensure ongoing, you know, accuracy and mitigate risk. I’m really passionate about the whole space of using AI to improve quality and patient safety in particular.  

Your readers may be familiar with the famous 1999 report “To Err Is Human (opens in new tab),” which suggests a hundred thousand Americans die on an annual basis from medical errors. And unfortunately, the data shows we really haven’t made great progress in 25 years, but these new tools give us the opportunity to impact that in a really meaningful way. This is a turning point in healthcare.
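
[For readers curious how the quality-reporting pre-population Longhurst mentions can work in practice, here is a minimal Python sketch of LLM-based chart abstraction. The field names, prompt wording, and the call_llm stub are illustrative assumptions, not the published study’s actual pipeline.]

```python
import json

# Illustrative sketch of LLM-assisted quality-measure abstraction,
# in the spirit of the sepsis-reporting study mentioned above.

SEPSIS_FIELDS = {
    "suspected_infection": "yes/no/unknown",
    "blood_cultures_drawn": "yes/no/unknown",
    "antibiotics_within_3_hours": "yes/no/unknown",
    "initial_lactate_measured": "yes/no/unknown",
}

def build_prompt(chart_text: str) -> str:
    # Concatenation (not str.format) so braces in the JSON stay literal.
    return (
        "From the chart excerpt below, answer each field with yes, no, "
        "or unknown. Respond with JSON only; do not guess beyond what "
        "the text supports.\n"
        "Fields: " + json.dumps(SEPSIS_FIELDS) + "\n\n"
        "Chart:\n" + chart_text
    )

def call_llm(prompt: str) -> str:
    """Stub for the model call; a real pipeline would invoke an LLM
    API here and return its JSON answer as a string."""
    raise NotImplementedError("wire up a model provider")

def abstract_quality_data(chart_text: str) -> dict:
    # Drafted answers would still go to quality staff for review; the
    # model pre-populates the report rather than replacing abstractors.
    return json.loads(call_llm(build_prompt(chart_text)))
```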

LEE: Yeah, medication errors—actually, all manner of medical errors—I think has been just such a frustrating problem. And, you know, I think this gives us some new hope. Well, let’s look ahead a little bit. And just to be a little bit provocative, you know, one question that I get asked a lot by both patients and clinicians is, you know, “Will AI replace doctors sometime in the future?” What are your thoughts? 

LONGHURST: So the pat response is AI won’t replace doctors, but AI will replace doctors who don’t use AI. And the implication there, of course, is that a doctor using AI will end up being a more effective practitioner than a doctor who doesn’t. And I think that’s absolutely true. From a medical legal standpoint, what is standard of care today and what is standard of care five or 10 years from now will be different. And I think there will be a point where, for doctors who aren’t using AI regularly, it would almost be unconscionable.  

LEE: Yeah, I think there are already some areas where we’ve seen this happen. My favorite example is with the technology of ultrasound, where if you’re a gynecologist or in some parts of internal medicine, there are some diagnostic procedures where it would really be malpractice not to use ultrasound. Whereas in the late 1950s, both the safety of ultrasound and doctors’ training to read ultrasound images were called into question. And so let’s look ahead two years from now, five years from now, 10 years from now. And on those three time frames, you know, what do you think—based on the practice of medicine today, what doctors and nurses are doing in clinic every day today—what do you think the biggest differences will be two years from now, five years from now, and 10 years from now? 

LONGHURST: Great question, Peter. So first of all, 10 years from now, I think that patients will be still coming to clinic. Doctors will still be seeing them. Hopefully we’ll have more house calls and care occurring outside the clinic with remote monitoring and things like that. But the most important part of healthcare is the humanism. And so what I’m really excited about is AI helping to restore humanism in medical care. Because we’ve lost some of it over the last 20, 30 years as healthcare has become more corporate.  

So in the next two to five years, some things I expect to see are AI baked into more workflows. So AI scribes are going to become incredibly commonplace. I also think that there are huge opportunities to use those scribes to help reduce errors in diagnosis. So five or seven years from now, I think that when you’re speaking to your physician about your symptoms and other things, the scribe is going to be developing a differential diagnosis and helping recommend not only the right follow-up tests or imaging but even the physical exam findings that the doctor might want to look for in particular to help make a diagnosis.  

Dirty secret in healthcare, Peter, is that 50% of doctors are below average. It’s just math. And I think that the AI can help raise all of our doctors. So it’s like Lake Wobegon. They’re all above average. It has important implications for the workforce as you were saying. Do we need all visits to be with primary care doctors? Will mid-level providers augmented by AI be able to do as great a job as many of our physicians do? I think these are unanswered questions today that need to be explored. And then there was a really stimulating editorial in The New York Times recently by Dr. Eric Topol (opens in new tab), and he was waxing philosophic about a recent study that showed AI could interpret X-rays with 90% accuracy and radiologists actually achieve about 72% accuracy (opens in new tab).

LEE: Right. 

LONGHURST: The study looked at, how did the radiologists do with AI working together? And they got about 74% accuracy. So the doctors didn’t believe the AI. They thought that they were in the right, and the inference that Eric took, which I agree with, is that rather than always looking for ways to combine the two, we should be thinking about those tasks that are amenable to automation that could be offloaded with AI. So that our physicians are focused on the things that they’re great at, which is not only the humanism in healthcare but a lot of those edge cases we talked about. So let’s take screening as an example: mammograms, chest X-rays. There’s going to be a point in the next five years where all first reads are being done by AI, and then it’s a subset of those that are positive that need to be reviewed by physicians. And that helps free up radiologists to do a lot of other things that we need them to do. 

LEE: Wow, that is really just such a great vision for the future. And I call some of this the flip, where even patient expectations on the use of technology flips from fear and uncertainty to, you know, you would try to do this without the technology? And I think you just really put a lot of color and detail on that. Well, Chris, thank you so much for this. On that groundbreaking paper from April of 2023, we’ll put a link to it. It’s a really great thing to read. And of course, you’ve published extensively since then. But I can’t thank you enough for just all the great work that you’re doing. It’s really changing medicine. 

[TRANSITION MUSIC]  

LONGHURST: Peter, can’t thank you enough for the opportunity to be here today and the partnership with Microsoft to make this all possible. 

LEE: I always love talking to Chris because he really is a prime example of an important breed of doctor, a doctor who has clinical experience but is also a world-class tech geek. [LAUGHS] You know, it’s surprising to me, and pleasantly so, that the traditional gold standard of randomized trials that Chris has employed can be used to assess the viability of generative AI, not just for things like medical diagnosis, but even for seemingly mundane things like writing email notes to patients.  

The other surprise is that the use of AI, at least in the in-basket task, which involves doctors having to respond to emails from patients, doesn’t seem to save much time for doctors, even though the AI is drafting those notes. Doctors seem to love the reduced cognitive burden, and patients seem to appreciate the greater detail and friendliness that AI provides, but it’s not yet a big timesaver. And of course, the biggest surprise out of the conversation with Chris was his celebrated paper, now two years old, on the idea that AI notes are perceived by patients as being more empathetic than notes written by human doctors. Wow.

Let’s go ahead to my conversation with Dr. Sara Murray: 

LEE: Sara, I’m thrilled you’re here. Welcome. 

SARA MURRAY: Thank you so much for having me. 

LEE: You know, you have actually a lot of roles, and I know that’s not so uncommon for people at the leading academic medical institutions. But, you know, I think for our audience, understanding what a chief health AI officer does, an associate professor of clinical medicine—what does it all mean? And so to start, when you talk to someone, say, like your parents, how do you describe your job? You know, how do you spend a typical day at work? 

MURRAY: So first and foremost, I do always introduce myself as a physician because that’s how I identify, that’s how I trained. But in my current role, as the chief health AI officer, I’m really responsible for the vision and strategy for how we use trustworthy AI at scale to solve the biggest problems in our health system. And so I think there’s a couple key important points about that. One is that we have to be very careful that everything we’re doing in healthcare is trustworthy, meaning it’s safe, it’s ethical, it’s doing what we hope it’s doing, and it’s not causing any unexpected harm.  

And then, you know, second, we really want to be doing things that affect, you know, the population at large of the patients we’re taking care of. And so I think if you look historically at what’s happened with AI in healthcare, you’ve seen little studies here and there, but nothing broadly affecting or transforming how we deliver care. And I think now that we’re in this generative AI era, we have the tools to start thinking about how we’re doing that. And so that’s part of my role. 

LEE: And I’m assuming a chief health AI officer is not a role that has been around for a long time. Is this fairly new at UCSF, or has this particular job title been around?

MURRAY: No, it’s a relatively new role, actually. I came into this role about 18 months ago. I am the first chief health AI officer at UCSF, and I actually wrote the paper defining the role (opens in new tab) with Dr. Ashley Beecy, Dr. Chris Longhurst, Dr. Karandeep Singh, and Dr. Bob Wachter where we discuss what is this role in healthcare, why do we actually need it now, and what is this person accountable for. And I think it’s very important that as we roll these technologies out in health systems, we have someone who’s really accountable for thinking about, you know, whether we’re selecting the right tools and whether they’re being used in the right ways to impact our patients.  

LEE: It’s so interesting because I would say in the old days, you know, like five years ago, [LAUGHS] information technology in a hospital or health-system setting might be under the control and responsibility of a chief information officer, a CIO, or an IT, you know, chief. Or if it’s maybe some sort of medical device technology integration, maybe it’s some engineering-type of leader, a chief technology officer. But you’re different, and in fact the role that I think I would credit you with, sort of, making the blueprint for seems different because it’s actually doctors, practicing clinicians, who tend to inhabit these roles. Is there a reason why it’s different that way? Like, a typical CIO is not a clinician.

MURRAY: Yeah, so I report to our CIO. And I think that there’s a recognition that you need a clinician who really understands in practice how the tools can be deployed effectively. So it’s not enough to just understand the technology, but you really have to understand the use cases. And I think when you’re seeing physician chief health AI officers pop up around the country, it’s because they’re people who both understand the technology—not to the level you do obviously—but to some sufficient level and then understand how to use these tools in clinical care and where they can drive value and what the risks are in clinical care and that type of thing. And so I think it’d be hard for it not to be some type of clinician in this role.  

LEE: So I’m going to want to get into, you know, what’s really happening in clinic, but before that, I’ve been asking our guests about their “stages of AI grief,” [LAUGHS] as I like to put it. And for most people, I’ve been talking about the experiences and encounters with machine learning and AI before ChatGPT and then afterwards. And so can you tell us a little bit about, you know, how did you get into AI in the first place and what were your first encounters like? 

MURRAY: Yeah. So I actually started out as a health services researcher, and this was before we had electronic health records [EHR], when we were still writing our notes on carbon copy in the elevators, and a lot of the data we used was actually from claims data. And that was the kind of rich data source at the time, but as you know, that was very limited.  

And so when we went live with our electronic health record, I realized there was this tremendous opportunity to really use rich clinical data for research. And so I initially started collaborating with folks down at Stanford to do machine learning to identify, you know, rare diseases like lupus in the electronic health record but quickly realized there was this real gap in the health system for using data in an actionable way.  

And so I built what was initially our advanced analytics team, which grew into our data science team and is now our health AI team as our ability to use the data in more sophisticated ways evolved. But if we think about, I guess, the pre-generative era and my first encounter with AI or at least AI deployment in healthcare, you know, we initially, gosh, it was probably eight or nine years ago where we got access through our EHR vendor to some initial predictive tools, and these were relatively simple tools, but they were predicting things we care about in healthcare, like who’s not going to make it to a clinic visit or how long patients are going to stay in the hospital.  

And so there’s a lot of interest in, you know, predicting who might not make it to a clinic visit because we have big access issues with it being difficult for patients to get appointments, and the idea was that if you knew who wouldn’t show, you could actually put someone else in that slot, and it’s called overbooking. And so when we looked at the initial model, it was striking to me how risky it was for vulnerable patient populations because immediately it was obvious that this model was likely to overbook people by race, by body weight, by things that were clearly protected patient characteristics.  

And so we did a lot of work initially with that model and a lot of education around how these tools could be biased. But the risk existed, and as we continued to look at more of these models, we found there were a lot of issues with trustworthiness. You know, there was a length-of-stay prediction model that my team was able to outperform with a pair of dice. And when I talked to other systems about not implementing this model, you know, folks said, but it must be useful a little bit. I was like, actually, you know, if the dice are better, it’s not useful at all. [LAUGHS] 

LEE: Right!  

MURRAY: And so there was very little out there to frame this, but we quickly realized we have to start putting something together because there’s a lot of hype and there’s a lot of hope, but there’s also a lot of risk here. And so that was my pre-generative moment. 

LEE: You know, just before I get to your post-generative moment, you know, this story that you told, I sometimes refer to it as the healthcare IT world’s version of irrational exuberance. Because I think one thing that I’ve learned, and I have to say I’ve been guilty personally as a techie, you look at some of the problems that the world of healthcare faces, and to a techie first encountering this, a lot of it looks like common sense. Of course, we can build a model and predict these things.  

And you sort of don’t understand some of the realities, as you’ve described, that make this complicated. And at the same time, from healthcare professionals, I sometimes think they look at all of this dazzling machine learning magic and also are kind of overly optimistic that it can solve so many problems.  

And it does create this danger, this irrational exuberance, that both sides kind of get into a reinforcing cycle where they’re too quick to adopt technologies without thinking through the implications more carefully. I don’t know if that resonates with you at all. 

MURRAY: Yeah, totally. I think there’s a real educational opportunity here because it’s the “you don’t know what you don’t know” phenomenon. And so I do think there is a lot of work in healthcare to be done around, you know, people understanding the strengths and limitations of these tools because they’re not magic, but they are perceived to be magic.  

And likewise, you know, I think the tech world often doesn’t understand, you know, how healthcare is practiced and doesn’t think through these risks in the same way we do, right. So I know that some of the vulnerable patients who might’ve been overbooked by that algorithm are the people who I most need to see in clinic and are the people who would be, you know, most slighted if they show up and the other patient shows up and now you have an overworked clinician. But I just think those are stages, you know, further down the pathway of utilization of these algorithms that people don’t think of when they’re initially developing them.  

And so one of the things we actually, you know, require in our AI oversight process is when folks come to the table with a tool, they have to have a plan for how it’s going to be used and operationalized. And a lot of things die right there, honestly, because folks have built a cool tool, but they don’t know who’s going to use it in clinic, who the clinical champions are, how it’ll be acted on, and you can’t really evaluate whether these tools are trustworthy unless you’ve thought through all of that.  

Because you can imagine using the same algorithm in dramatically different ways, right. If you’re using the no-show model to do targeted outreach and send people a free Lyft if they have transportation issues, that’s going to have very different outcomes than overbooking folks.  

LEE: It’s so interesting and I’m going to want to get back to this topic because I think it also speaks to the challenges of how do you integrate technologies into the daily workflow of a clinic. And I know this is something you think about a lot, but let’s get back now to my original question about your AI moments. So now November 2022, ChatGPT happens, and what is your encounter with this new technology? 

MURRAY: Yeah. So I used to be on MedTwitter, or I still am actually; it’s just not as active anymore. But I would say, you know, MedTwitter went crazy after ChatGPT was initially released and it was largely filled with catchy poems and people, you know, having fun …  

LEE: [LAUGHS] Guilty. 

MURRAY: Yeah, exactly. I still use poems. And people having fun trying to make it hallucinate. And so, you know, I went—I was guilty of that, as well—and so one of the things I initially did was I asked it to do something crazy. So I asked it to draft me a letter for a prior authorization request for a drug called apixaban, which is a blood thinner, to treat insomnia. And if you practice clinical medicine, you know that we would never use a blood thinner to treat insomnia. But it wrote me such a compelling letter that I actually went back to PubMed and made sure that I wasn’t missing anything, like some unexpected side effect. I wasn’t missing anything, and in fact it was a hallucination. And so at that moment I said, this is very promising technology, but this is still a party trick.  

LEE: Yeah. 

MURRAY: A few months later, I went and did the exact same prompt, and I got a lecture, instead of a draft, about how it would be unethical [LAUGHTER] and unsafe for me to draft such a request. And so, you know, I realized these tools were rapidly evolving, and the game was just going to be changing very quickly. I think the other thing that, you know, we’ve never seen before is the deployment of a technology at scale like we have with AI scribes.  

So this is a technology that was in its infancy, you know, two years ago and is now largely a commodity deployed at scale across many health systems in a very short period of time. There have been no government incentives for people to do this. And so it clearly works well enough to be used in clinics. And I think these tools, you know, like AI scribes, have the opportunity to really undo a lot of the harm that the electronic health record implementations were perceived to have caused.  

LEE: What is a scribe, first off? 

MURRAY: Yeah, so AI scribes or, as we’re now calling them, AI assistants or ambient assistants, are tools that essentially listen to your clinical interaction. We record them with the permission of a patient, with consent, and then they draft a clinical note, and they can also draft other things like the patient instructions. And the idea is those drafts are very helpful to clinicians, and they have to review them and edit them, but it saves a lot of the furious typing that was previously happening during patient encounters. 
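
[A minimal Python sketch of the drafting stage Murray describes. It assumes a diarized transcript has already been produced by a speech-to-text system; the sample transcript, prompt wording, note format, and the call_llm stub are illustrative assumptions, not any vendor’s actual implementation.]

```python
# Minimal sketch of the drafting stage of an ambient AI scribe.
# Assumes speech-to-text has already produced a diarized transcript;
# the model call is stubbed out because vendor APIs differ.

SAMPLE_TRANSCRIPT = """\
Doctor: What brings you in today?
Patient: I've had a dry cough and a low-grade fever for three days.
Doctor: Any shortness of breath or chest pain?
Patient: No, just tired.
"""

NOTE_PROMPT = (
    "You are drafting a clinical note for clinician review.\n"
    "From the visit transcript below, write a SOAP-format draft\n"
    "(Subjective, Objective, Assessment, Plan). Do not invent\n"
    "findings that are not stated in the transcript.\n\n"
    "Transcript:\n{transcript}"
)

def call_llm(prompt: str) -> str:
    """Stub for the model call; a real deployment would invoke the
    vendor's API here and return the generated text."""
    raise NotImplementedError("wire up a model provider")

def draft_note(transcript: str) -> str:
    # The output is only a draft: the clinician reviews and edits it
    # before anything enters the chart.
    return call_llm(NOTE_PROMPT.format(transcript=transcript))
```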

LEE: We have been talking also to Chris Longhurst, your colleague at UC San Diego, and, you know, he mentions also the importance of having appropriate billing codes in those notes, which is yet another burden. Of course, when Carey, Zak, and I wrote our book, we predicted that AI scribes would get better and would find wider use because of the improvement in technology. Let me start by asking, do you yourself use an AI scribe? 

MURRAY: So I do not use it yet because I’m an inpatient doctor, and we have deployed them to all ambulatory clinic doctors because that’s where the technology is tried and true. So we’re looking now to deploy it in the inpatient setting, but we’re doing very initial testing. 

LEE: And what are the reasons for not integrating it into the inpatient setting? 

MURRAY: Well, there are two things, actually. Most inpatient documentation work, I would say, is follow-up documentation. And so you’re often taking your prior notes and making small changes to them as you change the care from day to day. And so the tools just aren’t there yet; all of the companies are working on this, but right now they don’t really incorporate your prior documentation or note when they draft your note for today.  

The second reason is that a lot of the decision-making that we do in the inpatient setting is asynchronous with the patient. So we’ll often have a conversation in the morning with the patient in their room, and then I’ll see some labs come back and I’ll make decisions and act on those labs and give the patient a call later and let them know what’s going on. And so it’s not a very succinct encounter, and so the technology is going to have to be a little bit different to work in that case, I think. 

LEE: Right, and so these are distinct workflows from the ambulatory setting, where it is the classic, you’re sitting with a patient in an exam room having an encounter. 

MURRAY: Mm-hmm. Exactly. And all your decisions are made there. And I would say it’s also different from nursing. We’re also looking at deploying these tools to nurses. But a lot of their documentation is in something called flowsheets. They write in columns, you know, specific numbers, and so for them to use these tools, they’d have to start saying to the patient, sounds like your pain is a five. Your blood pressure is 120 over 60. And so those are different workflows they’d have to adopt to use the tools. 

LEE: So you’ve been in the position of having to oversee the integration of AI scribes into UCSF Health. From your perspective, how were clinical staff actually viewing all of this?  

MURRAY: So I would say clinical staff are largely very excited, receptive, and would like us to move faster. And in fact, I gave a town hall to UCSF, and all of the comments were, when is this coming for APPs [advanced practice providers]? When is this coming for allied health professionals? And so people want this across healthcare. It’s not just doctors. But at the same time, you know, I think there’s a technology adoption curve, and about half of our ambulatory clinicians have signed up and about a third of them are now using the tool. And so we are now doing outreach to figure out who is not using it, why aren’t they using it, and what can we do to increase adoption. Or are there true barriers that we need to help folks overcome? 

LEE: And when you do these things, of course, there are risks. And as you were mentioning several times before, you were really concerned about hallucinations, about trustworthiness. So what were the steps that you took at UCSF to make these integrations happen? 

MURRAY: Yeah, so we have an AI oversight process for all tools with AI that come into our health system, regardless of where they’re coming from. So industry tools, internally developed tools, and research tools come through the same process. And we have a committee that is quite multidisciplinary. We have health system leaders, data scientists, bioethicists, researchers, health-equity experts. And through our process, we break down the AI lifecycle into a couple of key places where these tools come for committee review. And so for every AI deployment, we expect people to establish performance metrics, fairness metrics, and we help them with figuring out what those things should be.

We were also fortunate to receive a donation to build an AI monitoring platform, which we’re working on now at UCSF. We call it our Impact Monitoring Platform for AI and Clinical Care, IMPACC, and AI scribes are actually our first use case. And so on that platform, we have a metric adjudication process where we’ve established, you know, what do we really care about for our health system executive leaders, what do we really care about for, you know, ensuring safety and trustworthiness, and then, you know, what are our patients going to want to know? Because we want to also be transparent with our patients about the use of these tools. And so we have processes for doing all this work.  

I think the challenge is actually how we scale these processes as more and more tools come through because, as you could imagine, it takes a lot of conversation with a lot of stakeholders to figure out what and how we measure things right now. 

LEE: And so there’s so much to get into there, but I actually want to zoom in on the actual experience that doctors, nurses, and patients are having. And, you know, do you find that AI is meeting expectations? Is it making a difference, positive or negative, in people’s lives? And what kinds of potential surprises are people encountering? 

MURRAY: Mm-hmm. So we’re collecting data in a couple of ways. First, we’re surveying clinicians before and after their experience, and we are hearing from folks that they feel like their clinic work is more manageable, that they’re more able to finish their documentation in a timely fashion.  

And then we’re looking at actual metrics that we can extract from the EHR around how long people are spending doing things. And that data is largely aligning with what people are reporting, although the caveat is they’re not saving enough time for us to have them see more patients. And so we’ve been very explicit at UCSF around making it clear that this is a tool to improve experience and not to improve efficiency.  

So we’re not expecting for people to see more patients as a result of using this tool. We want their clinic experience to be more meaningful. But then the other thing that’s interesting that folks share is this tremendous relief of cognitive burden that folks feel when using this tool. So they may have been really efficient before. You know, they could get all their work done. They could type while they were talking to their patients. But they didn’t actually, you know, get to look at their patients eye to eye and have the meaningful conversation that people went into medicine for. And so we’re hearing that, as well.  

And I think one of the things that’s going to be important to us is actually measuring that moving forward. And that is matched by some of the feedback we’re getting from patients. So we have quotes from patients where they’ve said, you know, my doctor is using this new tool and it’s amazing. We’re just having eye-to-eye conversations. Keep using it. So I think that’s really important. 

LEE: I’ve been pushing my own primary care doctor to get into this because I really depend on her. I love her dearly, but we never … I’m always looking at her back as she’s typing at a computer during our encounters. [LAUGHS] 

So, Sara, while we’re talking about efficiency, and at least the early evidence doesn’t show clear efficiency gains, it does raise the question of how or why health systems, many of which are, you know, financially not swimming in money, could adopt these things.  

And then we could also even imagine that there are even more important applications in the future that, you know, might require quite a bit of expense on developers as well as procurers of these things. You know, what’s your point of view on what I guess we would call the ROI question about AI? 

MURRAY: Mm-hmm. I think this is a really challenging area because return on investment is very important to health systems that are trying to figure out how to spend a limited budget to improve care delivery. And so I think we’ve started to see a lot of small use cases that prove this technology could likely be beneficial.  

So there are use cases that you may have heard of from Dr. Longhurst around drafting responses to patient messages, for example, where we’ve seen that this technology is helpful but doesn’t get us all the way there. And that’s because these technologies are actually quite expensive. And when you want to process large amounts of data, that data is measured in units called tokens, and tokens cost money.  

And so I think one of the challenges when we envision the future of healthcare, we’re not really envisioning the expense of querying the entire medical record through a large language model. And we’re going to have to build systems from a technology standpoint that can do that work in a more affordable way for us to be able to deliver really high-value use cases to clinicians that involve processing that.  

And so those are use cases like summarizing large parts of the patient’s medical record, providing really meaningful clinical decision support that takes into account the patient’s entire medical history. We haven’t seen those types of use cases really come into being yet, largely because, you know, they’re technically a bit more complex to do well and they’re expensive, but they’re completely feasible. 
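
[Some back-of-envelope arithmetic behind the cost concern Murray raises. Every number below, the record size, the characters-per-token rule of thumb, the price per million tokens, and the query volume, is an illustrative assumption, not an actual vendor price or a UCSF figure.]

```python
# Back-of-envelope cost of running a long medical record through an
# LLM. Every number here is an illustrative assumption, not a quote.

CHARS_PER_TOKEN = 4               # rough rule of thumb for English text
RECORD_CHARS = 2_000_000          # assume a ~2M-character longitudinal chart
PRICE_PER_M_INPUT_TOKENS = 5.00   # assumed dollars per million input tokens

input_tokens = RECORD_CHARS / CHARS_PER_TOKEN             # 500,000 tokens
cost_per_query = input_tokens / 1_000_000 * PRICE_PER_M_INPUT_TOKENS

VISITS_PER_YEAR = 1_000_000       # assumed volume for a large health system
annual_cost = cost_per_query * VISITS_PER_YEAR

print(f"~{input_tokens:,.0f} tokens per chart")
print(f"~${cost_per_query:,.2f} per full-chart query")
print(f"~${annual_cost:,.0f}/year at {VISITS_PER_YEAR:,} queries")
```

[Even at these made-up rates, a full-chart query costs dollars rather than fractions of a cent, which is why techniques like summarization, retrieval of only the relevant excerpts, and smaller models matter for the use cases Murray describes.]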

LEE: Yeah. You know, what you’re saying really resonates so strongly from the tech industry’s perspective. You know, one way that that problem manifests itself is shareholders in big tech companies like ours more or less expect … they’re paying a high premium—a high multiple on the share price—because they’re expecting our revenues to grow at very spectacular rates, double digit rates. But that isn’t obviously compatible with how healthcare works and the healthcare business works. It doesn’t grow, you know, at 30% year over year or anything like that.

And so the question is how to make these things financially make sense for all comers. And it’s sort of part and parcel also with the problem that sometimes efficiency gains in healthcare just translate into heavier caseloads for doctors, which isn’t obviously the best outcome either. And so in a way, I think it’s another aspect of the work on impact and trustworthiness when we think about technology, at all, in healthcare.  

MURRAY: Mm-hmm. I think that’s right. And I think, you know, if you look at the difference between the AI scribe market and the rest of the summarization work that’s largely happening within the electronic health record, in the AI scribe market, you have a lot of independent companies, and they all are competing to be the best. And so because of that, we’re seeing the technology get more efficient, cheaper. There’s just a lot of investment in that space.  

Whereas like with the electronic health record providers, they’re also invested in really providing us with these tools, but it’s not their main priority. They’re delivering an entire electronic health record, and they also have to do it in a way that is affordable for, you know, all kinds of health systems, big UCSF health systems, smaller settings, and so there’s a real tension, I think, between delivering good-enough tools and truly transformative tools.  

LEE: So I want to go back for a minute to this idea of cognitive burden that you described. When we talk about cognitive burden, it’s often in the context of paperwork, right. There are maybe referral letters, after-visit notes, all of these things. How do you see these AI tools progressing with respect to that stream of different administrative tasks? 

MURRAY: These tools are going to continue to be optimized to do more and more tasks for us. So with AI scribes, for example, you know, we’re starting to look at whether it can draft the billing and coding information for the clinician, which is a tedious task with many clicks.  

These tools are poised to start pending orders based on the conversation. Again, a tedious task. All of this with clinician oversight. But I think as we move from them being AI scribes to AI assistants, it’s going to be like a helper on the side for clinicians doing more and more work so they can really focus on the conversations, the shared decision-making, and the reason they went into medicine really. 

LEE: Yeah, let me, since you mentioned AI assistants and that’s such an interesting word and it does connect with something that was apparent to us even, you know, as we were writing the book, which is this phenomenon that these AI systems might make mistakes.  

They might be guilty of making biased decisions or showing bias, and yet they at the same time seem incredibly effective at spotting other people’s mistakes or other people’s biased decisions. And so is there a point where these AI scribes do become AI assistants, that they’re sort of looking over a doctor’s shoulder and saying, “Hey, did you think about something else?” or “Hey, you know, maybe you’re wrong about a certain diagnosis?” 

MURRAY: Mm-hmm. I mean, absolutely. You’re just really talking about combining technologies that already exist into a more streamlined clinical care experience, right. So you can—and I already do this when I’m on rounds—I’ll kind of give the case to ChatGPT if it’s a complex case, and I’ll say, “Here’s how I’m thinking about it; are there other things?” And it’ll give me additional ideas that are sometimes useful and sometimes not but often useful, and I’ll integrate them into my conversation about the patient.  

I think all of these companies are thinking about that. You know, how do we integrate more clinical decision-making into the process? I think it’s just, you know, healthcare is always a little bit behind the technology industry in general, to say the least. And so it’s kind of one step at a time, and all of these use cases need a lot of validation. There’s regulatory issues, and so I think it’s going to take time for us to get there. 

LEE: Should I be impressed or concerned that the chief health AI officer at UC San Francisco Health is using ChatGPT off label? 

MURRAY: [LAUGHS] Well, I actually, every time I go on service, I encourage my residents to use it because I think we need to learn how to use these technologies. And, you know, when our medical education leaders start thinking about how do we teach students to use these, we don’t know how to teach students to use them if we’re not using them ourselves, right. And so I’ve learned a lot about what I perceive the strengths and limitations of the tools are.  

And, you know, one of the things that we’ve learned—and you’ve written about this in your book—is that the prompting really matters. So I had a resident ask it for a differential for abnormal liver tests. But in asking for that differential, there was a key blood finding, something called eosinophilia, a type of blood cell that was mildly elevated, and they didn’t recognize it. So they didn’t give it in the prompt, and as a result, they didn’t get the right differential. But it wasn’t actually ChatGPT’s fault. It just didn’t get the right information because the trainee didn’t recognize the right information. And so I think there’s a lot to learn as we practice using these tools clinically. So I’m not ashamed of it. [LAUGHS] 

LEE: [LAUGHS] Yeah. Well, in fact, I think my coauthor Carey Goldberg would find what you said really validating because in our book, she actually wrote this fictional account of what it might be like in the future. And this medical resident was also using an AI chatbot off label for pretty much the same kinds of purposes. And it’s these kinds of things that, you know, it seems like might be coming next. 

MURRAY: I mean, medicine, the practice of medicine, is a very imperfect science, and so, you know, when we have a difficult case, I might sit in the workroom with my colleagues and run it by people. And everyone has different thoughts and opinions on, you know, things I should check for. And so I think this is just one other resource where you can kind of run cases, obviously, just reviewing all of the outputs yourself. 

LEE: All right, so we’re running short on time and so I want to be a little provocative at the end here. And since we’ve gotten into AI assistants, two questions: First off, do we get to a point in the near future when it would be unthinkable and maybe even bordering on malpractice for a doctor not to use AI assistants in his or her daily work? 

MURRAY: So it’s possible that we see that in the future. We don’t see it right now. And that’s part of the reason we don’t force this on people. So we see AI scribes or AI assistants as a tool we offer to people to improve their daily work because we don’t have sufficient data that the outcomes are markedly better from using these tools.  

I think there is a future where specific, you know, tools do actually improve outcomes. And then their use should be incentivized either through, you know, CMS [Centers for Medicare & Medicaid Services] or other systems to ensure that, you know, we’re delivering standard of care. But we’re not yet at the place where any of these tools are standard of care, which would mean they’d need to be used to practice good medicine. 

LEE: And I think I would say that it’s the work of people like you that would make it possible for these things to become standard of care. And so now, final provocation. It must have crossed your mind through all of this, the possibility that AI might replace doctors in some ways. What are your thoughts? 

MURRAY: I think we’re a long way from that happening, honestly. And I think even when I talk to my colleagues in radiology about this, where I perceive, as an internist, they might be the most replaceable, there’s a million reasons why that’s not the case. And so I think these tools are going to augment our work. They’re going to help us streamline access for patients. They’re going to maybe change what clinicians have to do, but I don’t think they’re going to fully replace doctors. There’s just too much complexity and nuance in providing clinical care for these tools to do that work fully. 

LEE: Yeah, I think you’re right. And actually, you know, I think there’s plenty of evidence because in the history of modern medicine, we actually haven’t seen technology replace human doctors. Maybe you could say that we don’t use barbers for bloodletting anymore because of technology. But I think, as you say, we’re at least a long ways away.  

MURRAY: Yeah. 

LEE: Sara, this has been just a great conversation. And thank you for the great work that you’re doing, you know, and for being so open with us on your personal use of AI, but also how you see the adoption of AI in our health system. 

[TRANSITION MUSIC] 

MURRAY: Thank you, it was really great talking with you. 

LEE: I get so much out of talking to Sara. Every time, she manages to get me refocused on two things: the quality of the user experience and the importance of trust in any new technology that is brought into the clinic. I felt like there were several good takeaways from the conversation. One is that she really validated some predictions that Carey, Zak, and I made in our book, first and foremost that automated note-taking would be a highly desirable and practical reality. The other validation is Sara revealing that even she uses ChatGPT as a daily assistant in her clinical work, something that we guessed would happen in the book, but we weren’t really sure since health systems oftentimes are very locked down when it comes to the use of technological tools.  

And of course, maybe the biggest thing about Sara’s work is her role in defining a new type of job in healthcare, the health AI officer. This is something that Carey, Zak, and I didn’t see coming at all, but in retrospect, makes all the sense in the world. Taken together, these two conversations really showed that we were on the right track in the book. AI has made its way into day-to-day life and work in the clinic, and both doctors and patients seem to be appreciating it.  

[MUSIC TRANSITIONS TO THEME] 

I’d like to extend another big thank you to Chris and Sara for joining me on the show and sharing their insights. And to our listeners, thank you for coming along for the ride. We have some really great conversations planned for the coming episodes. We’ll delve into how patients are using generative AI for their own healthcare, the hype and reality of AI drug discovery, and more. We hope you’ll continue to tune in. Until next time. 

[MUSIC FADES] 

[1] The paper, “Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum (opens in new tab),” presented the findings of a cross-sectional study that compared doctor responses to patient questions posted to the public forum r/AskDocs on Reddit to responses to the same questions generated by ChatGPT.

[2] Dr. John Ayers is corresponding author of the April 2023 paper “Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.” (opens in new tab)

[3] These findings are detailed in the paper “AI-Generated Draft Replies Integrated Into Health Records and Physicians’ Electronic Communication,” (opens in new tab) published in JAMA Network Open.


The post The reality of generative AI in the clinic appeared first on Microsoft Research.

]]>
The AI Revolution in Medicine, Revisited: An Introduction http://approjects.co.za/?big=en-us/research/podcast/the-ai-revolution-in-medicine-revisited-an-introduction/ Thu, 06 Mar 2025 14:00:00 +0000 http://approjects.co.za/?big=en-us/research/?p=1132722 Host Peter Lee, Microsoft Research president, discusses the motivation behind the new series and the GPT-4 encounter that helped him view the tech not only as a potential tool for improving healthcare but a chance to reexamine what it means to care for people.

The post The AI Revolution in Medicine, Revisited: An Introduction appeared first on Microsoft Research.

]]>

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this introduction to the series, Lee talks about his early encounters with GPT-4, when the AI model was still in secret development with OpenAI, and the range of emotions he cycled through as he came to understand the new technology better. The emergence of generative AI has created a “new world,” Lee says, one he is eager to investigate with the aim of discovering the technology’s impact so far and what it means for the future of healthcare and medicine.

Transcript

[MUSIC]

PETER LEE: This is The AI Revolution in Medicine, Revisited. I’m Peter Lee, president of Microsoft Research, and I’m pretty excited to introduce this series of conversations as part of the Microsoft Research Podcast.

About two years ago, with Carey Goldberg and Zak Kohane, we wrote a book, The AI Revolution in Medicine. This was a book that was intended to educate the world of healthcare and the world of medical research about this new thing that was emerging. This idea of generative AI. And we wrote the book in secret. In fact, the whole existence of what we now know of as OpenAI’s GPT-4 AI model hadn’t been publicly disclosed or revealed to the world. And so when we were working on this book, we had to make some guesses. What is this going to mean for healthcare? If you’re a doctor or a nurse, in what ways will AI impact your work? If you’re a patient, in what ways could AI change your experience as you try to navigate a complex healthcare system?

And so now it’s been about two years. Two years hence, what did we get right? What did we get wrong? What things have come along much faster than we ever would have dreamed of? What did we miss? And what things have turned out to be much harder than we ever could have realized? And so this series of conversations is going to talk to people in the real world. We’ll delve into exactly what’s happening in the clinic, the patient experience, how people are thinking about safety and regulatory matters, and what this all means for discovery and advancements of medical science. And even then, we’ll have guests that will allow us to look into the future—the AI advances that are happening now and what is going to happen next.

[MUSIC TRANSITIONS TO SERIES THEME]

[MUSIC FADES]

So now, let me just take a step back here to talk about this book project. And I’d like to just read the first couple of sentences in Chapter 1, and Chapter 1 is entitled “First Contact.” And it starts with a quote. Quote, “I think that Zak and his mother deserve better than that,” unquote. “I was being scolded. And while I’ve been scolded plenty in my life, for the first time it wasn’t a person scolding me; it was an artificial intelligence system.” So that’s how we started this book, and I wanted to read that because, at least for me, it takes me back to the kind of awe and wonderment in those early days when in secret development, we had access from OpenAI to what we now know of as GPT-4.

And what was that quote about? Well, after getting access to GPT-4, I became very interested in what this might mean for healthcare. But I, not being a doctor, knew I needed help. So I had reached out to a good colleague of mine who is a doctor, a pediatric endocrinologist, and head of the bioinformatics department at Harvard Medical School, Dr. Isaac “Zak” Kohane. And I sought his help. And in our back-and-forth discussions, one of the things that Zak shared with me was an article that he wrote for a magazine where he talked about his use of machine learning in the care of his 90-year-old mother, who—like many 90-year-old people—was having some health issues.

And this article was very interesting. It really went into some detail about not only the machine learning technology that Zak had created in order to help manage his mother’s health but also the kind of emotional burden of doing this and in what ways technology was helping Zak cope with that. And so as I read that article, it touched me because at that time, I was struggling in a very similar way with my own father, who was at that time 89 years old and was also suffering from some very significant health issues. And, like Zak, I was feeling some pangs of guilt because my father was living in Southern California; I was way up in the Pacific Northwest, you know, just feeling guilty not being there, present for him, through his struggles. And reading that article, a thought that occurred to me was, I wonder if in the future, AI could pretend to be me so that my father could always have a version of me to talk to. And I also had the thought in the other direction. Could AI someday capture enough of my father so that when and if he passes, I always have some memory of my father that I could interact with? A strange and bizarre thought, I admit, but a natural one, I think, for any human being that’s encountering this amazing AI technology for the first time. And so I ran an experiment. I used GPT-4 to read Zak’s article and then posed the question to GPT-4, “Based on this article, could you pretend to be Zak? I’ll pretend to be Zak’s mother, and let’s test whether it’s possible to have a mother-son conversation.”

To my surprise, GPT-4’s response at that time was to scold me, basically saying that this is wrong; that this has a lot of dangers and risks. You know, what if Zak’s mother really needs the real Zak? And in those early days of this encounter with AI, that was incredibly startling. It just really forces you to reexamine yourself, and it kicked off our writing in the book as really not only being about a technology that could help lead to better diagnoses, help reduce medical errors, reduce the amount of paperwork and clerical burden that doctors go through, and help demystify and help patients navigate a complex healthcare system, but it could actually be a technology that forces people to reexamine their relationships and reexamine what it really means for people to take care of other people.

And since then, of course, I’ve come to learn that many people have had similar experiences in their first encounters with AI. And in fact, I’ve come to think of this as, somewhat tongue in cheek, the nine stages of AI grief. And they actually relate to what we’ll try to address in this new series of conversations.

For me, the first time that Greg Brockman and Sam Altman presented what we now know of as OpenAI’s GPT-4 to me, they made some claims about what it could do. And my first reaction was one of skepticism, and it seemed that the claims that were being made just couldn’t be true. Then that, kind of, passed into, I would say, a period of annoyance because I started to see my colleagues here in Microsoft Research start to show some amazement about the technology. I actually was annoyed because I felt they were being duped by this technology. So that’s the second phase. And then, the third phase was concern and maybe even a little bit of frustration because it became clear that, as a company here at Microsoft, we were on the verge of making a big bet on this new technology. And that was concerning to me because of my fundamental skepticism.

But then I got my hands on the technology myself. And that enters into a fourth stage, of amazement. You start to encounter things that just are fundamentally amazing. This leads to a period of intensity because I immediately surmised that, wow, this could really change everything, and few areas of change would be more important than healthcare. And that is stage five, a period of serious intensity where you’re just losing sleep and working so hard to try to imagine what this all could mean. Running as many experiments as you can; trying to lean on as much real expertise as possible. You then move from there into a period of what I call chagrin because as amazing as the technology is, actually understanding how to harness it in real life is not easy.

You finally get into this stage of what I would call enlightenment. [MUSIC] And I won’t claim to be enlightened. But it is, sort of, a combination of acceptance that we are in a new world today, that things are happening for real, and that there’s, sort of, no turning back. And at that point, I think we can really get down to work. And so as we think about really the ultimate purpose of this series of conversations that we’re about to have, it’s really to help people get to that stage of enlightenment, to really, kind of, roll up our sleeves, to sit down and think through all of the best knowledge and experience that we’ve gathered over the last two years, and chart the future of this AI revolution in medicine.

[MUSIC TRANSITIONS TO SERIES THEME]

Let’s get going.

[MUSIC FADES]

The post The AI Revolution in Medicine, Revisited: An Introduction appeared first on Microsoft Research.

]]>