{"id":985488,"date":"2023-11-27T09:00:00","date_gmt":"2023-11-27T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=985488"},"modified":"2023-11-29T08:18:22","modified_gmt":"2023-11-29T16:18:22","slug":"gpt-4s-potential-in-shaping-the-future-of-radiology","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/gpt-4s-potential-in-shaping-the-future-of-radiology\/","title":{"rendered":"GPT-4’s potential in shaping the future of radiology"},"content":{"rendered":"\n
This research paper is being presented at the <\/em><\/strong>2023 Conference on Empirical Methods in Natural Language Processing<\/em><\/strong><\/span><\/a> (EMNLP 2023), the premier conference on natural language processing and artificial intelligence.<\/em><\/strong><\/p>\n\n\n\n In recent years, AI has been increasingly integrated into healthcare, bringing about new areas of focus and priority, such as diagnostics, treatment planning, and patient engagement. While AI\u2019s contribution in certain fields like image analysis and drug interaction is widely recognized, its potential in natural language tasks within these newer areas presents an intriguing research opportunity. <\/p>\n\n\n\n One notable advancement in this area involves GPT-4’s impressive performance<\/span><\/a> on medical competency exams and benchmark datasets. GPT-4 has also demonstrated potential utility<\/span><\/a> in medical consultations, providing a promising outlook for healthcare innovation.<\/p>\n\n\n\n Our paper, \u201cExploring the Boundaries of GPT-4 in Radiology<\/span><\/a>,\u201d which we are presenting at EMNLP 2023<\/span><\/a>, further explores GPT-4\u2019s potential in healthcare, focusing on its abilities and limitations in radiology, a field that is crucial in disease diagnosis and treatment through imaging technologies like x-rays, computed tomography (CT), and magnetic resonance imaging (MRI). We collaborated with our colleagues at Nuance<\/span><\/a>, a Microsoft company, whose solution, PowerScribe, is used by more than 80 percent of US radiologists. 
Together, we aimed to better understand technology\u2019s impact on radiologists\u2019 workflow.<\/p>\n\n\n\n Our research included a comprehensive evaluation and error analysis framework to rigorously assess GPT-4\u2019s ability to process radiology reports, including common language understanding and generation tasks in radiology, such as disease classification and findings summarization. This framework was developed in collaboration with a board-certified radiologist to tackle more intricate and challenging real-world scenarios in radiology and move beyond mere metric scores.<\/p>\n\n\n\n We also explored various effective zero-shot, few-shot, and chain-of-thought (CoT) prompting techniques for GPT-4 across different radiology tasks and experimented with approaches to improve the reliability of GPT-4 outputs. For each task, GPT-4 performance was benchmarked against prior GPT-3.5 models and respective state-of-the-art radiology models. <\/p>\n\n\n\n We found that GPT-4 demonstrates new state-of-the-art performance in some tasks, achieving about a 10-percent absolute improvement over existing models, as shown in Table 1. Surprisingly, we found radiology report summaries generated by GPT-4 to be comparable to and, in some cases, even preferred over those written by experienced radiologists, with one example illustrated in Table 2.<\/p>\n\n\n\n Another encouraging prospect for GPT-4 is its ability to automatically structure radiology reports, as schematically illustrated in Figure 1. These reports, which are based on a radiologist\u2019s interpretation of medical images like x-rays and include patients\u2019 clinical history, are often complex and unstructured, making them difficult to interpret. Research shows<\/a> that structuring these reports can improve standardization and consistency in disease descriptions, making them easier for other healthcare providers to interpret and more easily searchable for research and quality improvement initiatives. 
Additionally, using GPT-4 to structure and standardize radiology reports can further support efforts to augment real-world data (RWD) and its use for real-world evidence<\/a> (RWE). This can complement more robust and comprehensive clinical trials and, in turn, accelerate the application of research findings into clinical practice.<\/p>\n\n\n\n<\/figure>\n\n\n\n
\n\t\t
Progressing radiology AI for real problems<\/h2>\n\n\n\n