{"id":946080,"date":"2023-06-13T11:00:00","date_gmt":"2023-06-13T18:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=946080"},"modified":"2023-06-13T15:48:24","modified_gmt":"2023-06-13T22:48:24","slug":"accounting-for-past-imaging-studies-enhancing-radiology-ai-and-reporting","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/accounting-for-past-imaging-studies-enhancing-radiology-ai-and-reporting\/","title":{"rendered":"Accounting for past imaging studies: Enhancing radiology AI and reporting"},"content":{"rendered":"\n
The use of self-supervision from image-text pairs has been a key enabler in the development of scalable and flexible vision-language AI models, not only in general domains but also in specialized biomedical domains such as radiology. The goal in the radiology setting is to produce rich training signals without requiring manual labels, so that models learn to accurately recognize and locate findings in images and relate them to the content of radiology reports.
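To make this training signal concrete, such image-text self-supervision is typically driven by a CLIP-style symmetric contrastive objective that pulls matching image-report pairs together in a shared embedding space and pushes mismatched pairs apart. The sketch below is a minimal PyTorch version of that objective; the function name and temperature value are our illustrative assumptions, not the exact loss used by BioViL-T.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/report embeddings.

    Illustrative only: the name and temperature are assumptions, not the
    exact objective used by BioViL-T.
    """
    # Normalize so that dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix; the diagonal holds the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast images against reports and reports against images.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Because every image-report pair supplies its own supervision, no manual labels are needed, which is what makes this paradigm scale to large clinical datasets.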
Radiologists use radiology reports to describe imaging findings and offer a clinical diagnosis or a range of possible diagnoses, all of which can be influenced by the findings on previous imaging studies. In fact, comparisons with previous images are crucial for radiologists to make informed decisions. These comparisons provide valuable context for determining whether a condition is a new concern or, if pre-existing, whether it is improving, deteriorating, or stable, and they can inform more appropriate treatment recommendations. Despite the importance of comparisons, current AI solutions for radiology often fall short in aligning images with report data because they lack access to prior scans. They also typically fail to account for the chronological progression of disease or imaging findings often present in biomedical datasets. This can lead to ambiguity in the model training process and can be risky in downstream applications such as automated report generation, where models may make up temporal content without access to past medical scans. In short, this limits the real-world applicability of such AI models to empower caregivers and augment existing workflows.
In our previous work, we demonstrated that multimodal self-supervised learning of radiology images and reports can yield significant performance improvements in downstream applications of machine learning models, such as detecting the presence of medical conditions and localizing these findings within the images. In our latest study, which is being presented at the 2023 IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), we propose BioViL-T, a self-supervised training framework that further increases the data efficiency of this learning paradigm by leveraging the temporal structure present in biomedical datasets. This approach enables the incorporation of temporal information and has the potential to perform complementary self-supervision without the need for additional data, resulting in improved predictive performance.

Our proposed approach can handle missing or spatially misaligned images and can potentially scale to process a large number of prior images. By leveraging the temporal structure already available in datasets, BioViL-T achieves state-of-the-art results on several downstream benchmarks. We've made both our models and source code open source, allowing for a comprehensive exploration and validation of the results discussed in our study. We've also released a new multimodal temporal benchmark dataset, MS-CXR-T, to support further research into longitudinal modeling of medical images and text data.
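To illustrate how a model can fuse a current scan with an optional prior one, here is a deliberately simplified sketch of the general idea. The module name, dimensions, and use of a learned placeholder embedding for missing priors are our illustrative assumptions, not the actual BioViL-T architecture; the open-source code linked above contains the real implementation.

```python
from typing import Optional

import torch
import torch.nn as nn

class TemporalImageFusion(nn.Module):
    """Fuses features of a current scan with an optional prior scan.

    A simplified sketch of the general idea, not the BioViL-T architecture:
    patch features of the current image attend over the prior image, and a
    learned placeholder stands in when no prior study exists.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learned placeholder used for studies without a prior image.
        self.missing_prior = nn.Parameter(torch.randn(1, 1, dim) * 0.02)

    def forward(self, current_feats: torch.Tensor,
                prior_feats: Optional[torch.Tensor] = None) -> torch.Tensor:
        # current_feats: (batch, num_patches, dim) from the image encoder.
        if prior_feats is None:
            prior_feats = self.missing_prior.expand(current_feats.size(0), -1, -1)
        # Current-image patches query the prior image for change information.
        fused, _ = self.cross_attn(current_feats, prior_feats, prior_feats)
        # The residual keeps static, single-image information intact.
        return current_feats + fused
```

The residual connection is a common design choice for this kind of fusion: when no informative prior exists, the model can fall back to static, single-image features, while matched current-prior pairs let the attention block capture change over time.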