{"id":1148874,"date":"2025-08-29T13:53:46","date_gmt":"2025-08-29T20:53:46","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=1148874"},"modified":"2025-08-29T13:53:48","modified_gmt":"2025-08-29T20:53:48","slug":"bridging-gaps-in-ophthalmology-education-through-large-language-models","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/bridging-gaps-in-ophthalmology-education-through-large-language-models\/","title":{"rendered":"Bridging Gaps in Ophthalmology Education Through Large Language Models"},"content":{"rendered":"
\n
Purpose<\/h5>\n
To assess the performance of general-domain large language models (LLMs), particularly OpenAI\u2019s Generative Pre-trained Transformer (GPT) models, within the American Academy of Ophthalmology (AAO) Self-Assessment Program, which is based on AAO\u2019s Basic and Clinical Science Course.<\/div>\n
<\/div>\n<\/div>\n
\n
Methods<\/h5>\n
We input 3357 questions into GPT-4o, GPT-4-Turbo, o1, and o3-mini via Microsoft\u2019s Azure OpenAI Service using zero-shot and chain-of-thought (CoT) prompting. Questions with images were analyzed using the multimodal versions of GPT-4o and GPT-4.1. The performance of the LLMs was compared to that of 1371 unique residents who had previously participated in the program. Additionally, we compared performance on 1399 questions categorized into 3 question types: recall, interpretation, and decision-making or clinical management. Average accuracy rates were used to evaluate performance and to assess statistical significance across categories.<\/div>\n
<\/div>\n<\/div>\n
\n
Results<\/h5>\n
o1 (CoT) was the most accurate model (95% confidence interval [CI]: 90.3%\u201392.1%), with performance ranging from 95.17% (general medicine) to 86.9% (cornea) and 91.1% accuracy on a synthesized sample test. It also outperformed residents on recall-type, interpretation-type, and decision-making or clinical management questions (95.7%, 85.3%, and 90.8%, respectively,\u00a0<em>P<\/em>\u00a0< 0.001). Third-year residents (78.2%) were more accurate than first-year (68.3%) and second-year (74.9%) residents. On image-based questions, multimodal input improved model accuracy, but all models still underperformed residents.<\/div>\n
<\/div>\n<\/div>\n
\n
Conclusions<\/h5>\n
The accuracy of LLMs continues to improve, with o1 (CoT) showing the highest overall performance. Multimodal inputs can enhance model accuracy, but current models still need improvement. LLMs show great potential for democratizing access to high-quality medical knowledge.<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"

Purpose To assess the performance of general-domain large language models (LLMs), particularly OpenAI\u2019s Generative Pre-trained Transformer (GPT) models, within the American Academy of Ophthalmology (AAO) Self-Assessment Program, which is based on AAO\u2019s Basic and Clinical Science Course. Methods We input 3357 questions into GPT-4o, GPT-4-Turbo, o1 and o3-mini via Microsoft\u2019s Azure OpenAI Service using zero-shot […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"AJO International","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2025-8-23","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":null,"footnotes":""},"msr-research-highlight":[],"research-area":[13
556,13553],"msr-publication-type":[193715],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[269148,269142],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1148874","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-option-include-in-river"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2025-8-23","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"AJO International","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S295025352500070X","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S295025352500070X\/pdfft?md5=5db9b25bc7205725827269b046fb7326&pid=1-s2.0-S295025352500070X-main.pdf","label_id":"243132","label":0},{"type":"doi","viewUrl":"false","id":"false","title":"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S295025352500070X\/pdfft?md5=5db9b25bc7205725827269b046fb7326&pid=1-s2.0-S295025352500070X-main.pdf","label_id":"243106","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_
id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Shahrzad Gholami","user_id":39757,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shahrzad Gholami"},{"type":"text","value":"Beth Wilson","user_id":0,"rest_url":false},{"type":"text","value":"Sarah Page","user_id":0,"rest_url":false},{"type":"text","value":"Daniel B. Mummert","user_id":0,"rest_url":false},{"type":"text","value":"Joseph Carr","user_id":0,"rest_url":false},{"type":"text","value":"Robert R. McNabb","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Rahul Dodhia","user_id":41401,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Rahul Dodhia"},{"type":"user_nicename","value":"Juan M. Lavista Ferres","user_id":39552,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Juan M. Lavista Ferres"},{"type":"user_nicename","value":"Bill Weeks","user_id":39582,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Bill Weeks"},{"type":"text","value":"Dale E. Fajardo","user_id":0,"rest_url":false}
],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[696544],"msr_project":[778522],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"article","related_content":{"projects":[{"ID":778522,"post_title":"AI for Health","post_name":"ai-for-health","post_type":"msr-project","post_date":"2023-05-16 14:26:13","post_modified":"2024-10-14 15:42:21","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/ai-for-health\/","post_excerpt":"AI for Health is a philanthropic program launched by Microsoft, which aims to support nonprofits, researchers, and organizations working on global health challenges. The program provides access to artificial intelligence (AI) technology and expertise in three main areas: population health, imaging analytics, genomics & proteomics.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/778522"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1148874","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1148874\/revisions"}],"predecessor-version":[{"id":1148875,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1148874\/revisions\/1148875"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1148874"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1148874"},{"taxonomy":"msr-research-area","embeddable":true,"
href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1148874"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1148874"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=1148874"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1148874"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1148874"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1148874"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1148874"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1148874"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1148874"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1148874"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1148874"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}