{"id":477507,"date":"2018-04-23T12:52:00","date_gmt":"2018-04-23T19:52:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=477507"},"modified":"2018-04-24T12:53:47","modified_gmt":"2018-04-24T19:53:47","slug":"hearing-believing-researchers-innovation-provides-richer-web-browsing-experience-people-blind","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/hearing-believing-researchers-innovation-provides-richer-web-browsing-experience-people-blind\/","title":{"rendered":"HEARING IS BELIEVING \u2013 Researchers\u2019 innovation provides a richer web-browsing experience for people who are blind"},"content":{"rendered":"

\"\"<\/p>\n

Imagine for a moment that you are blind and are navigating the web using a screen reader to hear websites rather than see them. Imagine that the article you have navigated to includes images. To understand the content and significance, you are relying on your screen reader to narrate the alt text associated with each image, a textual description that should be provided by the web page author.<\/p>\n

Now imagine sitting there and hearing the following description of an image:
\n\u201cslash h 3 f s 0 x u d f 3 0 l 0 6 j f t k a h dot jpeg image\u201d<\/p>\n

Unfortunately, low-quality alt text (such as a file name rather than a caption) or completely absent alt text is quite common, resulting in a poor browsing experience for people who rely on screen reader technology. A team of researchers at Microsoft Research decided to do something about this \u2013 address the issue of missing and poor-quality alt text online \u2013 using existing technology and a bit of out-of-the-box thinking.<\/p>\n

The resulting innovation is Caption Crawler, a prototype browser plugin that allows screen reader users to automatically replace bad or missing descriptions of images on their favorite websites with image captions from other pages that contain the same image. The researchers found that this technique can retrieve captions for about 13 percent of images that previously had no alt text at all on popular websites, with even better performance (around 25 percent coverage) on sites in categories such as e-commerce that use commonly replicated images. Caption Crawler can handle multiple captions, queueing the results in order of likely quality, and these descriptions are loaded into the browser in the background in real time. The user can then toggle to the next reverse-searched caption in the queue using a simple keystroke.<\/p>\n

\u201cTechnology can be used very effectively to help people but often what happens is we focus on ourselves. Sometimes people get ignored in that process. A lot of our passion is in trying to be more inclusive and in broadening the scope of who benefits from technology \u2013 and why.\u201d \u2013 Ed Cutrell, Principal Researcher<\/strong><\/p><\/blockquote>\n

Caption Crawler must determine how to rank alt texts when multiple alternatives for the same image are discovered online. Through carefully designed questionnaires given to people with and without vision, the team discovered that for any given image, the longest caption was overwhelmingly identified by both groups as the best. This allowed the team to design the plugin to queue the alternative caption results according to likely quality. A user study revealed that participants who are blind valued the queue of alternatives not only as a way to learn more about an image, but also as a way to gain more confidence in the accuracy of the captions.<\/p>\n
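A rough idea of how such a length-based ranking might be implemented is sketched below. The CandidateCaption shape and its sourceUrl field are illustrative assumptions, not the plugin's actual data model.

```typescript
// Minimal sketch of the longest-first ranking heuristic described above.
// CandidateCaption is a hypothetical data model for captions scraped from other pages.
interface CandidateCaption {
  text: string;       // caption text found on another page
  sourceUrl: string;  // the page it was scraped from
}

function rankCaptions(candidates: CandidateCaption[]): CandidateCaption[] {
  // Longest caption first, reflecting the questionnaire finding that both
  // sighted and blind respondents preferred the longest available caption.
  return [...candidates].sort((a, b) => b.text.length - a.text.length);
}
```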

\u201cWhat we were trying to do was find a way that would not necessarily require more effort on the part of website authors that were not doing a good job of producing high quality alt text anyway, and seeing if there are other places on the web where they have done that and where we could leverage that to backfill the experience,\u201d explained Dr. Ed Cutrell, Principal Researcher at Microsoft Research in Redmond, Washington.<\/p>\n

Cutrell and his fellow researchers, Dr. Meredith Ringel Morris of Microsoft Research and intern Darren Guinness (a doctoral student at the University of Colorado Boulder), share a deep appreciation for the unique challenges faced by screen reader users. All three have been working in the area of accessible technology for some time. This passion resonates personally through their shared desire to work on behalf of folks who they feel are sometimes forgotten by tech.<\/p>\n

Behind the Scenes<\/strong><\/p>\n

When existing captions on other sites are found for an image, they are streamed into the user\u2019s browser extension via a web socket connection. The browser extension dynamically adds the caption to the page as alt text for image elements and as aria-labels for background images. The extension also extracts the alt text and image captions from the DOM of each page the user browses, which allows the system to keep improving as more pages are visited. When multiple potential captions for a target image are located, the longest caption is presented first while a queue of all captions found is built; if the user is not satisfied with a caption, they press a shortcut key to access additional captions from the queue.<\/p>\n
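A minimal sketch of that flow, in the spirit of a browser extension content script, might look like the following. The WebSocket URL, message shape, and matching logic are assumptions for illustration; the plugin's actual wiring is described in the team's paper.

```typescript
// Hypothetical content-script sketch: captions streamed over a WebSocket are
// applied as alt text on <img> elements, or as an aria-label on elements that
// use the image as a CSS background. Endpoint and message shape are placeholders.

interface CaptionMessage {
  imageUrl: string; // image the caption was found for elsewhere on the web
  caption: string;  // human-authored caption scraped from that other page
}

const socket = new WebSocket("wss://example.invalid/caption-stream"); // placeholder endpoint

socket.addEventListener("message", (event) => {
  const msg: CaptionMessage = JSON.parse(event.data);

  // Fill in missing alt text on matching <img> elements.
  for (const img of Array.from(document.images)) {
    if (img.src === msg.imageUrl && !img.alt) {
      img.alt = msg.caption;
    }
  }

  // Label elements that display the image as a CSS background (naive page scan).
  for (const el of Array.from(document.querySelectorAll<HTMLElement>("*"))) {
    const bg = getComputedStyle(el).backgroundImage;
    if (bg.includes(msg.imageUrl) && !el.getAttribute("aria-label")) {
      el.setAttribute("aria-label", msg.caption);
    }
  }
});
```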

The importance of being able to hear multiple caption options was not clear until user testing. As part of its debugging interface, the team had created a shortcut key that allowed hearing more than one caption for a given image if Caption Crawler located additional alt text online. The ability to hear multiple descriptions of a single image in fact delighted users who are blind or have low vision, as they discovered that each additional caption in the queue added different kinds of information and detail. The researchers also noticed how this added to the users\u2019 confidence \u2013 for example, confidence that the captions were accurate when they tended to corroborate each other.
\nCaption Crawler automatically supplies captions when alt text for an image is missing entirely. In the case of poor-quality alt text, the user simply presses a keyboard shortcut to request that the alt text be replaced with a caption from Caption Crawler\u2019s queue. The screen reader observes the change and automatically speaks the new caption.<\/p>\n
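What that replacement might look like in code is sketched below; the shortcut key (Ctrl+Alt+C) and the currentImage() helper are assumptions standing in for details the post does not spell out.

```typescript
// Sketch of cycling through queued captions with a shortcut key. The key
// combination and the focus-tracking helper below are illustrative assumptions.

const captionQueues = new Map<HTMLImageElement, string[]>(); // ranked longest-first
const queuePosition = new Map<HTMLImageElement, number>();

// Hypothetical helper: a real extension would track which image the screen
// reader's cursor is on; this stub just returns the first image on the page.
function currentImage(): HTMLImageElement | null {
  return document.querySelector("img");
}

document.addEventListener("keydown", (event) => {
  if (!(event.ctrlKey && event.altKey && event.key.toLowerCase() === "c")) return;

  const img = currentImage();
  const queue = img ? captionQueues.get(img) : undefined;
  if (!img || !queue || queue.length === 0) return;

  // Advance to the next caption, wrapping around at the end of the queue.
  const next = ((queuePosition.get(img) ?? 0) + 1) % queue.length;
  queuePosition.set(img, next);

  // Updating the alt attribute is the change the screen reader observes and speaks.
  img.alt = queue[next];
});
```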

When Caption Crawler is unable to find a pre-existing caption for an image on the web, it requests a computer-generated caption from the CaptionBot API (part of Microsoft Cognitive Services), which uses computer vision to describe an image. When the text from CaptionBot is read aloud, the screen reader first speaks the word \u201cCaptionBot\u201d so that the user is aware that this is not a human-authored caption.<\/p>\n
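Roughly, that fallback logic could be sketched as follows; requestMachineCaption is a hypothetical stand-in for the CaptionBot call, and its URL and response shape are placeholders rather than the real API.

```typescript
// Sketch of the machine-caption fallback: prefer a human-authored caption from
// the queue, otherwise fetch a computer-generated one and prefix it with
// "CaptionBot" so listeners know it was not written by a person.

async function requestMachineCaption(imageUrl: string): Promise<string> {
  // Placeholder endpoint; the real system calls the CaptionBot API
  // (part of Microsoft Cognitive Services).
  const response = await fetch(
    `https://example.invalid/describe?url=${encodeURIComponent(imageUrl)}`
  );
  const body = await response.json();
  return body.caption as string;
}

async function captionFor(imageUrl: string, humanCaptions: string[]): Promise<string> {
  if (humanCaptions.length > 0) {
    return humanCaptions[0]; // best human-authored caption, ranked longest-first
  }
  const machine = await requestMachineCaption(imageUrl);
  return `CaptionBot: ${machine}`; // spoken prefix marks it as machine-generated
}
```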

Morris points to the pervasiveness of digital imagery, with billions of images being posted daily across a variety of media, as a key motivator for research on automated techniques for creating and improving image descriptions. \u201cEngaging with this digital imagery is part of the fabric of participation in contemporary society, including education, the professions, e-commerce, civic life, entertainment, and socializing. High-quality captions empower screen reader users to more effectively engage with this key aspect of modern life.\u201d<\/p>\n

Part of the innovation of this project was taking advantage of existing technology, making future implementation by any interested party fairly straightforward. Caption Crawler also provides a bridge between the present and the future. There will come a day when AI can effectively describe images and write excellent captions, but as Cutrell points out, current AI visual description systems are not yet as good as human-authored descriptions. What the Caption Crawler team wanted to do is leverage high-quality, human-authored alt text until AI can perform such tasks much better.<\/p>\n

\u201cWhat I love about this research is that it really exemplifies Microsoft\u2019s mission statement of empowering every person to achieve more.\u201d \u2013 Meredith Ringel Morris, Principal Researcher\u00a0<\/strong><\/p><\/blockquote>\n

Caption Crawler works, for the most part, only for images that appear in more than one place. Private images \u2013 images that by definition appear in only one location, such as vacation snaps or photos of individual items on eBay \u2013 may or may not include alt text, but they fall outside Caption Crawler\u2019s scope. Many images, however \u2013 those having to do with current events, politics, science, celebrities or movie reviews, for example \u2013 appear in multiple places, and in these cases Caption Crawler can play a valuable role.<\/p>\n

The team points out that many people who are blind triangulate \u2013 that is, they use multiple data points to get closer to understanding what they\u2019re encountering on the web. The queue increases their confidence that the information they are getting back from the system is accurate.<\/p>\n

\u201cFolks who are blind or low vision are incredibly competent at making sense of the world around them,\u201d says Cutrell. Indeed, people who are blind use all kinds of information to do this effectively. On the web, they rely on textual content, contextual cues, who published or authored the site, and what it\u2019s trying to provide; an image is just one additional bit of information. The Caption Crawler team believes that if it can provide some extra bits of information on top of what screen reader users are already using, those users will have a fuller picture of what is on the screen. The team also plans to explore how to match captions for very similar (rather than only identical) images, to further improve the coverage this approach can obtain.<\/p>\n

Be sure to check out the team\u2019s paper, to be presented this month at the CHI 2018 conference in Montreal, to see the depth and dedication this project evinces. The video also lets you see Caption Crawler in action and is absolutely worth a watch.<\/p>\n