{"id":987693,"date":"2023-12-04T06:00:37","date_gmt":"2023-12-04T14:00:37","guid":{"rendered":""},"modified":"2023-12-04T06:00:38","modified_gmt":"2023-12-04T14:00:38","slug":"tackling-sign-language-data-inequity","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/tackling-sign-language-data-inequity\/","title":{"rendered":"Tackling sign language data inequity"},"content":{"rendered":"\n
\"Blue<\/figure>\n\n\n\n

Access to information is considered a human right by many global organizations and governments. But even though at least 71 countries mandate the provision of services in sign language, most information resources (like search engines or news sites) are presented in written language only. Sign languages are the primary means of communication for about 70 million d/Deaf people worldwide, and are also used by hearing family members, friends, and colleagues.

While over 300 sign languages are in use worldwide, American Sign Language (ASL) is the primary sign language used in the United States. For many deaf people, English and other written languages are secondary languages. Requiring signing deaf people to navigate information in a written language like English forces them to operate in a different, potentially non-fluent, language. Adapting text resources for sign language input and output introduces significant technical challenges. Automatically recognizing or translating sign language could help expand access, but AI development has been held back by a lack of high-quality data.

To help make technical systems more accessible to people with disabilities, Danielle Bragg, a senior researcher at Microsoft Research, has been leading efforts to build systems that better support sign language. This blog post provides an update on the team's progress, with a focus on their recent paper, ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition, which introduces the first crowdsourced sign language dataset. Advancing the state of the art in sign recognition, the project demonstrates that community-centered data curation is not only the right thing to do, but also advances machine learning.

Limitations of prior datasets

ASL Citizen supports machine learning methods that overcome limitations of prior Isolated Sign Language Recognition (ISLR) datasets. Model development typically requires a large, high-quality training set (i.e., a large vocabulary, minimal label noise, and representation of diverse signers and environments). A lack of appropriate sign language data collected with consent has been a major barrier to the development of real-world sign language systems.

Prior sign language datasets have been collected in two main ways: 1) by scraping the internet for videos or 2) by inviting people to a lab for recording. While scraping can result in large collections, the videos are typically collected without consent from video creators, scraping violates many websites' terms of service, and it is difficult to identify and label the content of scraped videos. Lab collections, on the other hand, typically come with written consent from participants, but they are generally small, limited by the human hours required to record participants and by the small pool of potential contributors located nearby, and they fail to capture diverse real-world settings. To enable real-world sign language AI, sign language datasets need to capture real-world settings, include diverse people, and be accurately labeled.


Designing the ASL Citizen collection

To overcome the limitations of past datasets, the research team designed a novel sign language crowdsourcing platform. The web-based platform enabled people who wanted to contribute to log in, complete a consent process, and record videos. Web collection opened the project to a larger, more diverse audience, including anyone with internet access. It also captured everyday environments, which real-world systems need to learn to handle. By enabling people to contribute wherever and whenever they want, while still providing an explicit consent process, crowdsourcing enables scale with consent.

The platform design also solved labeling problems. Labeling challenges have limited the size of past datasets because of the large amount of time required to identify video contents. For example, for a dataset of sign language monologues, each video must be carefully watched, and the signed contents must be identified, written down, and time-aligned with the video. Only highly skilled experts can annotate sign language videos, and the process often takes more time and resources than the video collection itself. To avoid this labeling overhead entirely, the platform was designed to collect pre-labeled contents: it prompts users with sign videos whose contents are known and asks them to record their own version. This let the team automatically label each video a user created with the contents of the prompt video that elicited it.
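To make that mechanism concrete, here is a minimal sketch of how prompt-based collection yields pre-labeled data. The prompt table, field names, and helper function are hypothetical illustrations, not the actual ASL Citizen platform code.

```python
# Minimal sketch of prompt-based "pre-labeled" collection.
# All names here are hypothetical; this is not the ASL Citizen platform code.

from dataclasses import dataclass
from datetime import datetime, timezone

# Each prompt video has a known gloss (the sign's label), so any recording
# made in response to it can inherit that label automatically.
PROMPTS = {
    "prompt_0001": "HELLO",
    "prompt_0002": "THANK-YOU",
    "prompt_0003": "COMPUTER",
}

@dataclass
class Submission:
    contributor_id: str   # consented, logged-in contributor
    prompt_id: str        # which prompt video they responded to
    video_path: str       # their self-recorded version of the sign
    gloss: str            # label inherited from the prompt; no manual annotation needed
    recorded_at: str

def label_submission(contributor_id: str, prompt_id: str, video_path: str) -> Submission:
    """Attach the prompt's known gloss to a newly uploaded video."""
    return Submission(
        contributor_id=contributor_id,
        prompt_id=prompt_id,
        video_path=video_path,
        gloss=PROMPTS[prompt_id],
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )

print(label_submission("user_42", "prompt_0002", "uploads/user_42/clip_17.mp4").gloss)
# -> THANK-YOU
```

Because the label comes from the prompt rather than from after-the-fact annotation, the expensive expert transcription step described above is avoided entirely.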

The platform was community-focused in multiple ways. All website content was presented in both English and ASL; the recording prompts were in ASL; the project goals were shared explicitly; participants were able to verify one another's contributions; and the community-sourced dataset was made available as a community resource in the form of a dictionary.

In designing the platform, the team engaged in an iterative process, incorporating feedback from community stakeholders and testers, and ran a pilot study with the platform to better understand the user experience and the quality of collected data. For more about the platform design and pilot study, see Exploring Collection of Sign Language Videos through Crowdsourcing. The team also experimented with crowdsourcing videos of complete sentences, which are required for more complete sign language modeling, as discussed in ASL Wiki: An Exploratory Interface for Crowdsourcing ASL Translations.

Improving models using ASL Citizen

A diverse group of experts helped bring ASL Citizen to life. Engineers and a designer at Microsoft helped build and scale the platform; a well-known ASL professional recorded the prompt videos; and collaborators at Boston University's Deaf Center provided feedback and managed participant recruitment and engagement. Deaf research team members were involved throughout. Consisting of about 84,000 videos of 2,700 distinct signs from ASL, the resulting dataset is the largest labeled ISLR dataset and the first crowdsourced ISLR dataset.

Using the new dataset, the researchers adapted previous approaches to ISLR to the real-world task of looking up signs in a dictionary, and released a set of baselines for machine learning researchers to build upon, focusing on supervised deep learning methods. To establish baseline models, the team partnered with collaborators from the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Comparing to prior datasets is difficult because each dataset covers a different vocabulary. However, when one baseline is evaluated on just the vocabulary that overlaps with the best prior dataset, training on ASL Citizen boosts accuracy from 16% to 71%. Even without algorithmic advances, training and testing on ASL Citizen improves ISLR accuracy compared to prior work, despite spanning a larger vocabulary and testing on completely unseen users.
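As a rough illustration of how a dictionary-lookup baseline can be scored, the sketch below ranks the full sign vocabulary for each query video and reports top-k retrieval accuracy. The random score matrix is only a placeholder for the outputs of a classifier trained on ASL Citizen, and the array shapes and function names are assumptions made for this example, not the team's released evaluation code.

```python
# Illustrative sketch of dictionary-retrieval evaluation for ISLR.
# Scores are random placeholders standing in for a trained model's outputs.

import numpy as np

rng = np.random.default_rng(0)

NUM_QUERIES = 1000   # test videos, ideally from signers unseen during training
VOCAB_SIZE = 2700    # distinct ASL signs in the dataset

# In practice each row would be a supervised classifier's scores for one
# query video against every sign in the dictionary vocabulary.
scores = rng.normal(size=(NUM_QUERIES, VOCAB_SIZE))
true_labels = rng.integers(0, VOCAB_SIZE, size=NUM_QUERIES)

def retrieval_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of queries whose correct sign appears among the top-k ranked signs."""
    top_k = np.argsort(-scores, axis=1)[:, :k]        # highest-scoring signs first
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

for k in (1, 5, 20):
    print(f"top-{k} retrieval accuracy: {retrieval_accuracy(scores, true_labels, k):.3f}")
```

Framing the task as retrieval rather than plain classification matches the dictionary use case: a user signs a word to the camera and the system returns a ranked list of candidate dictionary entries.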
