A grid of screenshots from the ASL Citizen dataset, showing different people performing different signs in American Sign Language.

ASL Citizen

A Community-sourced Dataset for Advancing Isolated Sign Language Recognition

Signed languages are the primary languages of about 70 million D/deaf people worldwide. Despite their importance, existing information and communication technologies are designed primarily for written or spoken language. Automated solutions could help address these accessibility gaps, but the state of sign language modeling lags far behind that of spoken language modeling, largely due to a lack of appropriate training data. Prior datasets suffer from some combination of small size, limited signer diversity, a lack of real-world recording settings, poor labels, and missing consent for video use.

To help advance the state of sign language modeling, we created ASL Citizen, the first crowdsourced isolated sign language dataset, containing about 84k videos of 2.7k distinct signs from American Sign Language (ASL). It is the largest Isolated Sign Language Recognition (ISLR) dataset to date. Beyond its size, and unlike prior datasets, it features everyday signers in everyday recording settings, and it was collected with consent from every contributor under IRB approval. Deaf research team members were involved throughout the project.
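
As a rough sketch of how one might iterate over the dataset, the snippet below assumes the archive contains per-split CSV metadata files with a video-filename column and a gloss-label column. The paths and column names here are assumptions for illustration only; please check the downloaded archive and the open-source repo for the actual layout.

    import csv
    from collections import Counter

    def load_split(csv_path, video_col="Video file", gloss_col="Gloss"):
        # Yield (video filename, gloss label) pairs from one split's metadata CSV.
        # The column names are assumptions; adjust them to match the release.
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                yield row[video_col], row[gloss_col]

    train = list(load_split("ASL_Citizen/splits/train.csv"))  # assumed path
    print(len(train), "training clips covering",
          len(Counter(gloss for _, gloss in train)), "distinct glosses")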

This dataset is released alongside our paper, which reframes ISLR as a dictionary retrieval task and establishes state-of-the-art baselines. In dictionary retrieval, someone sees or thinks of a sign that they would like to look up; they repeat the sign in front of an everyday (RGB) camera; and an ISLR algorithm returns a ranked list of the dictionary signs closest to the demonstrated sign. The results list may be accompanied by sign definitions in text or sign language video. This framing grounds ISLR research in a meaningful, real-world application. Our baselines leverage existing appearance-based and pose-based techniques, and with our dataset they improve state-of-the-art ISLR accuracy from about 32% to 62%.
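
For concreteness, here is a minimal sketch of how such a retrieval system can be structured, assuming a pretrained ISLR encoder that maps a sign video to a fixed-length embedding. The placeholder encoder, function names, and file paths below are hypothetical and are not part of the ASL Citizen release.

    import numpy as np

    def embed_video(video_path):
        # Placeholder encoder: in practice this would be a pretrained ISLR model
        # (an appearance-based or pose-based network) applied to the video frames.
        rng = np.random.default_rng(abs(hash(video_path)) % (2**32))
        v = rng.standard_normal(512)
        return v / np.linalg.norm(v)

    def build_dictionary_index(gloss_to_video):
        # Embed one exemplar video per dictionary entry (gloss).
        glosses = list(gloss_to_video)
        index = np.stack([embed_video(gloss_to_video[g]) for g in glosses])
        return glosses, index

    def retrieve(query_video, glosses, index, k=10):
        # Rank dictionary entries by cosine similarity to the query embedding
        # (embeddings are L2-normalized, so a dot product suffices).
        q = embed_video(query_video)
        scores = index @ q
        top = np.argsort(-scores)[:k]
        return [glosses[i] for i in top]

    # Example: build a tiny two-entry dictionary and look up a recorded sign.
    glosses, index = build_dictionary_index({
        "HELLO": "videos/hello_exemplar.mp4",
        "THANK-YOU": "videos/thank_you_exemplar.mp4",
    })
    print(retrieve("videos/my_recording.mp4", glosses, index, k=2))

In practice the returned glosses would be shown to the user alongside their definitions or reference videos, so that an imperfect top-k ranking can still support a successful lookup.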

This project was conducted at Microsoft Research with collaborators at multiple organizations.

  • Microsoft: Danielle Bragg (PI), Mary Bellard, Hal Daumé III, Alex Lu, Vanessa Milan, Fyodor Minakov, Paul Oka, Philip Rosenfield, Chinmay Singh, William Thies
  • Boston University: Lauren Berger, Naomi Caselli, Miriam Goldberg, Hannah Goldblatt, Kriston Pumphrey
  • University of Washington: Aashaka Desai, Richard Ladner
  • Rochester Institute of Technology: Abraham Glasser

Dataset License: Please see the supporting tab. If you are interested in commercial use, please contact ASL_Citizen@microsoft.com.

Dataset Download:

To download via the web interface, please visit: Download ASL Citizen from Official Microsoft Download Center

To download via the command line, please run: wget https://download.microsoft.com/download/b/8/8/b88c0bae-e6c1-43e1-8726-98cf5af36ca4/ASL_Citizen.zip

Open-source Repo: https://github.com/microsoft/ASL-citizen-code

Citation: If you use this dataset in your work, please cite our paper.

@article{desai2023asl,
  title={ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition},
  author={Desai, Aashaka and Berger, Lauren and Minakov, Fyodor O and Milan, Vanessa and Singh, Chinmay and Pumphrey, Kriston and Ladner, Richard E and Daum{\'e} III, Hal and Lu, Alex X and Caselli, Naomi and Bragg, Danielle},
  journal={arXiv preprint arXiv:2304.05934},
  year={2023}
}

Acknowledgements: We are deeply grateful to all community members who participated in this dataset project.