{"id":1005396,"date":"2024-09-26T12:36:22","date_gmt":"2024-09-26T19:36:22","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=1005396"},"modified":"2024-11-11T09:31:10","modified_gmt":"2024-11-11T17:31:10","slug":"asl-stem-wiki","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/asl-stem-wiki\/","title":{"rendered":"ASL STEM Wiki"},"content":{"rendered":"

ASL STEM Wiki<\/h1>\n\n\n\n

Dataset and Benchmark for Interpreting STEM Articles<\/p>\n\n\n\n

To help advance the state of sign language modeling, we created ASL STEM Wiki, the first continuous signing dataset focused on Science, Technology, Engineering, and Math (STEM). The corpus contains 254 Wikipedia articles on STEM topics in English, interpreted into 300 hours of American Sign Language (ASL). Beyond its size and topical focus, it differs from many prior datasets in that its videos feature professional signers, including many Certified Deaf Interpreters (CDIs), and were collected with consent from each contributor under IRB approval. Deaf research team members were involved throughout.<\/p>\n\n\n\n

This dataset is released alongside our paper, which identifies several use cases for ASL STEM Wiki and provides baselines for one of them: fingerspelling detection and identification. Because the dataset focuses on STEM, and STEM terminology often lacks standardized signs, fingerspelling of technical terms appears frequently in our dataset. To help identify fingerspelled terms, we provide models for fingerspelling detection and alignment, and release benchmark performance on the ASL STEM Wiki dataset for the research community to build on. Our models highlight the difficulty of the detection and alignment tasks, and provide the first evidence that self-supervised contrastive pretraining can improve fingerspelling detection.<\/p>\n\n\n\n

Our dataset also powers a small bilingual resource for students, providing full English texts of STEM articles alongside professional ASL interpretations. This resource lets students and other readers access spot translations of selected sentences and play through entire articles as desired. We release this resource as well.<\/p>\n\n\n\n

This project was conducted at Microsoft Research with collaborators.<\/p>\n\n\n\n