This dataset empowers researchers and developers to pursue a range of new directions, including but not limited to:
- Fingerspelling detection and recognition – Our dataset reflects an increased usage of fingerspelling in interpretations of STEM documents. Identifying instances of fingerspelling and mapping these instances onto the corresponding English words can help researchers better understand signing patterns. We provide benchmarks for fingerspelling detection and recognition in our paper accompanying the dataset release. The ability to detect and recognize fingerspelling can also enable richer downstream applications, such as automatic sign suggestion (described subsequently).
- Automatic sign suggestion – Also motivated by the prevalence of fingerspelling in STEM interpretations, we propose developing systems that detect when fingerspelling is used and suggest appropriate ASL signs to use instead. Suggestions would depend on the domain and context (e.g. «protein» in the context of nutrition, structural biology, or protein engineering may have distinct ASL signs), as well as on the audience (e.g. the sign to use for an elementary school class may be different from the sign to use with a college audience). Fingerspelling may still be appropriate in some cases, for example when introducing a new sign that is not well-known.
- Translationese/Interpretese – Because the ASL in our dataset is prompted by English source sentences, it is prone to translationese effects [1], such as English-influenced word order, segmentation of ASL along English sentence boundaries, signs for English homonyms being used instead of the appropriate sign, and increased fingerspelling. We propose training models to detect and repair translationese, as well as translation and interpretation studies of interpretese [2] in ASL.
- Sign variation – Five of our articles are interpreted by all 37 ASL interpreters in our study. These articles provide a unique opportunity to study variation in how individuals sign and interpret the same English sentence, especially for STEM concepts whose ASL signs are not yet stabilized.
- Sign linking/retrieval – Related to sign variation, our dataset contains examples of English words that may be interpreted differently across interpreters and contexts. This data can be used to train models that link different versions of ASL signs for the same concept (e.g. one interpreter may sign «electromagnetism» using the signs for ELECTRICITY and MAGNET, while another may interpret the same word using a sign that visually describes an electromagnetic field).
- Automatic STEM translation – Our dataset can be used to train, fine-tune, and/or evaluate model capabilities in translating technical content from English to ASL. Technically, our dataset could also be used to develop models that translate from ASL to English; however, this direction is not preferred because our dataset contains interpreted ASL, which may differ from unprompted ASL [2].
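
As a starting point for the sign variation and sign linking directions above, one might first collect every interpretation of the same English sentence across interpreters. The following is a minimal sketch, not the dataset's actual loading code: the record fields `sentence`, `interpreter_id`, and `video`, the file names, and both helper functions are hypothetical and would need to be mapped onto the real metadata.

```python
from collections import defaultdict


def group_by_sentence(records):
    """Map each English sentence to its (interpreter_id, video) pairs.

    `records` is an iterable of dicts with the hypothetical keys
    "sentence", "interpreter_id", and "video".
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["sentence"]].append((rec["interpreter_id"], rec["video"]))
    return dict(groups)


def shared_sentences(groups, min_interpreters=2):
    """Keep only sentences signed by at least `min_interpreters` distinct interpreters."""
    return {
        sentence: clips
        for sentence, clips in groups.items()
        if len({interpreter for interpreter, _ in clips}) >= min_interpreters
    }


# Toy records standing in for real dataset metadata (all values are made up).
records = [
    {"sentence": "Electromagnetism is a fundamental force.", "interpreter_id": "A", "video": "a_001.mp4"},
    {"sentence": "Electromagnetism is a fundamental force.", "interpreter_id": "B", "video": "b_014.mp4"},
    {"sentence": "Proteins are chains of amino acids.", "interpreter_id": "A", "video": "a_002.mp4"},
]

groups = group_by_sentence(records)
shared = shared_sentences(groups, min_interpreters=2)
```

Sentences that survive the filter are the candidates for side-by-side comparison of interpretations, or for building positive pairs when training a sign linking model.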
[1] Moshe Koppel and Noam Ordan. 2011. Translationese and its dialects. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 1318–1326, Portland, Oregon, USA. Association for Computational Linguistics.
[2] Miriam Shlesinger. 2009. Towards a definition of interpretese. Benjamins Translation Library (BTL).