{"id":598567,"date":"2019-07-19T11:36:08","date_gmt":"2019-07-19T18:36:08","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=598567"},"modified":"2019-08-05T05:07:24","modified_gmt":"2019-08-05T12:07:24","slug":"ms-asl","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/ms-asl\/","title":{"rendered":"MS-ASL"},"content":{"rendered":"
Sign language recognition is a challenging and often underestimated problem comprising multi-modal articulators (handshape, orientation, movement, upper body, and face) that integrate asynchronously over multiple streams. Learning powerful statistical models in such a scenario requires a lot of data, particularly to apply recent advances in the field. However, labeled data is a scarce resource for sign language due to the enormous cost of transcribing these unwritten languages. We propose the first real-life, large-scale sign language data set, comprising over 25,000 annotated videos, which we thoroughly evaluate with state-of-the-art methods from sign and related action recognition. Unlike the current state of the art, the data set allows us to investigate generalization to unseen individuals (signer-independent test) in a realistic setting with over 200 signers. Previous work mostly deals with limited-vocabulary tasks, while here we cover a large class count of 1,000 signs under challenging and unconstrained real-life recording conditions. We further propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition, outperforming the current state of the art by a large margin. The data set is publicly available to the community.
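To illustrate how a video-classification architecture of this kind can be adapted to isolated sign recognition, the sketch below fine-tunes a pretrained 3D CNN for the 1,000-class setting. It is a minimal sketch under stated assumptions: torchvision's r3d_18 (a 3D ResNet pretrained on Kinetics) stands in for I3D, which is not bundled with torchvision, and the MS-ASL data pipeline (decoding clips into [batch, 3, frames, height, width] tensors) is assumed rather than shown.

```python
# Hedged sketch: fine-tuning a Kinetics-pretrained 3D CNN for 1000-class
# isolated sign recognition. r3d_18 is a stand-in for I3D; the MS-ASL
# data loader producing (clips, labels) batches is assumed.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_CLASSES = 1000  # MS-ASL1000

# Start from action-recognition pretraining, then replace the classifier head.
model = r3d_18(weights="DEFAULT")  # Kinetics-400 pretrained weights
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(clips: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step. clips: [B, 3, T, H, W] floats; labels: [B] longs."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(clips), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```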
| Set | Classes | Subjects | Samples | Duration (h:min) | Samples per class |
|---|---|---|---|---|---|
| MS-ASL100 | 100 | 189 | 5736 | 5:33 | 57.4 |
| MS-ASL200 | 200 | 196 | 9719 | 9:31 | 48.6 |
| MS-ASL500 | 500 | 222 | 17823 | 17:19 | 35.6 |
| MS-ASL1000 | 1000 | 222 | 25513 | 24:39 | 25.5 |
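The nested subsets above (MS-ASL100 through MS-ASL1000) can be selected by class index. The sketch below shows one way to filter the released annotations down to a smaller subset; it assumes, rather than confirms, that the annotations ship as a JSON list of clip records with an integer "label" field and that the N-class subsets correspond to labels 0..N-1. The file name MSASL_train.json is likewise an assumption used for illustration.

```python
# Hedged sketch: selecting an N-class subset of the MS-ASL annotations.
# Assumes a JSON list of clip records, each with an integer "label",
# and that subsets are defined by the first N class indices.
import json

def load_subset(annotation_path: str, num_classes: int = 100) -> list:
    """Return only the clips whose label falls in the first num_classes classes."""
    with open(annotation_path) as f:
        clips = json.load(f)
    return [clip for clip in clips if clip["label"] < num_classes]

# Example usage (file name is hypothetical):
train100 = load_subset("MSASL_train.json", num_classes=100)
print(len(train100), "training clips in the 100-class subset")
```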