{"id":598567,"date":"2019-07-19T11:36:08","date_gmt":"2019-07-19T18:36:08","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=598567"},"modified":"2019-08-05T05:07:24","modified_gmt":"2019-08-05T12:07:24","slug":"ms-asl","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/ms-asl\/","title":{"rendered":"MS-ASL"},"content":{"rendered":"

Sign language recognition is a challenging and often underestimated problem comprising multi-modal articulators (handshape, orientation, movement, upper body, and face) that integrate asynchronously on multiple streams. Learning powerful statistical models in such a scenario requires large amounts of data, particularly to apply recent advances in the field. However, labeled data is a scarce resource for sign language due to the enormous cost of transcribing these unwritten languages. We propose the first real-life, large-scale sign language data set comprising over 25,000 annotated videos, which we thoroughly evaluate with state-of-the-art methods from sign and related action recognition. Unlike current data sets, it makes it possible to investigate generalization to unseen individuals (signer-independent testing) in a realistic setting with over 200 signers. Previous work mostly deals with limited-vocabulary tasks; here, we cover a large vocabulary of 1000 signs under challenging, unconstrained, real-life recording conditions. We further propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition, outperforming the previous state of the art by a large margin. The data set is publicly available to the community.
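The page itself does not ship reference code, but a minimal sketch of the fine-tuning setup described above might look as follows. It uses torchvision's Kinetics-pretrained `r3d_18` as a stand-in for I3D (which is not bundled with torchvision), with the MS-ASL1000 vocabulary as the output size; the clip shapes and training loop are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: adapt a pretrained 3D video classifier to MS-ASL1000.
# r3d_18 stands in for I3D here; data shapes and the loop are illustrative.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

NUM_CLASSES = 1000  # MS-ASL1000 vocabulary size

# Load a Kinetics-pretrained 3D CNN and replace its classification head.
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# A batch of video clips: (batch, channels, frames, height, width).
clips = torch.randn(4, 3, 16, 112, 112)            # placeholder clip data
labels = torch.randint(0, NUM_CLASSES, (4,))       # placeholder sign labels

model.train()
logits = model(clips)                # (4, 1000) class scores
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

In practice one would feed clips cropped around the annotated sign segments and evaluate on the signer-independent test split, so that no signer appears in both training and test data.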


### Data set and subset statistics
| Set | Classes | Subjects | Samples | Duration (h:mm) | Samples per class |
|------------|------|-----|-------|-------|------|
| MS-ASL100  | 100  | 189 | 5736  | 5:33  | 57.4 |
| MS-ASL200  | 200  | 196 | 9719  | 9:31  | 48.6 |
| MS-ASL500  | 500  | 222 | 17823 | 17:19 | 35.6 |
| MS-ASL1000 | 1000 | 222 | 25513 | 24:39 | 25.5 |
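As a quick sanity check on the table, the last column is simply the sample count divided by the class count of each subset; a few lines of Python (illustrative, not part of the data release) reproduce it:

```python
# Reproduce the "Samples per class" column from the subset sizes above.
subsets = {
    "MS-ASL100":  (100,  5736),
    "MS-ASL200":  (200,  9719),
    "MS-ASL500":  (500,  17823),
    "MS-ASL1000": (1000, 25513),
}

for name, (classes, samples) in subsets.items():
    print(f"{name}: {samples / classes:.1f} samples per class")
# Prints 57.4, 48.6, 35.6, and 25.5, matching the table.
```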
