{"id":657576,"date":"2020-06-16T10:16:54","date_gmt":"2020-06-16T17:16:54","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=657576"},"modified":"2020-07-22T08:26:38","modified_gmt":"2020-07-22T15:26:38","slug":"learning-local-and-compositional-representations-for-zero-shot-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/learning-local-and-compositional-representations-for-zero-shot-learning\/","title":{"rendered":"Learning local and compositional representations for zero-shot learning"},"content":{"rendered":"\n
In computer vision, one key property we expect of an intelligent artificial model, agent, or algorithm is that it should be able to correctly recognize the type, or *class*, of objects it encounters. This is critical in numerous important real-world scenarios, from biomedicine, where an intelligent system might be tasked with distinguishing between cancerous cells and healthy ones, to self-driving cars, where being able to discriminate between pedestrians, other vehicles, and road signs is crucial to successfully and safely navigating roads.
Deep learning is one of the most significant tools for state-of-the-art systems in computer vision, and its use has resulted in models that have reached or can even exceed human-level performance in important and challenging real-world image classification tasks. Despite these successes, such models still have difficulty *generalizing*, or adapting to tasks in testing or deployment scenarios that don't closely resemble the tasks they were trained on. For example, a visual system trained under typical weather conditions in Northern California may fail to properly recognize pedestrians in Quebec because of differences in weather, clothes, demographics, and other features. Because it's difficult to predict, if not impossible to collect, all the possible data that might be present at deployment, there is a natural interest in testing model classification performance in deployment scenarios where very few examples of the test classes are available, a setting captured under the framework of *few-shot learning*. *Zero-shot learning* (ZSL) goes a step further: no examples of the test classes are available during training. The model must instead rely on semantic information, such as attributes or text descriptions, associated with each class it encounters in training to correctly classify new classes.
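To make the zero-shot setup concrete, below is a minimal, hypothetical sketch in PyTorch of one common attribute-based approach (not the method presented in this post): an image encoder is trained only on seen classes, and an unseen class is recognized by comparing the projected image embedding against that class's attribute vector. The names (`AttributeProjector`, `zero_shot_predict`) and all dimensions are invented purely for illustration.

```python
# Hypothetical sketch of attribute-based zero-shot classification.
# Unseen classes are described only by semantic attribute vectors,
# and prediction picks the class whose attributes best match the image.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeProjector(nn.Module):
    """Maps image features into the semantic (attribute) space."""
    def __init__(self, feat_dim: int, attr_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, attr_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        return self.proj(image_features)

def zero_shot_predict(model, image_features, class_attributes):
    """Assign each image to the unseen class whose attribute vector is most
    similar (by cosine similarity) to the projected image embedding."""
    with torch.no_grad():
        emb = F.normalize(model(image_features), dim=-1)    # (B, attr_dim)
        attrs = F.normalize(class_attributes, dim=-1)       # (C, attr_dim)
        scores = emb @ attrs.T                               # (B, C)
    return scores.argmax(dim=-1)

# Toy usage with made-up sizes: 2048-d image features (e.g., from a CNN
# backbone), 85-d attribute descriptions, and 10 unseen classes.
model = AttributeProjector(feat_dim=2048, attr_dim=85)
images = torch.randn(4, 2048)        # stand-in for backbone features
unseen_attrs = torch.randn(10, 85)   # e.g., "has stripes", "lives in water", ...
print(zero_shot_predict(model, images, unseen_attrs))
```

The key point of the sketch is that no image of an unseen class is ever used for training; the attribute vectors alone stand in for labeled examples at test time.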