{"id":212087,"date":"2016-01-14T20:03:50","date_gmt":"2016-01-14T20:03:50","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/video-and-language\/"},"modified":"2021-05-13T02:20:33","modified_gmt":"2021-05-13T09:20:33","slug":"video-and-language","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/video-and-language\/","title":{"rendered":"Vision and Language"},"content":{"rendered":"

Automatically describing visual content with natural language is a fundamental challenge of computer vision and multimedia. Sequence learning (e.g., Recurrent Neural Networks), attention mechanism, memory networks,\u00a0etc.,\u00a0have attracted increasing attention on visual interpretation. In this project, we are focusing on the following topics related to the emerging topic of “vision and language”:<\/p>\n