{"id":394646,"date":"2017-07-19T07:24:45","date_gmt":"2017-07-19T14:24:45","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=394646"},"modified":"2024-02-25T13:55:43","modified_gmt":"2024-02-25T21:55:43","slug":"vision-and-language-intelligence","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/vision-and-language-intelligence\/","title":{"rendered":"Vision and Language Intelligence"},"content":{"rendered":"

This project aims to drive disruptive advances in\u00a0vision and language intelligence. We believe future breakthroughs in multimodal intelligence will enable smart communication between humans and the world, powering next-generation scenarios such as a universal chatbot and intelligent augmented reality. To these ends, we focus on understanding, reasoning, and generation across language and vision, and on the creation of intelligent services, including vision-to-text captioning, text-to-vision generation, and question answering\/dialog about images and videos.<\/p>\n","protected":false},"excerpt":{"rendered":"

We focus on understanding, reasoning, and generation across language and vision, and on the creation of intelligent services, including vision-to-text captioning, text-to-vision generation, and question answering\/dialog about images and videos.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13556,13562,13545],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-394646","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-language-technologies","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2017-06-28","related-publications":[490436,506933,314984,553113,626973,626988,681282,706384,846181],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[{"id":0,"name":"Talks and Tutorials","content":"

Invited Talks<\/h2>\r\n
    \r\n \t
  1. Multimodal Learning for Image Captioning and Visual Question Answering (invited talk at UC Berkeley, BVLC)<\/a>, Xiaodong He, April 1, 2016, View abstract<\/a>, Download PDF<\/a><\/li>\r\n \t
  2. Towards Human-level Quality Image Captioning: Deep Semantic Learning of Text and Images (Invited Talk at INNS Deep Learning Workshop)<\/a>, Xiaodong He, August 1, 2015, View abstract<\/a>, Download PDF<\/a><\/li>\r\n \t
  3. Deep Semantic Learning: Teach machines to understand text, image, and knowledge graph (Invited talk at CVPR DeepVision workshop)<\/a>, Xiaodong He, June 1, 2015, View abstract<\/a>, Download PDF<\/a><\/li>\r\n<\/ol>\r\n

    Tutorial<\/h2>\r\n