{"id":394646,"date":"2017-07-19T07:24:45","date_gmt":"2017-07-19T14:24:45","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=394646"},"modified":"2024-02-25T13:55:43","modified_gmt":"2024-02-25T21:55:43","slug":"vision-and-language-intelligence","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/vision-and-language-intelligence\/","title":{"rendered":"Vision and Language Intelligence"},"content":{"rendered":"

This project aims to drive disruptive advances in\u00a0vision and language intelligence. We believe future breakthroughs in multimodal intelligence will enable smart communication between humans and the world, powering next-generation scenarios such as a universal chatbot and intelligent augmented reality. To these ends, we focus on understanding, reasoning, and generation across language and vision, and on the creation of intelligent services, including vision-to-text captioning, text-to-vision generation, and question answering\/dialog about images and videos.<\/p>\n","protected":false},"excerpt":{"rendered":"

We focus on understanding, reasoning, and generation across language and vision, and on the creation of intelligent services, including vision-to-text captioning, text-to-vision generation, and question answering\/dialog about images and videos.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13556,13562,13545],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-394646","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-language-technologies","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2017-06-28","related-publications":[490436,506933,314984,553113,626973,626988,681282,706384,846181],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[{"id":0,"name":"Talks and Tutorials","content":"

Invited Talks<\/h2>\r\n
    \r\n \t
  1. Multimodal Learning for Image Captioning and Visual Question Answering (invited talk at UC Berkeley, BVLC)<\/a>, Xiaodong He, April 1, 2016, View abstract<\/a>, Download PDF<\/a><\/li>\r\n \t
  2. Towards Human-level Quality Image Captioning: Deep Semantic Learning of Text and Images (Invited Talk at INNS Deep Learning Workshop)<\/a>, Xiaodong He, August 1, 2015, View abstract<\/a>, Download PDF<\/a><\/li>\r\n \t
  3. Deep Semantic Learning: Teach machines to understand text, image, and knowledge graph (Invited talk at CVPR DeepVision workshop)<\/a>, Xiaodong He, June 1, 2015, View abstract<\/a>, Download PDF<\/a><\/li>\r\n<\/ol>\r\n

    Tutorial<\/h2>\r\n