{"id":568788,"date":"2019-02-22T09:30:26","date_gmt":"2019-02-22T17:30:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=568788"},"modified":"2019-02-22T09:30:26","modified_gmt":"2019-02-22T17:30:26","slug":"winners-announced-in-multi-agent-reinforcement-learning-challenge","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/winners-announced-in-multi-agent-reinforcement-learning-challenge\/","title":{"rendered":"Winners announced in multi-agent reinforcement learning challenge"},"content":{"rendered":"

\"\"<\/p>\n

Reinforcement learning, a major machine learning technique, is making an increasing impact, producing strong results in areas such as gaming, financial markets, autonomous driving, and robotics. But many challenges remain before AI can move beyond solving specific tasks to exhibiting more general intelligence, and the research community has a long tradition of using the more controlled environments of play as benchmarks for this work.<\/p>\n

For the Machine Intelligence and Perception group<\/a> here at Microsoft Research, our sandbox of choice has been Minecraft<\/em>. Its complex 3D world and flexibility to create a wide variety of game-play scenarios make it ideal for exploring the potential of artificial intelligence, and the platform we\u2019ve built to do so\u2014Project Malmo<\/a>\u2014has allowed us to capitalize on these strengths.<\/p>\n

In Learning to Play: The Multi-Agent Reinforcement Learning in Malm\u00d6 (MARL\u00d6) Competition<\/a>, we invited programmers into this digital world to help tackle multi-agent reinforcement learning. This challenge, the second competition using the Project Malmo platform, tasked participants with designing learning agents capable of collaborating with or competing against other agents to complete tasks across three different games within Minecraft<\/em>.<\/p>\n

The competition\u2014co-hosted by Microsoft, Queen Mary University of London, and CrowdAI<\/a>\u2014drew strong interest from people and teams worldwide. At the end of the five-month submission period in December, we had 133 participants.<\/p>\n

Katja Hofmann, Senior Researcher and research lead of Project Malmo<\/a>, announced the five competition winners at the Applied Machine Learning Days conference in Lausanne, Switzerland, last month.<\/p>\n

\"\"

Katja Hofmann at Applied Machine Learning Days \u00a9samueldevantery.com<\/p><\/div>\n

And the winners are …<\/h3>\n

First place was awarded to a team from Bielefeld University\u2019s Cluster of Excellence Center in Cognitive Interactive Technology<\/a> in Germany.<\/p>\n

\u201cWhat we like best about the MARL\u00d6 competition is that it highlights fundamental problems of general AI, like multitasking behavior in multi-agent 3D environments,\u201d said team supervisor Dr. Andrew Melnik. \u201cThe most challenging aspect of the competition was the changes in the appearance of the target objects, as well as environment tilesets and weather and light conditions\u2014rainy and sunny, noon and dawn. It was quite a unique feature of the competition, and we enjoyed tackling the problem.\u201d<\/p>\n

\"\"

First place winners from left to right: Lennart Bramlage, Andrew Melnik, and Hendric Vo\u00df<\/p><\/div>\n

With their agent, Melnik and team members Lennart Bramlage and Hendric Vo\u00df employed an approach called \u201cMimicStates,\u201d a type of imitation learning in which they exploited a collection of mimic actions from real players performing catching, attacking, and defensive action patterns in different environments.<\/p>\n
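The team's exact MimicStates implementation is not described in detail here, but the underlying idea of imitation learning from recorded human play can be sketched in a few lines. Everything in this example, including the state features and the nearest-neighbor retrieval, is an illustrative assumption, not the team's actual method.

```python
# Hypothetical sketch of imitation learning from recorded player actions:
# store (state, action) pairs from human demonstrations, then act by
# retrieving the action whose recorded state is closest to the current one.

def distance(a, b):
    # Euclidean distance between two feature vectors (tuples of floats).
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class MimicPolicy:
    def __init__(self, demonstrations):
        # demonstrations: list of (state, action) pairs recorded from humans.
        self.demonstrations = demonstrations

    def act(self, state):
        # Return the action taken in the most similar recorded state.
        _, action = min(self.demonstrations,
                        key=lambda sa: distance(sa[0], state))
        return action

# Toy usage: states are hypothetical (distance_to_target, angle) features.
demos = [((0.5, 0.0), "attack"), ((5.0, 1.2), "chase"), ((1.0, 3.0), "defend")]
policy = MimicPolicy(demos)
print(policy.act((0.6, 0.1)))  # nearest demo is the "attack" state
```

Real systems generalize with learned function approximators rather than raw nearest-neighbor lookup, but the principle of reusing human behavior patterns is the same.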

\u201cLeveraging human data is a very promising direction for tackling the hardest AI challenges,\u201d said Hofmann. \u201cThe success of this approach suggests that there are many interesting directions to explore. For example, how to best leverage human data to further improve performance.\u201d<\/p>\n

\"\"

First runner-up, from left to right: Rodrigo de Moura Canaan, Michael Cerny Green, and Philip Bontrager<\/p><\/div>\n

\"Second

Second runner-up: Linjie Xu<\/p><\/div>\n

The competition\u2019s first runner-up was the New York University team of Rodrigo de Moura Canaan, Michael Cerny Green, and Philip Bontrager, and the second runner-up was Linjie Xu from Nanchang University, Jiangxi Province, China. Motoki Omura from the University of Tokyo, Japan, and Timothy Craig Momose, Izumi Karino, and Yuma Suzuki, also from the University of Tokyo, received honorable mentions.<\/p>\n

From the well-established to the novel<\/h3>\n

Many competitors used variations of well-established deep reinforcement learning approaches\u2014deep Q-networks (DQN) and proximal policy optimization (PPO) were particularly popular\u2014while others incorporated more recent innovations, including recurrent world models<\/a>, and even developed novel approaches (\u201cMessage-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning,\u201d published at the 2019 AAAI Conference on Artificial Intelligence<\/a>).<\/p>\n
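For readers unfamiliar with the DQN family mentioned above, the core idea is a one-step temporal-difference update toward a bootstrapped target. The tabular toy below illustrates just that update; DQN proper replaces the table with a neural network and adds experience replay and target networks. All names and values here are illustrative.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # One-step Q-learning update toward r + gamma * max_a' Q(s', a').
    target = r + gamma * max(Q[s_next].values(), default=0.0)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

Q = defaultdict(lambda: defaultdict(float))
# Toy transition: taking "right" in state 1 reaches state 2 with reward 1.
print(q_update(Q, 1, "right", 1.0, 2))  # 0.5 after one update
```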

For future research and competitions on these tasks, we expect that current approaches could significantly improve through methods combining both the imitation learning techniques used by the winning team and reinforcement learning, benefiting from the prior knowledge of human gameplay and the further fine-tuning of behaviors through reinforcement learning. It is our belief that competitions like MARL\u00d6 will continue to play a key role in innovating in this area.<\/p>\n
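The combination suggested above, warm-starting from human demonstrations and then fine-tuning with reinforcement learning, can be sketched in a toy form. The states, demonstrations, and initialization scheme below are hypothetical assumptions for illustration, not a description of any competitor's system.

```python
from collections import defaultdict

def pretrain_from_demos(demos):
    # Imitation phase: initialize Q so demonstrated actions score highest.
    Q = defaultdict(lambda: defaultdict(float))
    for state, action in demos:
        Q[state][action] += 1.0
    return Q

def fine_tune(Q, transitions, alpha=0.5, gamma=0.9):
    # RL phase: standard one-step Q-learning over observed transitions.
    for s, a, r, s_next in transitions:
        target = r + gamma * max(Q[s_next].values(), default=0.0)
        Q[s][a] += alpha * (target - Q[s][a])
    return Q

# Hypothetical demonstrations and a single environment transition.
demos = [("near_enemy", "attack"), ("near_enemy", "attack"), ("far", "explore")]
Q = pretrain_from_demos(demos)
Q = fine_tune(Q, [("near_enemy", "attack", 1.0, "far")])
print(max(Q["near_enemy"], key=Q["near_enemy"].get))  # still "attack"
```

The pretraining step gives the agent a sensible starting policy, and the RL step then refines the values using environment reward rather than pure imitation.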

\u201cMulti-agent reinforcement learning has been picking up traction in the research community over the last couple of years, and what we need right now is a series of ambitious tasks that the community can use to measure our collective progress,\u201d said Sharada Mohanty, a Ph.D. student at \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne, Switzerland, and co-founder of CrowdAI<\/a>, which co-hosted MARL\u00d6. \u201cThis competition establishes a series of such ambitious tasks that act as good baselines that will catalyze a lot of meaningful progress in multi-agent reinforcement learning over the next few years.\u201d<\/p>\n

MARL\u00d6<\/a> is part of our ongoing engagement with the multi-agent reinforcement learning community to help further advance general artificial intelligence.<\/p>\n

\u201cGeneralization across multiple task variants and agents is very hard and nowhere near solved,\u201d said Hofmann. \u201cWith the MARL\u00d6 starter kit and competition tasks online<\/a>, we invite the community to try and tackle this open challenge.\u201d<\/p>\n

Try the tutorial, experiment, and keep the conversation going by sending your ideas to malmoadm@microsoft.com<\/a>. We look forward to hearing from you.<\/p>\n","protected":false},"excerpt":{"rendered":"

In Learning to Play: The Multi-Agent Reinforcement Learning in Malm\u00d6 (MARL\u00d6) Competition, we invited programmers into this digital world to help tackle multi-agent reinforcement learning. This challenge, the second competition using the Project Malmo platform, tasked participants with designing learning agents capable of collaborating with or competing against other agents to complete tasks across three different games within Minecraft.<\/p>\n","protected":false},"author":38022,"featured_media":568836,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[194467],"tags":[],"research-area":[13556],"msr-region":[197900],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-568788","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","msr-research-area-artificial-intelligence","msr-region-north-america","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[235753],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Noboru Sean Kuno","user_id":33122,"display_name":"Noboru Sean Kuno","author_link":"Noboru Sean Kuno<\/a>","is_active":false,"last_first":"Kuno, Noboru Sean","people_section":0,"alias":"nkuno"}],"msr_type":"Post","featured_image_thumbnail":"\"a","byline":"Noboru Sean Kuno<\/a>","formattedDate":"February 22, 2019","formattedExcerpt":"In Learning to Play: The Multi-Agent Reinforcement Learning in Malm\u00d6 (MARL\u00d6) 
Competition, we invited programmers into this digital world to help tackle multi-agent reinforcement learning. This challenge, the second competition using the Project Malmo platform, tasked participants with designing learning agents capable of collaborating with…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/568788"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38022"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=568788"}],"version-history":[{"count":18,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/568788\/revisions"}],"predecessor-version":[{"id":569073,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/568788\/revisions\/569073"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/568836"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=568788"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=568788"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=568788"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=568788"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=568788"},{"taxonomy":"msr-event-type","embeddable
":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=568788"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=568788"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=568788"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=568788"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=568788"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=568788"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}