{"id":602541,"date":"2019-08-09T09:02:08","date_gmt":"2019-08-09T16:02:08","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=602541"},"modified":"2019-09-12T14:35:21","modified_gmt":"2019-09-12T21:35:21","slug":"project-malmo-competition-returns-with-student-organizers-and-a-new-mission-to-democratize-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/project-malmo-competition-returns-with-student-organizers-and-a-new-mission-to-democratize-reinforcement-learning\/","title":{"rendered":"Project Malmo competition returns with student organizers and a new mission: To democratize reinforcement learning"},"content":{"rendered":"

When I was asked about my favorite movie in a game with friends after my wedding ceremony, I replied Star Wars<\/em>. That was about two decades ago, and, yes, it\u2019s still the case. I especially like Return of the Jedi<\/em>. The third installment in the original trilogy is almost perfect to me. Luke Skywalker returns to fight back against the Empire as a member of the Rebel Alliance with the help of his old friend Han Solo and new friends the Ewoks. It\u2019s a must-see as far as I\u2019m concerned. Third stories have proven to be special in other franchise masterpieces, too, such as The Lord of the Rings<\/em>, Back to the Future<\/em>, and Indiana Jones<\/em>.<\/p>\n

The MineRL competition is the third in a trilogy of a different sort: contests based on Project Malmo, an AI experimentation platform built on top of Minecraft. It\u2019s distinguishing itself from other contests and its Malmo predecessors in really exciting ways.<\/p>\n

MineRL is the first competition of its kind to put a premium on agent training efficiency, and we believe it\u2019s the first to explicitly take advantage of an approach that combines reinforcement learning and imitation learning with a large dataset. The organizers have changed as well. The Malmo Collaborative AI Challenge in 2017 was organized by Microsoft, and The Multi-Agent Reinforcement Learning In Malmo (MARLO) Competition in 2018 was co-organized by Microsoft, Queen Mary University of London, and CrowdAI (now AIcrowd). This year\u2019s competition was proposed by and is based on the work of students from Carnegie Mellon University.<\/p>\n

The power of competition<\/h3>\n

CMU PhD student William Guss, the competition\u2019s lead organizer, has long been interested in doing machine learning in Minecraft, drawn to the game by the ability of its open-world environment to reflect the nature of real-world tasks and challenges. It\u2019s why researchers here at Microsoft Research like it, too. William was intrigued by Project Malmo but saw that limitations in current reinforcement learning tools and methods were making it difficult to take full advantage of the unique training ground provided by the game and platform. State-of-the-art reinforcement learning systems require ever more samples and computing resources, making it hard to replicate and improve those systems, let alone apply them in the real world. Additionally, the reward functions reinforcement learning employs aren\u2019t conducive to specifying the kind of general intelligence researchers hope their agents can eventually achieve.<\/p>\n

In response, William, Brandon Houghton, and several other CMU students developed technology to record the completion of various tasks in Minecraft, creating a large-scale dataset of human demonstrations called MineRL-v0. They realized, though, that the dataset wouldn\u2019t be nearly as valuable without more efficient algorithms to use it. Having seen the success of machine learning competitions such as the ImageNet challenge in galvanizing research in a particular direction, they began considering a competition designed around sample-efficient and imitation-based reinforcement learning using their dataset. With this in the back of their minds and the dataset ready for release, they reached out to Microsoft about collaborating more broadly. Both parties realized that partnering on a competition was a natural fit, and the MineRL competition was born.<\/p>\n

Making AI more inclusive<\/h3>\n

The competition is run in partnership with Queen Mary University of London, AIcrowd, Preferred Networks, and Microsoft. Participants have to develop a system that obtains a diamond in Minecraft using only four days of training time and no more than 10 million samples. To put the challenge into perspective, it\u2019s taken between 44 million and more than 200 million samples to train deep reinforcement learning models to play Atari 2600 games as well as a person. These training limits are important for encouraging efficiency, which the CMU team envisions serving the larger goal behind the competition\u2019s design: the democratization of reinforcement learning.<\/p>\n

Reinforcement learning is so dependent on data and computing resources that only those with access to them are able to work in and contribute to the space, limiting the scope and pace of advancement. Inclusivity is so integral to what the competition is trying to accomplish that its infrastructure includes computational and travel grants, provided by Microsoft, to support those underrepresented in the research community in participating in the competition and traveling to the 2019 Conference on Neural Information Processing Systems (NeurIPS). MineRL is part of the NeurIPS competition track, and the CMU team will host a workshop showcasing methods from the competition at the conference. As William put it, \u201cThe concentration of computational power and resources to those currently within the field and already with the means to research reinforcement learning, in some sense, impacts those underrepresented communities the most.\u201d With MineRL, the CMU team hopes to lower the barriers to entry by making reinforcement learning more sample efficient.<\/p>\n

Get in on the competition<\/h3>\n

The first round of the competition is open on the AIcrowd platform, and submissions are being accepted until October 25, 2019. The CMU team\u2019s MineRL Python package, which includes a Malmo extension and tools for downloading the MineRL-v0 dataset, has already been downloaded more than 10,000 times, and more than 700 teams have signed up for the competition, the most sign-ups for a NeurIPS competition. \u201cSeeing the work that we\u2019ve put into this competition having a tangible effect on the research community has been the most fulfilling aspect of organizing,\u201d William told us.<\/p>\n

If you want to learn more about MineRL-v0, which is more than 60 million samples strong, check out the paper \u201cMineRL: A Large-Scale Dataset of Minecraft Demonstrations.\u201d The CMU team will be presenting the paper at the 2019 International Joint Conference on Artificial Intelligence, Aug. 10\u201316 in Macao, China. To contribute to the dataset, visit the MineRL server that has been set up for data collection.<\/p>\n

\u201cOur collaboration with the team led by CMU has been fantastic,\u201d said Katja Hofmann, Research Lead of Project Malmo and Principal Research Manager at Microsoft Research Cambridge. \u201cI am very happy to see such an exciting competition being organized on Project Malmo, which we have developed and made open source to the research community. This competition is a great example of how the platform enables a very wide range of research.\u201d<\/p>\n

Return of the Jedi<\/em> is not just the third story of the original trilogy; it opened up the prequel trilogy and the sequel trilogy. We\u2019re looking forward to seeing another story of ambitious students who take advantage of the Malmo platform to pursue their research agenda.<\/p>\n

The MineRL competition organizing team<\/h3>\n

William H. Guss, Carnegie Mellon University
\nMario Ynocente Castro, Preferred Networks
\nCayden Codel, Carnegie Mellon University
\nKatja Hofmann, Microsoft Research
\nBrandon Houghton, Carnegie Mellon University
\nNoboru Kuno, Microsoft Research
\nCrissman Loomis, Preferred Networks
\nKeisuke Nakata, Preferred Networks
\nStephanie Milani, University of Maryland, Baltimore County and Carnegie Mellon University
\nSharada Mohanty, AIcrowd
\nDiego Perez Liebana, Queen Mary University of London
\nRuslan Salakhutdinov, Carnegie Mellon University
\nShinya Shiroshita, Preferred Networks
\nNicholay Topin, Carnegie Mellon University
\nAvinash Ummadisingu, Preferred Networks
\nManuela Veloso, Carnegie Mellon University
\nPhillip Wang, Carnegie Mellon University<\/p>\n","protected":false},"excerpt":{"rendered":"

When I was asked about my favorite movie in a game with friends after my wedding ceremony, I replied Star Wars. That was about two decades ago, and, yes, it\u2019s still the case. I especially like Return of the Jedi. The third installment in the original trilogy is almost perfect to me. Luke Skywalker returns […]<\/p>\n","protected":false},"author":38022,"featured_media":602544,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[194467,194455],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-602541","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artifical-intelligence","category-machine-learning","msr-research-area-artificial-intelligence","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[346139],"related-groups":[],"related-projects":[235753],"related-events":[238931],"related-researchers":[{"type":"user_nicename","value":"Noboru Sean Kuno","user_id":33122,"display_name":"Noboru Sean Kuno","author_link":"Noboru Sean Kuno<\/a>","is_active":false,"last_first":"Kuno, Noboru Sean","people_section":0,"alias":"nkuno"}],"msr_type":"Post","featured_image_thumbnail":"\"\"","byline":"Noboru Sean Kuno<\/a>","formattedDate":"August 9, 2019","formattedExcerpt":"When I was asked about my favorite movie in a game with friends after my wedding ceremony, I replied Star Wars. That was about two decades ago, and, yes, it\u2019s still the case. 
I especially like Return of the Jedi. The third installment in the…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/602541"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38022"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=602541"}],"version-history":[{"count":6,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/602541\/revisions"}],"predecessor-version":[{"id":608469,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/602541\/revisions\/608469"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/602544"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=602541"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=602541"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=602541"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=602541"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=602541"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=602541"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/resea
rch\/wp-json\/wp\/v2\/msr-locale?post=602541"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=602541"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=602541"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=602541"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=602541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}