{"id":577638,"date":"2019-05-02T12:12:59","date_gmt":"2019-05-02T19:12:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=577638"},"modified":"2020-11-12T13:42:34","modified_gmt":"2020-11-12T21:42:34","slug":"hybrid-reward-architecture","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/hybrid-reward-architecture\/","title":{"rendered":"Hybrid Reward Architecture"},"content":{"rendered":"<h2>Using Hybrid Reward Architecture (HRA) to solve Ms. Pac-Man<\/h2>\n<p>Games are popular as a test-bed for new machine learning techniques because they can be very challenging and allow for easy analysis of new learning techniques in a controlled environment. For reinforcement learning (RL), where the goal is to learn good behavior in a data-driven way, the Arcade Learning Environment (ALE), which provides access to a large number of Atari 2600 games, has been a popular test-bed.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-577650\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/04\/HRA-levels-animation-300x171.gif\" alt=\"Animation showing Ms Pac-Man screen and HRA code\" width=\"480\" height=\"273\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/04\/HRA-levels-animation-300x171.gif 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/04\/HRA-levels-animation-768x437.gif 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2019\/04\/HRA-levels-animation-1024x582.gif 1024w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/p>\n<p>In 2015, Mnih et al. achieved a breakthrough in RL research: by combining standard RL techniques with deep neural networks, they outperformed humans on a large number of ALE games. Since then, many new methods have been developed based on the same principles, improving performance even further. 
Nonetheless, on some ALE games, the Deep Q-Network (DQN) of Mnih et al. and its successors still fall short, achieving only a fraction of a human player's score. One of these hard games is the classic Ms. Pac-Man.<\/p>\n<p>In our blog post we take a closer look at why Ms. Pac-Man is hard and propose a new technique, Hybrid Reward Architecture, to address the underlying challenge.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For reinforcement learning (RL), where the goal is to learn good behavior in a data-driven way, the Arcade Learning Environment (ALE), which provides access to a large number of Atari 2600 games, has been a popular test-bed.<\/p>\n","protected":false},"featured_media":582757,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-577638","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[401741],"related-downloads":[],"related-videos":[442317],"related-groups":[395930,663258],"related-events":[],"related-opportunities":[],"related-posts":[455721,447303,709156,720673],"related-articles":[],"tab-content":[],"related-researchers":[{"type":"user_nicename","display_name":"Mahmoud Adada","user_id":37176,"people_section":"Section name 
1","alias":"maadada"}],"msr_research_lab":[437514,1148609],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/577638","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":11,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/577638\/revisions"}],"predecessor-version":[{"id":654048,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/577638\/revisions\/654048"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/582757"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=577638"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=577638"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=577638"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=577638"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=577638"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}