Hybrid Reward Architecture

Using Hybrid Reward Architecture (HRA) to solve Ms. Pac-Man

Games are popular as a test-bed for new machine learning techniques because they can be very challenging and allow for easy analysis of new learning techniques in a controlled environment. For reinforcement learning (RL), where the goal is to learn good behavior in a data-driven way, the Arcade Learning Environment (ALE), which provides access to a large number of Atari 2600 games, has been a popular test-bed.

Animation showing Ms Pac-Man screen and HRA code

In 2015, Mnih et al. achieved a breakthrough in RL research: by combining standard RL techniques with deep neural networks, they outperformed humans on a large number of ALE games. Since then, many new methods have been developed based on the same principles, improving performance even further. Nonetheless, for some of the ALE games, DQN and its successors are unsuccessful, achieving only a fraction of the score that a human gets. One of these hard games is the classical game Ms. Pac-Man.

In our blog post we look deeper into the reason of why Ms. Pac-Man is hard and propose a new technique, called Hybrid Reward Architecture, to deal with the underlying challenge of Ms. Pac-Man.