Win or Learn Fast Proximal Policy Optimisation

IEEE Conference on Games

AI agents within video games are often required to compete in an environment shared by many other agents. This problem can be tackled by multi-agent reinforcement learning (MARL). One approach within MARL is to learn a Nash Equilibrium Strategy (NES), which guarantees a known minimum payoff when playing against other rational agents. We focus on one method for learning a NES, Win or Learn Fast (WoLF), which has been shown to converge towards a NES in a variety of matrix games and grid-based games. Research into deep MARL has focused on performance against opponent agents, with limited quantitative results on learning a NES. We present a systematic empirical investigation into the ability of Proximal Policy Optimisation (PPO) to learn a NES, showing instability in certain matrix games. We then present an extension, WoLF-PPO, that is able to learn a policy closer to the NES.
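As a minimal sketch of the WoLF principle mentioned above (Bowling and Veloso's variable learning rate: step slowly while winning, quickly while losing), the snippet below illustrates the step-size rule in a matching-pennies matrix game. The function name `wolf_step_size` and the specific δ values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Matching pennies payoff matrix for the row player; its NES is the
# uniform mixed strategy (0.5, 0.5).
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def wolf_step_size(policy, avg_policy, opp_policy,
                   delta_win=0.01, delta_lose=0.04):
    """Illustrative WoLF rule: compare the expected payoff of the
    current policy against that of the historical average policy,
    both evaluated against the opponent's current strategy.
    'Win or Learn Fast' uses the small step while winning and the
    large step while losing (delta_lose > delta_win)."""
    v_current = policy @ PAYOFF @ opp_policy
    v_average = avg_policy @ PAYOFF @ opp_policy
    return delta_win if v_current >= v_average else delta_lose
```

For example, against an opponent playing (0.7, 0.3), a row policy of (0.9, 0.1) outperforms the uniform average policy, so the agent is "winning" and learns slowly; the reversed policy (0.1, 0.9) is "losing" and receives the larger step size.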