Information Directed Reward Learning for Reinforcement Learning
- David Lindner
- Matteo Turchetta
- Sebastian Tschiatschek
- Kamil Ciosek
- Andreas Krause
For many reinforcement learning (RL) applications, specifying a reward is difficult. In this paper, we consider an RL setting where the agent can obtain information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible. For this purpose, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward function and selects queries that maximize the information gain about the difference in return between potentially optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model, it achieves similar or better performance with significantly fewer queries. We support our findings with extensive evaluations in multiple environments and with different types of queries.
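The query-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a simplified toy version under several assumptions that are not in the abstract: a tabular state space, a Gaussian belief over the reward vector, queries that return a noisy reward for a single state, and a fixed pair of candidate policies summarized by hypothetical state-visitation frequencies `f_a` and `f_b`. Under these assumptions the return difference is a linear function of the reward, so the information gain of each query has a closed form.

```python
import numpy as np


def info_gain_per_state(Sigma, v, noise_var):
    """Information gain about d = v @ r from a noisy observation of r[s], for every state s.

    Assumes r ~ N(mu, Sigma) and y_s = r[s] + Gaussian noise with variance noise_var.
    """
    var_d = v @ Sigma @ v                      # prior variance of the return difference
    cov_dy = Sigma @ v                         # Cov(d, y_s) for each state s
    var_y = np.diag(Sigma) + noise_var         # Var(y_s) for each state s
    var_d_post = var_d - cov_dy ** 2 / var_y   # variance of d after observing y_s
    return 0.5 * np.log(var_d / var_d_post)


def posterior_update(mu, Sigma, s, y, noise_var):
    """Standard Gaussian conditioning on the observation y of state s."""
    k = Sigma[:, s] / (Sigma[s, s] + noise_var)
    mu_new = mu + k * (y - mu[s])
    Sigma_new = Sigma - np.outer(k, Sigma[s, :])
    return mu_new, Sigma_new


# Hypothetical toy problem: 4 states, independent unit-variance prior on rewards.
mu = np.zeros(4)
Sigma = np.eye(4)
noise_var = 0.1

# Assumed state-visitation frequencies of two candidate policies.
f_a = np.array([0.5, 0.5, 0.0, 0.0])
f_b = np.array([0.5, 0.0, 0.5, 0.0])
v = f_a - f_b  # the return difference is d = v @ r

gains = info_gain_per_state(Sigma, v, noise_var)
best_state = int(np.argmax(gains))

# Pretend the expert reports a reward of 1.0 for the selected state.
mu_new, Sigma_new = posterior_update(mu, Sigma, best_state, 1.0, noise_var)
```

In this toy setup, state 0 is visited equally by both policies and state 3 is visited by neither, so querying them yields zero information about which policy is better, even though their rewards are just as uncertain as the others. That reflects the shift the abstract describes: away from reducing reward error everywhere and toward queries that help distinguish between plausible policies.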
Information Directed Reward Learning for Reinforcement Learning | JRC Workshop 2021
Artificial Intelligence (AI)
20 May 2021
Speaker: David Lindner, ETH Zurich (collaboration with Andreas Krause, ETH Zurich and Katja Hofmann, Microsoft)

This virtual event brought together the PhD students and postdocs working on collaborative research engagements with Microsoft via the Swiss Joint Research Center, Mixed Reality & AI Zurich Lab, Mixed Reality & AI Cambridge Lab, and Inria Joint Center, along with their academic and Microsoft supervisors and the wider research community. The event continued in the tradition of the annual Swiss JRC Workshops. PhD students and postdocs presented project updates and discussed their research with their supervisors and other attendees. In addition, Microsoft speakers provided updates on relevant Microsoft projects and initiatives. There were four event sessions according to research themes: Computer Vision, Systems, and…