{"id":937629,"date":"2023-05-04T09:00:00","date_gmt":"2023-05-04T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=937629"},"modified":"2023-05-04T08:17:52","modified_gmt":"2023-05-04T15:17:52","slug":"inferring-rewards-through-interaction","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/inferring-rewards-through-interaction\/","title":{"rendered":"Inferring rewards through interaction"},"content":{"rendered":"\n

This research was accepted by the 2023 International Conference on Learning Representations (ICLR), which is dedicated to the advancement of the branch of artificial intelligence generally referred to as deep learning.

\"A<\/figure>\n\n\n\n

Reinforcement learning (RL) hinges on the power of rewards, driving agents (the models doing the learning) to explore and learn valuable actions. The feedback received through rewards shapes their behavior, culminating in effective policies. Yet crafting reward functions is a complex, laborious task, even for experts. A more appealing option, particularly for the people ultimately using systems that learn from feedback over time, is an agent that can automatically infer a reward function. The interaction-grounded learning (IGL) paradigm from Microsoft Research enables agents to infer rewards through the very process of interaction, using diverse feedback signals rather than explicit numeric rewards. Although the agent never observes a reward directly, the feedback it receives is assumed to depend on a binary latent reward; by grounding that feedback, the agent learns a policy that maximizes this unseen latent reward.
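To make the setup concrete, here is a minimal sketch of an IGL-style interaction loop in Python. The contexts, actions, feedback strings, and function names are all hypothetical and chosen for illustration, not taken from the paper; the key point is that the learner only ever observes the context, its action, and a feedback signal, never the latent reward itself.

```python
# A minimal sketch of the IGL interaction loop, under illustrative assumptions:
# contexts and actions are small integer sets, and the latent reward is a hidden
# binary function of context and action. All names here are hypothetical.

import random

NUM_ACTIONS = 3

def latent_reward(context, action):
    # Hidden from the learner: reward is 1 only for the "right" action per context.
    return 1 if action == context % NUM_ACTIONS else 0

def feedback_signal(context, action):
    # The learner never sees the latent reward, only a feedback signal that
    # depends on it (e.g., a click, a dwell-time bucket, an emoji reaction).
    return "positive-looking" if latent_reward(context, action) == 1 else "ambiguous"

def uniform_policy(context):
    # Uniform exploration, as is common in IGL-style analyses.
    return random.randrange(NUM_ACTIONS)

def run_round(policy, context):
    action = policy(context)                      # choose an action from the context
    feedback = feedback_signal(context, action)   # observe feedback, not reward
    return context, action, feedback              # the learner's entire observation

log = [run_round(uniform_policy, random.randrange(10)) for _ in range(5)]
print(log)  # tuples of (context, action, feedback) -- no reward anywhere
```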

In our paper "Personalized Reward Learning with Interaction-Grounded Learning," which we're presenting at the 2023 International Conference on Learning Representations (ICLR), we propose a novel approach to the IGL problem: IGL-P. IGL-P is the first IGL strategy for context-dependent feedback, the first use of inverse kinematics as an IGL objective, and the first IGL strategy for more than two latent states. This approach provides a scalable alternative to current personalized agent learning methods, which can require expensive high-dimensional parameter tuning, handcrafted rewards, and/or extensive and costly user studies.
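As a rough illustration of the inverse-kinematics idea, the sketch below trains a classifier to predict which action was taken from the context and the observed feedback. This is not the paper's implementation: the simulated data, the logistic-regression model, and the comparison of the action posterior against the uniform-exploration prior are all assumptions made for the example.

```python
# A hedged sketch of inverse kinematics as a learning objective: predict the
# action from (context, feedback). Everything here is simulated for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
NUM_ACTIONS = 3

# Simulated interaction log: context feature, action taken uniformly at random,
# and a scalar feedback feature correlated with the unobserved latent reward.
contexts = rng.integers(0, 10, size=500)
actions = rng.integers(0, NUM_ACTIONS, size=500)
latent_reward = (actions == contexts % NUM_ACTIONS).astype(int)
feedback = latent_reward + rng.normal(0, 0.1, size=500)  # noisy stand-in signal

# Inverse kinematics: learn to predict the action from (context, feedback).
X = np.column_stack([contexts, feedback])
ik_model = LogisticRegression(max_iter=1000).fit(X, actions)

# Under uniform exploration, an action posterior well above the 1/K prior for a
# given (context, feedback) pair suggests the feedback carries information about
# that action -- the kind of signal IGL can exploit to decode latent rewards.
posterior = ik_model.predict_proba(np.array([[4.0, 1.0]]))
print(posterior, "uniform prior =", 1 / NUM_ACTIONS)
```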
