{"id":793949,"date":"2021-11-16T08:00:51","date_gmt":"2021-11-16T16:00:51","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=793949"},"modified":"2021-11-09T09:47:11","modified_gmt":"2021-11-09T17:47:11","slug":"research-talk-reinforcement-learning-with-preference-feedback","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/research-talk-reinforcement-learning-with-preference-feedback\/","title":{"rendered":"Research talk: Reinforcement learning with preference feedback"},"content":{"rendered":"

Speaker: Aadirupa Saha, Postdoctoral Researcher, Microsoft Research NYC<\/p>\n

In Preference-based Reinforcement Learning (PbRL), an agent receives feedback only in terms of rank-ordered preferences over a set of selected actions, unlike the absolute reward feedback in traditional reinforcement learning. This is relevant in settings where it is difficult for the system designer to explicitly specify a reward function that achieves a desired behavior, but where it is possible to elicit coarser feedback, say from an expert, about which actions are preferred over others at given states. The success of the traditional reinforcement learning framework hinges crucially on the underlying agent-reward model, which in turn depends on how accurately a system designer can express an appropriate reward function, often a non-trivial task. The main novelty of the PbRL framework is the ability to learn from non-numeric, preference-based feedback, eliminating the need to handcraft numeric reward models. We will set up a formal framework for PbRL and discuss different real-world applications. We will also discuss a gap in the field: although PbRL was introduced almost a decade ago, most work on it has been primarily applied or experimental in nature, barring a handful of very recent ventures on the theory side. Finally, we will discuss the limitations of existing techniques and the scope for future developments.<\/p>\n
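
<p>As a simple illustration (an assumption of this write-up, not material from the talk), here is a minimal Python sketch of the feedback model: the learner observes only which of two actions is preferred, generated here by an assumed Bradley-Terry comparison over hidden utilities the learner never sees.<\/p>\n

<pre>
import numpy as np

rng = np.random.default_rng(0)

# Hidden utilities standing in for a reward function the designer
# could not easily hand-specify; the learner never observes them.
true_utility = np.array([0.1, 0.5, 0.9])

def preference_feedback(a, b):
    # Bradley-Terry comparison: returns True if action a is preferred
    # to action b on this query (only this binary outcome is revealed).
    p = 1.0 / (1.0 + np.exp(-(true_utility[a] - true_utility[b])))
    return rng.random() < p

# Dueling-bandit style loop: estimate pairwise win rates from
# preference feedback alone, with no numeric rewards.
k = len(true_utility)
wins = np.zeros((k, k))
plays = np.zeros((k, k))
for _ in range(2000):
    a, b = rng.choice(k, size=2, replace=False)
    if preference_feedback(a, b):
        wins[a, b] += 1
    else:
        wins[b, a] += 1
    plays[a, b] += 1
    plays[b, a] += 1

win_rate = wins / np.maximum(plays, 1)
# Borda-style winner: the action with the highest total win rate
# against the other actions.
best = int(np.argmax(win_rate.sum(axis=1)))
print('estimated best action:', best)
<\/pre>\n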

Learn more about the 2021 Microsoft Research Summit: https:\/\/Aka.ms\/researchsummit<\/p>\n","protected":false},"excerpt":{"rendered":"

Speaker: Aadirupa Saha, Postdoctoral Researcher, Microsoft Research NYC In Preference-based Reinforcement Learning (PbRL), an agent receives feedback only in terms of rank-ordered preferences over a set of selected actions, unlike the absolute reward feedback in traditional reinforcement learning. This is relevant in settings where it is difficult for the system designer to explicitly specify a […]<\/p>\n","protected":false},"featured_media":793952,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-video-type":[261263,261311],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-793949","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-video-type-research-summit-2021","msr-video-type-reinforcement-learning","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/MJzBUNtv0Ho","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/793949"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/793949\/revisions"}],"predecessor-version":[{"id":793955,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/793949\/revisions\/793955"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/793952"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=793949"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=793949"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=793949"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=793949"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=793949"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=793949"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=793949"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}