{"id":395930,"date":"2017-07-07T13:14:57","date_gmt":"2017-07-07T20:14:57","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-group&p=395930"},"modified":"2022-12-21T10:30:45","modified_gmt":"2022-12-21T18:30:45","slug":"reinforcement-learning-group","status":"publish","type":"msr-group","link":"https:\/\/www.microsoft.com\/en-us\/research\/theme\/reinforcement-learning-group\/","title":{"rendered":"Reinforcement Learning"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\"MSR\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t<\/span>\n\t\t\t\t\t\t\t\t\tReturn to Microsoft Research Lab – New York City\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Reinforcement Learning<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n

Reinforcement learning is the study of decision making over time with consequences. The field has developed systems to make decisions in complex environments based on external, and possibly delayed, feedback.<\/p>\n\n\n\n

At Microsoft Research, we are building the reinforcement learning theory, algorithms, and systems for technology that learns from its own successes (and failures), explores the world \u201cjust enough\u201d to learn, and can infer which decisions led to those outcomes. Our primary goal is reinforcement learning in the real world: understanding how to build systems that work even when simulation is unavailable and samples are scarce.<\/p>\n\n\n\n

We are working to create the future of reinforcement learning across a broad range of applications, including dialogue systems, game playing, content placement, program synthesis, recommendations, web search, natural language processing, and systems optimization.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"

The Reinforcement Learning research group works on theoretical foundations, algorithms, and systems for autonomous decision making. Our main research areas include exploration-exploitation trade-offs, off-policy learning, and generalization for contextual bandits, Markov decision processes, and contextual decision processes.<\/p>\n","protected":false},"featured_media":627639,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_group_start":"","footnotes":""},"research-area":[13556],"msr-group-type":[243688],"msr-locale":[268875],"msr-impact-theme":[],"class_list":["post-395930","msr-group","type-msr-group","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-group-type-theme","msr-locale-en_us"],"msr_group_start":"","msr_detailed_description":"","msr_further_details":"","msr_hero_images":[],"msr_research_lab":[199571],"related-researchers":[{"type":"user_nicename","display_name":"Jacob Alber","user_id":36747,"people_section":"Research Team","alias":"jaalber"},{"type":"user_nicename","display_name":"Jordan Ash","user_id":39826,"people_section":"Research Team","alias":"joash"},{"type":"user_nicename","display_name":"Griffin Bassman","user_id":40288,"people_section":"Research Team","alias":"gbassman"},{"type":"user_nicename","display_name":"Rajan Chari","user_id":36765,"people_section":"Research Team","alias":"ranaras"},{"type":"user_nicename","display_name":"Ching-An Cheng","user_id":38991,"people_section":"Research Team","alias":"chinganc"},{"type":"user_nicename","display_name":"Sam Devlin","user_id":37550,"people_section":"Research Team","alias":"sadevlin"},{"type":"user_nicename","display_name":"Miro Dud\u00edk","user_id":32867,"people_section":"Research Team","alias":"mdudik"},{"type":"user_nicename","display_name":"Dylan Foster","user_id":40330,"people_section":"Research 
Team","alias":"dylanfoster"},{"type":"user_nicename","display_name":"Jianfeng Gao","user_id":32246,"people_section":"Research Team","alias":"jfgao"},{"type":"user_nicename","display_name":"Jack Gerrits","user_id":37628,"people_section":"Research Team","alias":"jagerrit"},{"type":"user_nicename","display_name":"Katja Hofmann","user_id":32468,"people_section":"Research Team","alias":"kahofman"},{"type":"user_nicename","display_name":"Rafah Hosn","user_id":36783,"people_section":"Research Team","alias":"raaboulh"},{"type":"user_nicename","display_name":"Andrey Kolobov","user_id":30910,"people_section":"Research Team","alias":"akolobov"},{"type":"user_nicename","display_name":"Akshay Krishnamurthy","user_id":30913,"people_section":"Research Team","alias":"akshaykr"},{"type":"guest","display_name":"Rodrigo Kumpera","user_id":633948,"people_section":"Research Team","alias":""},{"type":"user_nicename","display_name":"John Langford","user_id":32204,"people_section":"Research Team","alias":"jcl"},{"type":"user_nicename","display_name":"Ricky Loynd","user_id":33406,"people_section":"Research Team","alias":"riloynd"},{"type":"user_nicename","display_name":"Paul Mineiro","user_id":33272,"people_section":"Research Team","alias":"pmineiro"},{"type":"user_nicename","display_name":"Ida Momennejad","user_id":39832,"people_section":"Research Team","alias":"idamo"},{"type":"user_nicename","display_name":"Marco Rossi","user_id":36741,"people_section":"Research Team","alias":"marossi"},{"type":"user_nicename","display_name":"Robert Schapire","user_id":33549,"people_section":"Research Team","alias":"schapire"},{"type":"user_nicename","display_name":"Shital Shah","user_id":35435,"people_section":"Research Team","alias":"shitals"},{"type":"user_nicename","display_name":"Alex Slivkins","user_id":33685,"people_section":"Research Team","alias":"slivkins"},{"type":"user_nicename","display_name":"Cheng Tan","user_id":37953,"people_section":"Research 
Team","alias":"chetan"},{"type":"user_nicename","display_name":"Alexey Taymanov","user_id":37616,"people_section":"Research Team","alias":"ataymano"},{"type":"user_nicename","display_name":"Olga Vrousgou","user_id":37998,"people_section":"Research Team","alias":"olvrousg"},{"type":"user_nicename","display_name":"Cyril Zhang","user_id":39829,"people_section":"Research Team","alias":"cyrilzhang"}],"related-publications":[591757,580987,580996,572352,581008,580846,580858,574983,574956,574974,502184,552729,574965,568497,575001,543723,555672,482721,500372,565749,580960,580969,580978,487844,499955,489419,489461,493946,489407,489062,481188,487826,481131,437235,487835,438609,454764,453927,297716,438015,377222,401762,401753,425196,425190,425169,383549,383546,379010,376403,575010,487817,425202,425256,401741,401996,402005,575022,401711,401657,442749,401645,347972,401960,401939,297656,297665,326474,294722,294719,238331,237369,238203,246479,398366,297719,402272,305843,238204,238202,297671,579922,297692,237377,168795,168796,168899,489401,575040,167900,579928,164060,580345,579937,580375,580369,168793,621201,687570,606525,669948,688176,606735,687183,785407,168798,622881,687579,606531,669957,688182,606756,687189,832543,330596,624720,687585,606537,672558,688188,606762,687195,848887,330632,630102,687591,606546,672681,688203,606771,687201,887205,475113,630108,687597,606555,672696,692043,606777,687381,904659,503861,630114,687612,606696,672732,692349,164061,607332,687387,904701,503879,630174,687618,606702,673500,700012,166358,607362,687396,909801,543621,636378,687624,606708,687114,747847,166480,609729,687405,981240,576708,641709,687630,606714,687144,753058,168751,610218,687558,981693,600384,642180,687636,606720,687150,764023,168752,617298,687564,606519,663825,687648,606729,687165,764857],"related-downloads":[595906,770374],"related-videos":[],"related-projects":[],"related-events":[],"related-opportunities":[],"related-posts":[],"tab-content":[],"msr_impact_theme":[],"_links":{"self":[{"h
ref":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/395930"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-group"}],"version-history":[{"count":31,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/395930\/revisions"}],"predecessor-version":[{"id":909810,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/395930\/revisions\/909810"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/627639"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=395930"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=395930"},{"taxonomy":"msr-group-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group-type?post=395930"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=395930"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=395930"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}