{"id":568491,"date":"2019-05-03T10:02:09","date_gmt":"2019-05-03T17:02:09","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=568491"},"modified":"2024-01-16T11:11:48","modified_gmt":"2024-01-16T19:11:48","slug":"real-world-reinforcement-learning","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/real-world-reinforcement-learning\/","title":{"rendered":"Real World Reinforcement Learning"},"content":{"rendered":"

\"ReinforcementReal World Reinforcement Learning (Real-World RL) projects enable the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems. The heart of the Real-World RL projects and applications is a platform striving to enable people and organizations to continuously learn and adapt.<\/p>\n","protected":false},"excerpt":{"rendered":"

The mission of the Real World Reinforcement Learning (Real-World RL) team is to develop learning methods, from foundations to real-world applications, to empower people and organizations to make better decisions. The research enables the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems.<\/p>\n","protected":false},"featured_media":584956,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-568491","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2019-05-03","related-publications":[574983,574956,574974,574965,568497,575001,543723,555672,489407,297716,425190,575010,575022,297656,297671,579922,575040,167900,166480,579928,164060,580345,579937,580375,580369,164061,887205,904659,904701,981693,991641,481188,607332,630861,672732,700012],"related-downloads":[595906],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[{"id":0,"name":"Incubations","content":"[row]\r\n\r\n[column class=\"m-col-4-24\"][\/column] [column class=\"m-col-20-24\"]\r\n\r\nGoal<\/strong>: To provide MSN with personalization for its news articles.\r\n\r\nApproach<\/strong>: The Real-World RL platform was deployed inside MSN\u2019s infrastructure to enable very rapid personalization across its worldwide deployments.\r\n\r\nResults<\/strong>: The RL-based personalization at MSN provides, on average, a 26% click-through rate (CTR) improvement.\r\n\r\n


\r\n\r\n[\/column]\r\n\r\n[\/row]\r\n[row]\r\n\r\n[column class=\"m-col-4-24\"]COMPLEX.com [\/column] [column class=\"m-col-20-24\"]\r\n\r\nGoal<\/strong>: To provide Complex.com, an external client, with a means to personalize various areas of its website.\r\n\r\nApproach<\/strong>: Complex.com used the web interface (REST API) of the Real-World RL platform to personalize its Top News articles, videos, and suggested articles.\r\n\r\nResults<\/strong>: The Real-World RL platform ran for over two years and provided Complex.com, on average, a 30% CTR improvement over its baseline (the editor\u2019s suggested ranking).\r\n\r\n
\r\n\r\n[\/column]\r\n\r\n[\/row]\r\n\r\n[row]\r\n\r\n[column class=\"m-col-4-24\"]Xbox [\/column] [column class=\"m-col-20-24\"]\r\n\r\nGoal<\/strong>: To provide the Microsoft marketing team with personalized \"Top of Home\" ad campaigns.\r\n\r\nApproach<\/strong>: Microsoft\u2019s internal marketing campaign manager (IRIS) used the Real-World RL platform to personalize two of the three Xbox \"Top of Home\" slots.\r\n\r\nResults<\/strong>: The pilot had two phases: 1) The Microsoft Research RL team ran a counterfactual evaluation to estimate user engagement based on real-world data collected for two weeks in June 2018. 2) The Real-World RL system was deployed in production for two weeks in November 2018, resulting in a 60% CTR improvement over a baseline random policy and increased user engagement metrics.\r\n\r\n
\r\n\r\n[\/column]\r\n\r\n[\/row][row]\r\n\r\n[column class=\"m-col-4-24\"]Microsoft [\/column] [column class=\"m-col-20-24\"]\r\n\r\nGoal<\/strong>: To provide the marketing team (MLEDCOP) with the ability to perform website layout personalization. The pilot specifically targeted the Surface.com<\/a> page layout for Japan.\r\n\r\nApproach<\/strong>: The Real-World RL platform was used to personalize different calls to action on three webpages of the Surface.com Japan website. The pilot was run as an A\/B test, where the control used the original layout as provided by Design, and the treatment used the Real-World RL platform to personalize the layout based on the users accessing the website.\r\n\r\nResults<\/strong>: The RL-based personalization provided an 80% CTR improvement over the control.\r\n\r\n
\r\n\r\n[\/column]\r\n\r\n[\/row][row]\r\n\r\n[column class=\"m-col-4-24\"]Skype [\/column] [column class=\"m-col-20-24\"]\r\n\r\nGoal<\/strong>: To provide Skype with a means to optimize the length of its jitter buffer on a per-call basis in order to provide the best possible call quality to its end users.\r\n\r\nApproach<\/strong>: The Skype team ran the Real-World RL platform on a subset of their call agents for a few weeks.\r\n\r\nResults<\/strong>: When comparing results on their \u201ctreatment\u201d traffic, the Skype team saw a 1.5% improvement in the Poor Call Quality Metric, a metric typically used as a proxy for how users felt about the quality of the call.\r\n\r\n
\r\n\r\n[\/column]\r\n\r\n[\/row]\r\n\r\n[row]\r\n\r\n[column class=\"m-col-4-24\"][\/column] [column class=\"m-col-20-24\"]\r\n\r\nGoal<\/strong>: To provide the AFD Frontier team with a means to optimize the TCP\/IP settings of their clusters to provide the best server configuration.\r\n\r\nApproach<\/strong>: The AFD Frontier team used the Real-World RL platform for a 3-month pilot as part of the 2017 AI School.\r\n\r\nResults<\/strong>: The Real-World RL system provided considerable lift over default behavior. The AI School project won the \u201cBest project award,\u201d and it is now used as the basis for an extended pilot between AFD and Microsoft Research.\r\n\r\n
\r\n\r\n[\/column]\r\n\r\n[\/row]"},{"id":1,"name":"Product Integration","content":"[row]\r\n\r\n[column class=\"m-col-4-24\"][\/column] [column class=\"m-col-20-24\"]\r\n\r\n
Custom Decision Service<\/a>, a Microsoft Research project, uses reinforcement learning for a cloud-based, contextual decision-making API that sharpens with experience in order to provide personalized content. The research pilot was successful and released as Azure Cognitive Services Personalizer Preview<\/a>, enabling enterprises and application developers to create rich, personalized experiences for every user."}],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Jacob Alber","user_id":36747,"people_section":"Research Team","alias":"jaalber"},{"type":"user_nicename","display_name":"Griffin Bassman","user_id":40288,"people_section":"Research Team","alias":"gbassman"},{"type":"user_nicename","display_name":"Peter Chang","user_id":38187,"people_section":"Research Team","alias":"petchang"},{"type":"user_nicename","display_name":"Rajan Chari","user_id":36765,"people_section":"Research Team","alias":"ranaras"},{"type":"user_nicename","display_name":"Hal Daum\u00e9 III","user_id":36768,"people_section":"Research Team","alias":"hal3"},{"type":"user_nicename","display_name":"Miro Dud\u00edk","user_id":32867,"people_section":"Research Team","alias":"mdudik"},{"type":"user_nicename","display_name":"Dylan Foster","user_id":40330,"people_section":"Research Team","alias":"dylanfoster"},{"type":"user_nicename","display_name":"Jack Gerrits","user_id":37628,"people_section":"Research Team","alias":"jagerrit"},{"type":"user_nicename","display_name":"Rafah Hosn","user_id":36783,"people_section":"Research Team","alias":"raaboulh"},{"type":"user_nicename","display_name":"Akshay Krishnamurthy","user_id":30913,"people_section":"Research Team","alias":"akshaykr"},{"type":"user_nicename","display_name":"John Langford","user_id":32204,"people_section":"Research Team","alias":"jcl"},{"type":"user_nicename","display_name":"Paul Mineiro","user_id":33272,"people_section":"Research Team","alias":"pmineiro"},{"type":"user_nicename","display_name":"Lekan 
Molu","user_id":40555,"people_section":"Research Team","alias":"lekanmolu"},{"type":"user_nicename","display_name":"Marco Rossi","user_id":36741,"people_section":"Research Team","alias":"marossi"},{"type":"user_nicename","display_name":"Eduardo Salinas","user_id":38371,"people_section":"Research Team","alias":"edus"},{"type":"user_nicename","display_name":"Siddhartha Sen","user_id":33656,"people_section":"Research Team","alias":"sidsen"},{"type":"user_nicename","display_name":"Alex Slivkins","user_id":33685,"people_section":"Research Team","alias":"slivkins"},{"type":"user_nicename","display_name":"Adith Swaminathan","user_id":36392,"people_section":"Research Team","alias":"adswamin"},{"type":"user_nicename","display_name":"Cheng Tan","user_id":37953,"people_section":"Research Team","alias":"chetan"},{"type":"user_nicename","display_name":"Alexey Taymanov","user_id":37616,"people_section":"Research Team","alias":"ataymano"},{"type":"user_nicename","display_name":"Olga Vrousgou","user_id":37998,"people_section":"Research 
Team","alias":"olvrousg"}],"msr_research_lab":[199571,992148],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/568491"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":14,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/568491\/revisions"}],"predecessor-version":[{"id":999291,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/568491\/revisions\/999291"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/584956"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=568491"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=568491"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=568491"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=568491"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=568491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}