{"id":602214,"date":"2019-08-16T09:16:11","date_gmt":"2019-08-16T16:16:11","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&p=602214"},"modified":"2020-03-26T08:51:16","modified_gmt":"2020-03-26T15:51:16","slug":"reinforcement-learning-day-2019","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day-2019\/","title":{"rendered":"Reinforcement Learning Day 2019"},"content":{"rendered":"

Venue:<\/strong>\u00a011 Times Square
\nRoom 6501
\nNew York, NY 10036<\/p>\n

Contact:<\/strong> For event questions, please contact msrevent@microsoft.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

Reinforcement Learning Day 2019 will share the latest research on learning to make decisions based on feedback. This workshop features talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. A key objective is to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.<\/p>\n","protected":false},"featured_media":602238,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"msr_startdate":"2019-10-03","msr_enddate":"2019-10-03","msr_location":"New York, NY","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":false,"msr_private_event":false,"footnotes":""},"research-area":[13556],"msr-region":[197900],"msr-event-type":[197944],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-602214","msr-event","type-msr-event","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-region-north-america","msr-event-type-hosted-by-microsoft","msr-locale-en_us"],"msr_about":"Venue:<\/strong>\u00a011 Times Square\r\nRoom 6501\r\nNew York, NY 10036\r\n\r\nContact:<\/strong> For event questions, please contact msrevent@microsoft.com<\/a>","tab-content":[{"id":0,"name":"About","content":"We shared the latest research on learning to make decisions based on feedback at Reinforcement Learning Day 2019.\r\n\r\nReinforcement learning is the study of decision making with consequences over time. The topic draws together multi-disciplinary efforts from computer science, cognitive science, mathematics, economics, control theory, and neuroscience. The common thread through all of these studies is a single question: how do natural and artificial systems learn to make decisions in complex environments based on external, and possibly delayed, feedback?\r\n\r\nThis workshop featured talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. A key objective is to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.\r\n\r\n[msr-button text=\"See last year's event\" url=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/reinforcement-learning-day\/\" ]\r\n

<\/div>\r\n

Committee Chairs<\/h3>\r\nHal Daum\u00e9 III<\/a>, Microsoft Research\r\nAkshay Krishnamurthy<\/a>, Microsoft Research\r\n

Speakers<\/h3>\r\nAsli Celikyilmaz<\/a>, MSR Redmond\r\nChristopher Amato<\/a>, Northeastern University\r\nFinale Doshi-Velez<\/a>, Harvard University\r\nGeoff Gordon<\/a>, MSR Montr\u00e9al\r\nMengdi Wang<\/a>, Princeton University\r\nPhilip Thomas<\/a>, University of Massachusetts, Amherst\r\nSam Devlin<\/a>, MSR Cambridge\r\nSheila McIlraith<\/a>, University of Toronto\r\n
<\/div>\r\n

Microsoft\u2019s Event Code of Conduct<\/h3>\r\nMicrosoft\u2019s mission is to empower every person and every organization on the planet to achieve more. This includes events Microsoft hosts and participates in, where we seek to create a respectful, friendly, and inclusive experience for all participants. As such, we do not tolerate harassing or disrespectful behavior, messages, images, or interactions by any event participant, in any form, at any aspect of the program including business and social activities, regardless of location. We do not tolerate any behavior that is degrading to any gender, race, sexual orientation or disability, or any behavior that would violate\u00a0Microsoft\u2019s Anti-Harassment and Anti-Discrimination Policy, Equal Employment Opportunity Policy, or Standards of Business Conduct<\/a>. In short, the entire experience at the venue must meet our culture standards. We encourage everyone to assist in creating a welcoming and safe environment. Please report any concerns, harassing behavior, or suspicious or disruptive activity to venue staff, the event host or owner, or event staff. Microsoft reserves the right to refuse admittance to or remove any person from company-sponsored events at any time in its sole discretion.\r\n\r\n[msr-button text=\"Report a concern\" url=\"https:\/\/app.convercent.com\/en-us\/Anonymous\/IssueIntake\/LandingPage\/65d3b907-0933-e611-8105-000d3ab03673\" new-window=\"true\" ]"},{"id":1,"name":"Agenda","content":"All registered guests must check in at the Microsoft Welcome Center (entrance on 8th Avenue between 41st and 42nd Streets) on the day of the event with a government-issued photo ID to receive an elevator pass. Please plan on arriving 15 minutes early to ensure you have enough time to acquire the requisite pass\/badge.\r\n
<\/div>\r\n

Thursday, October 3, 2019<\/h2>\r\n
Time (EDT)<\/strong><\/td>\r\nSession<\/strong><\/td>\r\n<\/td>\r\nSpeaker<\/strong><\/td>\r\n<\/tr>\r\n
8:00 AM<\/td>\r\nBreakfast<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n
9:00 AM<\/td>\r\nWelcome<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n
9:10 AM<\/td>\r\nReward Machines: Structuring reward function specifications and reducing sample complexity in reinforcement learning | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nSheila McIlraith<\/a>, University of Toronto<\/td>\r\n<\/tr>\r\n
9:50 AM<\/td>\r\nGeneralization in Reinforcement Learning with Selective Noise Injection | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nSam Devlin<\/a>, MSR Cambridge<\/td>\r\n<\/tr>\r\n
10:15 AM<\/td>\r\nBreak<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n
10:45 AM<\/td>\r\nScalable and Robust Multi-Agent Reinforcement Learning | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nChristopher Amato<\/a>, Northeastern University<\/td>\r\n<\/tr>\r\n
11:25 AM<\/td>\r\nReinforcement Learning From Small Data In Feature Space | video<\/a><\/td>\r\n<\/td>\r\nMengdi Wang<\/a>, Princeton University<\/td>\r\n<\/tr>\r\n
12:05 PM<\/td>\r\nLunch<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n
2:05 PM<\/td>\r\nSafe and Fair Reinforcement Learning | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nPhilip Thomas<\/a>, University of Massachusetts, Amherst<\/td>\r\n<\/tr>\r\n
2:45 PM<\/td>\r\nGrounding Natural Language for Embodied Agents | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nAsli Celikyilmaz<\/a>, MSR Redmond<\/td>\r\n<\/tr>\r\n
3:10 PM<\/td>\r\nBreak<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n
3:40 PM<\/td>\r\nLearning for policy improvement | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nGeoff Gordon<\/a>, MSR Montr\u00e9al<\/td>\r\n<\/tr>\r\n
4:05 PM<\/td>\r\nTowards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare | slides<\/a> | video<\/a><\/td>\r\n<\/td>\r\nFinale Doshi-Velez<\/a>, Harvard University<\/td>\r\n<\/tr>\r\n
4:45 PM<\/td>\r\nConcluding remarks<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n
5:00 PM<\/td>\r\nEnd of day<\/td>\r\n<\/td>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>"},{"id":2,"name":"Abstracts","content":"[accordion] [panel header=\"Structuring reward function specifications and reducing sample complexity in reinforcement learning\"] Speaker:<\/strong> Sheila McIlraith<\/a> Humans have evolved languages over thousands of years to provide useful abstractions for understanding and interacting with each other and with the physical world. Such languages include natural languages, mathematical languages and calculi, and most recently formal languages that enable us to interact with machines via human-interpretable abstractions. In this talk, I present the notion of a Reward Machine, an automata-based structure that provides a normal form representation for reward functions. Reward Machines can be used natively to specify complex, non-Markovian reward-worthy behavior. Furthermore, a variety of compelling human-friendly (formal) languages can be used as reward specification languages and straightforwardly translated into Reward Machines, including variants of Linear Temporal Logic (LTL) and a variety of regular languages. Reward Machines can also be learned and can be used as memory for interaction in partially-observable environments. By exposing reward function structure, Reward Machines enable reward-function-tailored reinforcement learning, including tailored reward shaping and Q-learning. Experiments show that such reward-function-tailored algorithms significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise can't reasonably be solved and critically reducing the sample complexity. [SLIDES]<\/a> [\/panel] [panel header=\"Generalization in Reinforcement Learning with Selective Noise Injection\"] Speaker:<\/strong> Sam Devlin<\/a> Reinforcement learning is the only form of machine learning that has been commonly allowed to train on its test set. Deep reinforcement learning in particular has been shown to overfit the environments it trains on. In this talk I will discuss results from two of our recent papers: (1) showing the application of domain randomization to navigation in unseen 3D mazes (published at the IEEE Conference on Games 2019); and (2) proposing selective noise injection via a variational information bottleneck to improve generalization to unseen test levels of the 2D platformer CoinRun (to be published at the Thirty-third Conference on Neural Information Processing Systems, NeurIPS 2019). [SLIDES]<\/a> [\/panel] [panel header=\"Scalable and Robust Multi-Agent Reinforcement Learning\"] Speaker:<\/strong> Christopher Amato<\/a> This talk will cover our recent multi-agent reinforcement learning methods for coordinating teams of agents with limited or no communication. The methods will include deep multi-agent reinforcement learning approaches and hierarchical methods that learn asynchronous policies, realistically allowing learning and\/or execution to take place at different times for different agents. The approaches are scalable to large spaces and horizons and robust to non-stationarity caused by other agents learning. Results from benchmark and multi-robot domains will be shown. [SLIDES]<\/a> [\/panel] [panel header=\"Reinforcement Learning From Small Data In Feature Space\"] Speaker:<\/strong> Mengdi Wang<\/a> Recent years have witnessed increasing empirical successes in reinforcement learning (RL). However, many theoretical questions about RL are not yet well understood. 
For example, how many observations are necessary and sufficient for learning a good policy? How can one learn to control using structural information, with provable regret? In this talk, we discuss the statistical efficiency of RL, with and without structural information such as a linear feature representation, and show how to algorithmically learn the optimal policy with nearly minimax-optimal complexity. The complexity of RL algorithms depends largely on the dimension of the state features. Toward reducing this dimension, we discuss a state embedding learning method that automatically learns state features and aggregation structures from trajectory data. [\/panel] [panel header=\"Safe and Fair Reinforcement Learning\"] Speaker:<\/strong> Philip Thomas<\/a> In this talk I will discuss some of our upcoming work on a new framework for designing machine learning algorithms that both 1) makes it easier for the user of the algorithm to define what they consider to be undesirable behavior (e.g., what they consider to be unfair, unsafe, or costly) and 2) provides a high-confidence guarantee that it will not produce a solution that exhibits the user-defined undesirable behavior. [SLIDES]<\/a> [\/panel] [panel header=\"Grounding Natural Language for Embodied Agents\"] Speaker:<\/strong> Asli Celikyilmaz<\/a> The last two years have seen the introduction of several new tasks at the intersection of language and vision. The most popular of these is the Vision-Language Navigation (VLN) task, introduced in 2018. The task places an agent randomly inside a home and instructs it to navigate to a target destination based on a natural language command. Success in this domain requires building multimodal language groundings that allow the agent to successfully navigate while reasoning about vision-language dynamics. Within MSR, we have significantly pushed the state of the art in this space with a combination of approaches that utilize search, imitation learning, and pretraining. The fundamental underlying assumption of tasks like VLN is that we will build agents that execute our commands. We train these agents by providing examples of observation-action tuples, turning the interaction into a unidirectional language. We train our agents to execute our commands but do not necessarily teach them how to react when uncertainties arise in the environment. In this talk I will present our recent work on reinforcement learning, imitation learning, and pretraining methods for the VLN task, and present our new thinking on a more general problem: understanding how a system asks for and receives assistance, with the goal of exploring techniques for transfer and generalization in the vision-language navigation research field. [SLIDES]<\/a> [\/panel] [panel header=\"Learning for policy improvement\"] Speaker:<\/strong> Geoff Gordon<\/a> Reinforcement learning has had many successes in domains where experience is inexpensive, such as video games or board games. RL algorithms for such domains are often based on gradient descent: they make many noisy updates with a small learning rate. We instead examine algorithms that spend more computation per update, in an attempt to reduce noise and make larger updates; such algorithms are appropriate when experience is more expensive than compute time. In particular we look at several methods based on approximate policy iteration. 
[SLIDES]<\/a>[\/panel] [panel header=\"Towards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare\"] Speaker:<\/strong> Finale Doshi-Velez<\/a> In many health settings, we have available large amounts of longitudinal, partial views of a patient (e.g., what has been coded in health records, or recorded from various monitors). How can this information be used to improve patient care? In this talk, I'll present work that our lab has done in batch reinforcement learning, an area of reinforcement learning that assumes that the agent may access data but not explore actions. I will discuss not only algorithms for optimization and off-policy evaluation, but also ways in which we are working to make it easier for clinical experts to specify problems, as well as to process the outputs for validity. In this way, I will also touch on problems that we can expect batch reinforcement learning to solve, and problems that it cannot. This work is in collaboration with Srivatsan Srinivasan, Isaac Lage, Dafna Lifshcitz, Ofra Amir, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Xuefeng Peng, David Wihl, Yi Ding, Omer Gottesman, Liwei Lehman, Matthieu Komorowski, Aldo Faisal, David Sontag, Fredrik Johansson, Leo Celi, Aniruddh Raghu, Yao Liu, Emma Brunskill, and the CS282 2017 Course. [SLIDES]<\/a> [\/panel] [\/accordion]"}],"msr_startdate":"2019-10-03","msr_enddate":"2019-10-03","msr_event_time":"","msr_location":"New York, NY","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"October 3, 2019","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":"\"Reinforcement","event_excerpt":"Reinforcement Learning Day 2019 will share the latest research on learning to make decisions based on feedback. This workshop features talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. 
A key objective is to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.","msr_research_lab":[199571],"related-researchers":[{"type":"user_nicename","display_name":"Hal Daum\u00e9 III","user_id":36768,"people_section":"Section name 1","alias":"hal3"},{"type":"user_nicename","display_name":"Akshay Krishnamurthy","user_id":30913,"people_section":"Section name 1","alias":"akshaykr"}],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[395930],"related-projects":[235753],"related-opportunities":[],"related-publications":[],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/602214"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/602214\/revisions"}],"predecessor-version":[{"id":607437,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/602214\/revisions\/607437"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/602238"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=602214"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=602214"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=602214"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=602214"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=602214"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=602214"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=602214"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=602214"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=602214"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}