October 3, 2019

Reinforcement Learning Day 2019

Location: New York, NY

  • Speaker: Sheila McIlraith
    Humans have evolved languages over thousands of years to provide useful abstractions for understanding and interacting with each other and with the physical world. Such languages include natural languages, mathematical languages and calculi, and most recently formal languages that enable us to interact with machines via human-interpretable abstractions. In this talk, I present the notion of a Reward Machine, an automata-based structure that provides a normal form representation for reward functions. Reward Machines can be used natively to specify complex, non-Markovian reward-worthy behavior. Furthermore, a variety of compelling human-friendly (formal) languages can be used as reward specification languages and straightforwardly translated into Reward Machines, including variants of Linear Temporal Logic (LTL) and a variety of regular languages. Reward Machines can also be learned and can be used as memory for interaction in partially observable environments. By exposing reward function structure, Reward Machines enable reward-function-tailored reinforcement learning, including tailored reward shaping and Q-learning. Experiments show that such reward-function-tailored algorithms significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise can't reasonably be solved and critically reducing the sample complexity. (A minimal Reward Machine sketch follows the speaker list below.) [SLIDES]

  • Speaker: Sam Devlin
    Reinforcement learning is the only form of machine learning that has been commonly allowed to train on its test set. Deep reinforcement learning in particular has been shown to overfit the environments it trains on. In this talk I will discuss results from two of our recent papers: (1) showing the application of domain randomization to navigation in unseen 3D mazes (published at the IEEE Conference on Games 2019); and (2) proposing selective noise injection via a variational information bottleneck to improve generalization to unseen test levels of the 2D platformer CoinRun (to be published at the Thirty-third Conference on Neural Information Processing Systems – NeurIPS 2019). (A domain randomization sketch follows the speaker list below.) [SLIDES]

  • Speaker: Christopher Amato
    This talk will cover our recent multi-agent reinforcement learning methods for coordinating teams of agents with limited or no communication. The methods will include deep multi-agent reinforcement learning approaches and hierarchical methods that learn asynchronous policies that realistically allow learning and/or execution to take place at different times for different agents. The approaches are scalable to large spaces and horizons and robust to non-stationarity caused by other agents learning. Results from benchmark and multi-robot domains will be shown. [SLIDES]

  • Speaker: Mengdi Wang
    Recent years have witnessed increasing empirical successes in reinforcement learning (RL). However, many theoretical questions about RL remain poorly understood. For example, how many observations are necessary and sufficient for learning a good policy? How can we learn to control using structural information, with provable regret? In this talk, we discuss the statistical efficiency of RL, with and without structural information such as a linear feature representation, and show how to algorithmically learn the optimal policy with nearly minimax-optimal complexity. The complexity of RL algorithms largely depends on the dimension of the state features. Toward reducing the dimension of RL, we discuss a state embedding learning method that automatically learns state features and aggregation structures from trajectory data.

  • Speaker: Philip Thomas
    In this talk I will discuss some of our upcoming work on a new framework for designing machine learning algorithms that both (1) makes it easier for the user of the algorithm to define what they consider to be undesirable behavior (e.g., what they consider to be unfair, unsafe, or costly) and (2) provides a high-confidence guarantee that it will not produce a solution that exhibits the user-defined undesirable behavior. (A sketch of one such high-confidence safety test follows the speaker list below.) [SLIDES]

  • Speaker: Asli Celikyilmaz
    The last two years have seen the introduction of several new tasks at the intersection of language and vision, the most popular of which is the Vision-Language Navigation (VLN) task introduced in 2018. The task places an agent at a random location inside a home and instructs it to navigate to a target destination based on a natural language command. Success in this domain requires building multimodal language groundings that allow the agent to navigate successfully while reasoning about vision-language dynamics. Within MSR, we have significantly pushed the state of the art in this space with a combination of approaches that utilize search, imitation learning, and pretraining. The fundamental underlying assumption of tasks like VLN is that we will build agents that execute our commands. We train these agents by providing examples of observation-action tuples, which makes the use of language unidirectional: the agents learn to execute our commands, but we do not necessarily teach them how to react when uncertainties arise in the environment. In this talk I will present our recent work on reinforcement learning, imitation learning, and pretraining methods for the VLN task, and present our new thinking on a more general problem: understanding how a system asks for and receives assistance, with the goal of exploring techniques that transfer and generalize across the vision-language navigation research field. [SLIDES]

  • Speaker: Geoff Gordon
    Reinforcement learning has had many successes in domains where experience is inexpensive, such as video games or board games. RL algorithms for such domains are often based on gradient descent: they make many noisy updates with a small learning rate. We instead examine algorithms that spend more computation per update, in an attempt to reduce noise and make larger updates; such algorithms are appropriate when experience is more expensive than compute time. In particular, we look at several methods based on approximate policy iteration. (A tabular policy iteration sketch follows the speaker list below.) [SLIDES]

  • Speaker: Finale Doshi-Velez
    In many health settings, we have available large amounts of longitudinal, partial views of a patient (e.g., what has been coded in health records, or recorded from various monitors). How can this information be used to improve patient care? In this talk, I'll present work that our lab has done in batch reinforcement learning, an area of reinforcement learning that assumes the agent may access data but not explore actions. I will discuss not only algorithms for optimization and off-policy evaluation, but also ways in which we are working to make it easier for clinical experts to specify problems and to process the outputs for validity. In this way, I will also touch on problems that we can expect batch reinforcement learning to solve, and problems that it cannot. (A basic off-policy evaluation sketch follows the speaker list below.) This work is in collaboration with Srivatsan Srinivasan, Isaac Lage, Dafna Lifshcitz, Ofra Amir, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Xuefeng Peng, David Wihl, Yi Ding, Omer Gottesman, Liwei Lehman, Matthieu Komorowski, Aldo Faisal, David Sontag, Fredrik Johansson, Leo Celi, Aniruddh Raghu, Yao Liu, Emma Brunskill, and the CS282 2017 Course. [SLIDES]
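
A minimal sketch of the Reward Machine idea from Sheila McIlraith's talk, assuming a representation as a finite-state machine whose transitions fire on high-level events and emit rewards; the "coffee delivery" events below are illustrative assumptions, not examples from the talk.

```python
class RewardMachine:
    """A finite-state machine over high-level events; each transition emits a reward."""

    def __init__(self, transitions, initial_state):
        # transitions: dict mapping (rm_state, event) -> (next_rm_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        # Unknown events self-loop with zero reward.
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward


# A non-Markovian task: reward is given only after getting coffee *and then* delivering it.
rm = RewardMachine(
    transitions={
        ("u0", "got_coffee"): ("u1", 0.0),
        ("u1", "delivered_coffee"): ("u_done", 1.0),
    },
    initial_state="u0",
)

for event in ["moved", "got_coffee", "moved", "delivered_coffee"]:
    print(event, rm.step(event))  # only the final transition yields reward 1.0
```

Because the machine's state summarizes the relevant history, an agent that conditions on (environment state, machine state) can treat an otherwise non-Markovian reward as Markovian, which is what enables the tailored reward shaping and Q-learning mentioned in the abstract.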
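
A sketch of domain randomization as referenced in Sam Devlin's talk, assuming environment parameters are resampled at the start of every training episode so the policy cannot overfit a single training instance; the specific parameters below (layout seed, texture, lighting) are illustrative assumptions, not the ones used in the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class MazeConfig:
    layout_seed: int    # controls the maze layout
    wall_texture: str   # visual appearance
    lighting: float     # brightness multiplier

def sample_maze_config() -> MazeConfig:
    """Draw a fresh environment configuration so no single instance is overfit."""
    return MazeConfig(
        layout_seed=random.randrange(10_000),
        wall_texture=random.choice(["brick", "wood", "stone"]),
        lighting=random.uniform(0.5, 1.5),
    )

# Training loop skeleton: the RL update itself is unchanged; only the distribution
# of training environments is broadened.
for episode in range(3):
    config = sample_maze_config()
    print(f"episode {episode}: train in {config}")
```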
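
One common way to obtain the kind of high-confidence guarantee Philip Thomas describes is to hold out safety data and apply a confidence bound to the user-defined measure of undesirable behavior; the candidate/safety split and the Hoeffding-style bound below are assumptions of this sketch, not necessarily the construction used in the talk.

```python
import math

def hoeffding_upper_bound(samples, delta):
    """One-sided (1 - delta) upper confidence bound on the mean of values in [0, 1]."""
    n = len(samples)
    return sum(samples) / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def train_with_guarantee(data, fit, undesirable, threshold, delta=0.05):
    """Return a solution only if it passes a high-confidence safety test; otherwise None."""
    train, safety = data[: len(data) // 2], data[len(data) // 2 :]
    candidate = fit(train)                                  # candidate selection
    scores = [undesirable(candidate, x) for x in safety]    # per-sample badness in [0, 1]
    if hoeffding_upper_bound(scores, delta) <= threshold:   # safety test
        return candidate
    return None  # refuse to return a solution rather than risk undesirable behavior
```

Returning None rather than a best-effort answer is what makes the bound a guarantee in this sketch: with probability at least 1 - delta, any returned solution keeps the user-defined undesirable-behavior measure below the threshold.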
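
A tabular sketch of the policy iteration pattern behind the approximate policy iteration methods in Geoff Gordon's talk: each update spends real computation on (approximately) evaluating the current policy before making a large, greedy improvement. The tiny random MDP below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over s'
R = rng.random((n_states, n_actions))                             # expected reward for (s, a)

policy = np.zeros(n_states, dtype=int)
for _ in range(20):
    # Approximate policy evaluation: a fixed number of Bellman backups for V^pi.
    V = np.zeros(n_states)
    for _ in range(50):
        V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V for s in range(n_states)])
    # Greedy policy improvement: one large, low-noise update per evaluation.
    Q = R + gamma * np.einsum("sap,p->sa", P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("greedy policy:", policy)
```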
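
A sketch of the simplest off-policy evaluation estimator relevant to the batch setting in Finale Doshi-Velez's talk: per-trajectory importance sampling, which reweights logged returns by how likely the evaluation policy was to take the logged actions. The trajectory format and toy policies are assumptions of this sketch.

```python
import numpy as np

def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """trajectories: list of [(state, action, reward), ...] collected under pi_b;
    pi_e(a, s) / pi_b(a, s): action probabilities under the evaluation / behavior policy."""
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)   # cumulative importance weight
            ret += (gamma ** t) * r             # discounted return of the logged trajectory
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy usage: behavior policy is uniform over two actions; evaluation policy prefers action 1.
logged = [[("s0", 1, 1.0), ("s1", 0, 0.0)], [("s0", 0, 0.0), ("s1", 1, 1.0)]]
pi_b = lambda a, s: 0.5
pi_e = lambda a, s: 0.8 if a == 1 else 0.2
print(importance_sampling_estimate(logged, pi_e, pi_b))
```

Importance sampling is unbiased but can have very high variance; the off-policy evaluation work mentioned in the abstract goes well beyond this basic estimator.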