Projects
MSR contributions in the space of theoretical foundation for Offline RL
Overall, MSR has made some recent advances in the statistical foundations of Offline RL, where a central question is to understand what…
On this page, we describe the algorithmic landscape of Offline RL and enumerate some of MSR's algorithmic development efforts in this space. In a tutorial lecture on Offline RL, we analyze its…
We intend to advance the theoretical understanding of actor-critic algorithms through the lens of Policy Iteration. Policy Iteration consists of a loop over two processing steps: policy evaluation and policy improvement. Policy Iteration has strong convergence properties when the policy…
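As a minimal illustration of the two-step loop described above, the sketch below runs textbook tabular Policy Iteration on a small finite MDP. It is a generic version, not MSR's actor-critic analysis; the MDP encoding (`P`, `R`), the discount `gamma`, and the two-state example are hypothetical choices made for illustration.

```python
# Hypothetical tabular MDP encoding (an assumption for this sketch):
#   P[s][a] is a list of (probability, next_state) pairs,
#   R[s][a] is the expected immediate reward for action a in state s.

def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-10):
    """Policy evaluation: repeat the Bellman expectation backup for a fixed policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            a = policy[s]
            v_new = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve_policy(P, R, V, policy, gamma=0.9, eps=1e-9):
    """Policy improvement: act greedily with respect to the current value estimate."""
    new_policy = []
    for s in range(len(P)):
        q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
        best = max(range(len(q)), key=q.__getitem__)
        # Keep the current action unless another one is strictly better,
        # so the loop cannot cycle between equally good actions.
        new_policy.append(best if q[best] > q[policy[s]] + eps else policy[s])
    return new_policy


def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        V = evaluate_policy(P, R, policy, gamma)
        new_policy = improve_policy(P, R, V, policy, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Hypothetical two-state example: in state 0, action 1 moves to state 1 (reward 1);
# in state 1, action 0 stays put and earns reward 2 per step.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

policy, V = policy_iteration(P, R, gamma=0.9)
print(policy, [round(v, 6) for v in V])
```

Evaluation here uses in-place iterative sweeps rather than an exact linear solve, which keeps the sketch dependency-free at the cost of a small tolerance-controlled error in `V`.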
Galena uses imitation learning to provide predictions of controller inputs to compensate for poor network conditions. If client-to-server lag occurs, predictions are used instead of user input.
We take a human-centered approach to transparency, empirically studying how to provide stakeholders of ML systems with the right information to achieve their goals.
This page introduces the research area of Offline Reinforcement Learning (also sometimes called Batch Reinforcement Learning). It consists of training a target policy from a fixed dataset of trajectories collected with a behavioral policy. In comparison to classic Reinforcement Learning…
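To make the setting above concrete, the sketch below learns a policy purely from a fixed batch of transitions, with no further environment interaction. It is a naive tabular batch Q-iteration baseline, not a method from this page; the `(state, action, reward, next_state, done)` tuple format, the `n_actions` parameter, and the tiny dataset are hypothetical. Restricting the greedy choice to actions that appear in the data for each state is also an assumption of this sketch, a simple way to avoid trusting values for actions the behavioral policy never tried.

```python
from collections import defaultdict


def batch_q_iteration(dataset, n_actions, gamma=0.9, n_iters=200):
    """Fit tabular Q-values using only the fixed dataset (no new environment samples)."""
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s2, done in dataset:
            # Bellman optimality target computed from logged data only.
            target = r if done else r + gamma * max(Q[s2])
            sums[(s, a)] += target
            counts[(s, a)] += 1
        new_Q = defaultdict(lambda: [0.0] * n_actions)
        for (s, a), total in sums.items():
            new_Q[s][a] = total / counts[(s, a)]
        Q = new_Q
    return Q


def greedy_policy(Q, dataset):
    """Pick, per state, the highest-valued action among those seen in the dataset."""
    seen = defaultdict(set)
    for s, a, *_ in dataset:
        seen[s].add(a)
    return {s: max(sorted(actions), key=lambda a: Q[s][a]) for s, actions in seen.items()}


# Hypothetical transitions logged by some behavioral policy:
# (state, action, reward, next_state, done); state 2 is a terminal goal.
dataset = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 0, False),
]

Q = batch_q_iteration(dataset, n_actions=2)
policy = greedy_policy(Q, dataset)
print(policy)
```

Because this baseline has no explicit mechanism for handling state-action pairs that are rare or absent in the batch, it should be read only as an illustration of learning from a fixed dataset.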
Machine Reading Comprehension (MRC) is a growing field of research, due to its potential in various enterprise applications, as well as the availability of MRC benchmarking datasets.
A framework to host and train publicly available machine learning models while crowdsourcing a dataset. Ideally, using a model for prediction is free. An incentive mechanism validates added data.
The Generative Neural Visual Artist (GeNeVA) task
The GeNeVA task involves a Teller giving a sequence of linguistic instructions to a Drawer for the ultimate goal of image generation. The Teller is able to gauge progress through visual feedback of…