{"id":647928,"date":"2020-04-21T02:56:43","date_gmt":"2020-04-21T09:56:43","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research?p=647928&post_type=msr-group"},"modified":"2024-03-06T09:41:57","modified_gmt":"2024-03-06T17:41:57","slug":"bair","status":"publish","type":"msr-group","link":"https:\/\/www.microsoft.com\/en-us\/research\/collaboration\/bair\/","title":{"rendered":"Microsoft Research & Berkeley AI Research (BAIR)"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\"illustration\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Microsoft Research & Berkeley AI Research (BAIR)<\/h1>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n
\n

Microsoft Research is a proud partner of the Berkeley AI Research Open Commons (BAIR<\/span><\/a>), forging strong collaborations between researchers, students and faculty pursuing research on fundamental advances in computer vision, machine learning, natural language processing, planning, control, and robotics.<\/p>\n<\/div>

\"Berkeley<\/a><\/figure><\/div>\n\n\n\n
\n

“BAIR and Microsoft share a passion for responsible, human-focused AI, from foundations to applications. It’s very exciting to bring together amazing researchers from both institutions to help each other tackle the big challenges in this domain.”<\/i><\/p>\n\u2013 Professor Anca Dragan<\/span><\/a>, BAIR<\/cite><\/blockquote>\n\n\n\n

The goal of the collaboration is to create and contribute new data and research for the open research community while advancing state-of-the-art AI research.<\/p>\n\n\n\n

\n

\u201cAt Microsoft Research we are committed to accelerating advancements in AI while collaborating with top academic institutions to support the broader AI research community,\u201d said Susan Dumais<\/a>, Microsoft Technical Fellow. \u201cI believe the BAIR partnership creates a collaborative environment with tremendous potential for researchers, students and faculty to solve fundamental problems in AI.\u201d<\/em><\/p>\n<\/blockquote>\n\n\n\n

<\/div>\n\n\n\n

Student support and compute resources<\/h2>\n\n\n\n

As part of our partnership, Microsoft Research is pleased to support the work of the PhD students participating in our collaborative projects, and provide faculty and students with the necessary Azure compute resources to assist in the pursuit of advancing state-of-the-art AI research.<\/p>\n\n\n\n\n\n

Phase 1 collaborations<\/h2>\n\n\n\n

Combining Causal Reasoning and Information Theory to Empower Humans through Human-Agent Collaboration<\/h3>\n\n\n\n

Emre Kiciman (MSR AI), Professor Anca Dragan (BAIR), Professor Pieter Abbeel (BAIR), Stas Tiomkin (Postdoc), Yuqing Du (PhD student)<\/em><\/p>\n\n\n\n

We seek to create new collaboration strategies between a human and an artificial agent in which the agent enhances the human’s capabilities through actions that increase the causal leverage, or empowerment to influence the environment, of both robot and human. That is, the agent will act to increase its own, as well as the human\u2019s, future options. The key motivations behind this strategy are the inherent safety considerations \u2013 the agent will not limit the human\u2019s actions \u2013 and the ability to provide goal-agnostic, seamless assistance. We will also explore assistance that transitions between goal-oriented and empowerment-based behaviors, depending on the agent\u2019s confidence in the human\u2019s goals.<\/p>\n\n\n\n


\n\n\n\n

Learning to Collaborate with Human Players<\/h3>\n\n\n\n

Katja Hofmann (MSR Cambridge), Sam Devlin (MSR Cambridge), Kamil Ciosek (MSR Cambridge), Professor Anca Dragan (BAIR), Micah Carroll (PhD student)<\/em><\/p>\n\n\n\n

We study how reinforcement learning approaches can enable agents in video games to learn to collaborate with human players. Current approaches are limited in their ability to generalize to real human play, for example when human players make unexpected game moves. We start by analyzing these shortcomings and consider several directions for improving over the current state of the art, including improved human behavior modeling, incorporating human biases, and improving generalization of reinforcement learning approaches in multi-agent settings.<\/p>\n\n\n\n


\n\n\n\n

Agnostic Reinforcement Learning<\/h3>\n\n\n\n

Alekh Agarwal (MSR AI), Professor Peter Bartlett (BAIR), Professor Moritz Hardt (BAIR), Juan Perdomo (PhD student)<\/em><\/p>\n\n\n\n

Our goal is to advance our theoretical understanding of reinforcement learning in domains where the available set of control policies has limited expressivity relative to the complexity of the environment. A cornerstone of statistical learning theory for supervised learning is the existence of distribution agnostic performance guarantees. We want to understand when the agent can hope to find an approximately best policy from this class, irrespective of whether better policies exist outside the set or not.<\/p>\n\n\n\n


\n\n\n\n

Accurate Inference with Adaptively Collected Data<\/h3>\n\n\n\n

Lester Mackey (MSR New England), Professor Martin Wainwright (BAIR), Koulik Khamaru (PhD student), Yash Deshpande (Postdoc)<\/em><\/p>\n\n\n\n

Estimators computed from adaptively collected data do not behave like non-adaptive estimators. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. In past work, we developed a general method \u2013 W-decorrelation \u2013 for transforming the bias of adaptive linear regression estimators into variance. We now aim to expand the scope and impact of this line of work by moving beyond the linear model setting and developing more powerful procedures that exploit additional knowledge of the data collection process.<\/p>\n\n\n\n


\n\n\n\n

Investigations into the Complexity of Nonconvex Optimization<\/h3>\n\n\n\n

Sebastien Bubeck (MSR AI), Professor Peter Bartlett (BAIR), Yeshwanth Cherapanamjeri (PhD student)<\/em><\/p>\n\n\n\n

Over the past decade, the adoption of increasingly complex machine learning models, most notably neural networks, has enabled astonishing progress in a range of applications spanning computer vision, natural language, and speech processing. However, the use of these models also presents new algorithmic challenges. In particular, the loss functions that arise in the use of such models tend to be non-convex, while existing algorithms are designed to operate on well-behaved loss surfaces. We will pursue a new angle on non-convex optimization to obtain better algorithms more suited to the non-convex nature of such problems.<\/p>\n\n\n\n


\n\n\n\n

Robustness for Deep Learning<\/h3>\n\n\n\n

Jerry Li (MSR AI), Professor Dawn Song (BAIR), Professor Jacob Steinhardt (BAIR), Dan Hendrycks (PhD student)<\/em><\/p>\n\n\n\n

This collaboration seeks to teach AI systems right from wrong and considers the impacts and dangers of future AI systems. We will develop a suite of benchmarks that measure the ethical knowledge of current ML systems, in particular for language processing. For example, a benchmark will give a contemporary NLP model a text scenario and have it predict whether everyday people would find the scenario morally objectionable or morally permissible. The more accurate the model, the more it demonstrates ethical knowledge. When commonsense moral intuitions are ambivalent, precise theories of normative ethics are necessary. To this end, we test how well systems demonstrate knowledge of moral duties, factors determining wellbeing, and finally virtues, corresponding to classical deontological, utilitarian, and Aristotelian theories of morality. We are working toward ethical AI by modeling human values and ethical principles.<\/p>\n\n\n\n


\n\n\n\n

Enabling Broader Learning through Simplified and Personalized Summaries<\/h3>\n\n\n\n

Paul Bennett (MSR AI), Tobias Schnabel (MSR AI), Professor Marti Hearst (BAIR), Philippe Laban (PhD student)<\/em><\/p>\n\n\n\n

Whether it is coming up to speed on the news, learning about a new topic at university, or reading up on a passionate hobby, everyone has experienced the frustration that comes with finding a potentially great resource text \u2013 only to find the text is written for the wrong audience and so long that the investment in time may not be worth the payoff. We seek to use AI techniques to enable summarizing documents in a way that: (1) simplifies the text to fit the reader\u2019s background; (2) adapts to the reader\u2019s previous knowledge; (3) accounts for the reader\u2019s overall goals. Each goal targets an increasingly ambitious step forward in AI research that holds the potential to change how we learn from text documents in almost every setting.<\/p>\n\n\n\n\n\n

Phase 2 collaborations<\/h2>\n\n\n\n

Secure and Privacy-Preserving Federated Learning<\/h3>\n\n\n\n

Kim Laine (MSR), Rob Sim (MSR), Dawn Song (BAIR), Lun Wang (PhD student), Xiaoyuan Liu (PhD student) <\/em><\/p>\n\n\n\n

Federated learning (FL) proposes a powerful new distributed learning paradigm and has grown as an active research field with large-scale real-world deployment in the last several years. In FL, participants collaboratively train a model while all the data is held locally to preserve data privacy. Despite its success, FL still faces a variety of security challenges, among which inference attacks and poisoning attacks are the two most notable categories. How to ensure privacy and model integrity under these attacks remains an open question of critical importance.<\/p>\n\n\n\n

We propose to further explore inference attacks and poisoning attacks in FL and design countermeasures. Specifically, we plan to explore the attack landscape under novel and stronger threat models, for example, malicious participants for inference attacks, and design and develop new defense approaches against these attacks.<\/p>\n\n\n\n


\n\n\n\n

Causal and Interpretable Machine Learning for Medical Images<\/h3>\n\n\n\n

Emre Kiciman (MSR), Bin Yu (BAIR), Robert Netzorg (PhD student) <\/em><\/p>\n\n\n\n

The goal of this collaboration is to develop methods that learn predictive and stable, if not causal, representations on data from various domains, medical and beyond, and then understand how the method uses those representations to make predictions, when those representations are not known beforehand. Specifically, we plan to investigate approaches to improve out-of-distribution performance or transfer learning performance in the realm of precision medicine, where it is often not clear to humans what the high-level features driving, say, a tumor (label) classification are. For doctors seeking to apply machine learning techniques to precision medicine, black-box models making predictions on spurious correlations are not sufficient. Making and communicating medical diagnoses at the individual level requires that those decisions are made on an interpretable and stable, and hopefully causal, basis. In our proposed setting, namely tumor identification from medical images, there are many deviations from previous settings: doctors themselves develop heuristic rules for classifying tumors, images taken on different machines have very different characteristics, and the learned model must be interpretable and medically meaningful for doctors to use in practice.<\/p>\n\n\n\n


\n\n\n\n

Distributed learning: privacy and data summarization<\/h3>\n\n\n\n

Lester Mackey (MSR), Nika Haghtalab (BAIR), Abishek Shetty (PhD student) <\/em><\/p>\n\n\n\n

This project will explore a wide range of multi-agent learning tasks under the lens of privacy. We will consider distributed learning (e.g., where the objective is to learn a model that performs well over the agents) and collaborative learning (e.g., where the objective is to learn a model that performs well for each agent). Our goal is to design algorithms that preserve the privacy of the data with guarantees that significantly outperform those that agents can achieve on their own. <\/b>We plan to address these challenges by bridging between privacy and data summarization. We expect privacy and data summarization to become closely related when agents have sufficiently large data sets. That is, preserving the privacy of the data will approximately translate to creating a synthetic data set that summarizes the data effectively. By exploring these connections further, our project will build synergies between two well established areas and will contribute to their further progress by providing a unified perspective for the use of ML on important and sensitive data sets.<\/p>\n\n\n\n


\n\n\n\n

Realistic Large-Scale Benchmark for Adversarial Patch<\/h3>\n\n\n\n

Jerry Li (MSR), David Wagner (BAIR), Chawin Sitawarin (PhD student), Nabeel Hingun (undergraduate student) <\/em><\/p>\n\n\n\n

The goal of our study is to make machine learning models robust against patch attacks. In particular, we will develop the first standardized benchmark for security against patch attacks, under realistic threat models. Our benchmark will cover two important aspects often ignored in past work: (1) realistic patches that must work under multiple camera angles, lighting conditions, etc., and (2) realistic constraints on the location of the patch. Then, we will develop better patch attacks, and use them together with adversarial training to improve the defense side.<\/p>\n\n\n\n


\n\n\n\n

Video Representation Learning for Global and Local Features<\/h3>\n\n\n\n

Yale Song (MSR), Avideh Zakhor (BAIR), Franklin Wang (PhD student) <\/em><\/p>\n\n\n\n

Existing video representation learning frameworks generally learn global representations of videos, usually at the clip-level. These types of representations are generally evaluated on action recognition baselines (which have a strong bias towards global appearance information), and are ill-suited for local tasks involving fine details and dense prediction, like action segmentation and tracking. In this work, we propose to learn representations that are optimized both for global tasks and local tasks by developing contrastive learning methods that operate at a spatiotemporally denser regime beyond the clip-level. Our self-supervised framework will learn global and local representations for RGB frames and motion features like optical flow to learn coarse and fine-grained representations of appearance and motion.<\/p>\n\n\n\n


\n\n\n\n

Active Visual Planning: Handling Uncertainty in Perception, Prediction, and Planning Pipelines<\/h3>\n\n\n\n

Xin Wang (MSR), Joey Gonzalez (BAIR), Charles Packer (PhD student) <\/em><\/p>\n\n\n\n

Our goal is to develop an Active Visual Planner (AVP) for multi-agent planning in environments with partial observability (e.g., limited visibility). Recent work on Active Perception has studied improving perception via prediction (e.g., repositioning a LiDAR sensor to improve object detection); however, existing approaches generally assume control of a specific sensor, and do not enable a planner to plan entire future states (e.g., vehicle position and multiple sensor configurations) with respect to uncertainty in perception. Whereas Active Perception is primarily concerned with reducing perception uncertainty as an end in itself, an AVP will plan to improve perception only if it aids the planning objective. Prior work on limited visibility reasoning is either intractable (POMDP methods), has constraints that severely limit real-world application (game-theoretic methods), or is overly conservative and does not account for the effect of an agent\u2019s future actions on its own perception, or on other agents\u2019 future state (Forward Reachable Set methods). Prior work on contingency planning can explicitly reason about uncertainty in agent behavior but does not account for uncertainty in perception.<\/p>\n\n\n\n


\n\n\n\n

Nonconvex Optimization and Robustness of Neural Networks<\/h3>\n\n\n\n

Sebastien Bubeck (MSR), Peter Bartlett (BAIR), Yeshwanth Cherapanamjeri (PhD student) <\/em><\/p>\n\n\n\n

Machine learning based systems are set to play an increasing role in everyday life due to the increased abundance of large-scale training data and the use of sophisticated statistical models such as neural networks. Considering these recent trends, much recent attention has been devoted towards understanding both how robust and reliable these methods are when deployed in the real world and the computational complexity of actually learning them from data. In our collaboration so far, we have adopted a theoretical perspective on each of these questions and plan to explore and empirically validate them in future work.<\/p>\n\n\n\n


\n\n\n\n

Reinforcement Learning in High Dimensional Systems<\/h3>\n\n\n\n

Sham Kakade (MSR), Akshay Krishnamurthy (MSR), Peter Bartlett (BAIR), Juan C. Perdomo (PhD student) <\/em><\/p>\n\n\n\n

The goal of this collaboration is to explore the limits and possibilities of sequential decision making in complex, high-dimensional environments. Compared with more classical settings such as supervised learning, relatively little is known regarding the minimal assumptions, representational conditions, and algorithmic principles needed to enable sample-efficient learning in complex control systems with rich sets of actions and observations. Given recent empirical breakthroughs in robotics and game playing ([SHM+16], [MKS+15]), we believe that it is a timely moment to develop our understanding of the theoretical foundations of reinforcement learning (RL). In doing so, we aim to identify new algorithmic techniques and theoretical insights which may serve to democratize RL into a well-founded, mature technology that can be routinely used in practice by non-experts.<\/p>\n\n\n\n


\n\n\n\n

Towards Human-like Attention<\/h3>\n\n\n\n

Xin Wang (MSR), Trevor Darrell (BAIR), Baifeng Shi (PhD student) <\/em><\/p>\n\n\n\n

Transformers have been shown to achieve stronger performance than CNN models, along with greater robustness to distribution shift, image perturbation, and random\/adversarial noise, thanks to the self-attention mechanism. However, there is increasing evidence of flaws in self-attention, for example:<\/p>\n\n\n\n

    \n
  • The output of self-attention tends to converge to a rank-1 matrix, thus losing most of the information in the features.<\/li>\n\n\n\n
  • Self-attention tends to look at different parts of an object separately based on pixel\/patch similarity, instead of attending to the whole object based on semantics.<\/span><\/li>\n\n\n\n
  • Self-attention can accurately recognize patch-shuffled images, suggesting that it\u2019s neglecting position information even with positional encoding added.<\/span><\/li>\n<\/ul>\n\n\n\n

    All these phenomena make us wonder if the current self-attention mechanism is the best we could do, or even the right thing to do. Specifically, self-attention seems different from the attention mechanism in the human visual system (which will be elaborated in \u201cNovelty and Innovation\u201d). To this end, we propose to improve the current self-attention mechanism (ideally borrowing ideas from the human visual system) to fix its flaws and boost the performance and robustness.<\/p>\n\n\n\n


    \n\n\n\n

    Towards GPT-3 Style Vision-and-Language Model Scaling<\/h3>\n\n\n\n

    Pengchuan Zhao (MSR), Trevor Darrell (BAIR) <\/em><\/p>\n\n\n\n

    The one-year goal of this project is to scale a GPT-3 style vision-and-language model that will be pre-trained over large-scale vision-and-language datasets while validating the generalization\/expandability\/debuggability of the model over a range of vision-language tasks (including phrase grounding, visual question answering, referring expression comprehension, referring expression segmentation, and instruction following).<\/p>\n\n\n\n


    \n\n\n\n

    Pretraining Efficient Vision-Language Transformers with View- and Region-level Supervision That Encourage Cross-modal Alignment<\/h3>\n\n\n\n

    Pengchuan Zhao (MSR), Trevor Darrell (BAIR), Dong Huk Park (PhD student) <\/em><\/p>\n\n\n\n

    The goals of this research include: (1) maintaining or reducing the number of model parameters compared to ViLT; (2) maintaining or increasing the throughput compared to ViLT; (3) outperforming ViLT and other region feature-based VLP models on various downstream tasks; (4) comparing with MDETR on V+L grounding tasks and open-vocabulary object detection tasks.<\/p>\n\n\n\n


    \n\n\n\n

    ML-Based Robotic Manipulation via the Use of Language-Annotated Diverse Datasets<\/h3>\n\n\n\n

    Andrey Kolobov (MSR), Sergey Levine (BAIR), Frederik Ebert (PhD student) <\/em><\/p>\n\n\n\n

    Thanks to recent advances, the physical capabilities of robotic manipulation hardware are approaching those of a human hand, potentially enabling robots to assist people in tedious or undesirable tasks from datacenter maintenance to sorting trash. However, the difficulty of constructing robust policies that enable versatile manipulation hardware to operate in weakly structured environments is still prohibitive. The focus of our project is overcoming this formidable challenge by pretraining robotic manipulation agents using diverse data that is only weakly relevant to the target task and operating environment. Our approach relies on leveraging language in task demonstrations to train a robot to induce dense reward functions for other, previously unseen tasks, and using these reward functions for learning manipulation policies via a combination of reinforcement and imitation learning.<\/p>\n\n\n\n


    \n\n\n\n

    Enabling Non-Experts to Annotate Complex Logical Forms at Scale<\/h3>\n\n\n\n

    Jason Eisner (MSR), Dan Klein (BAIR), Ruiqi Zhong (PhD student) <\/em><\/p>\n\n\n\n

    We propose to use non-expert annotators, who do not understand logical forms, to annotate complex logical forms at scale. We hope our research can lead to better semantic parsers with lower cost, and that it can be incorporated into tools, such as the Semantic Machines SDK, that allow non-experts to collect data and build their own semantic parsers. We will take an active learning approach with indirect supervision.<\/p>\n\n\n\n


    \n\n\n\n

    Pre-trained Representations for Language-Guided Web Navigation<\/h3>\n\n\n\n

    Xiaodong Liu (MSR), Dan Klein (BAIR), Kevin Lin (PhD student), Cathy Chen (PhD student), Nikita Kitaev (PhD student), Jessy Lin (PhD student) <\/em><\/p>\n\n\n\n

    Most existing web navigation assistants rely on text-only pretrained representations, which do not take advantage of structural information in webpages. However, rich structured representations of webpages are needed to ground natural language requests more effectively into webpage elements.<\/p>\n\n\n\n

    To create more effective web assistants, we propose to (a) determine a good architecture for constructing context-dependent representations of webpages and their text, (b) train this architecture in a self-supervised manner, using only raw webpages obtained from the Internet, and (c) demonstrate that the resulting representations benefit tasks involving webpages.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"

    Microsoft Research is a proud partner of the Berkeley AI Research Open Commons (BAIR),\u00a0forging strong collaborations between researchers, students and faculty pursuing research on fundamental advances in\u00a0computer vision, machine learning, natural language processing, planning, control, and robotics. The goal of the collaboration is to create and contribute new data and research for the open research community while\u00a0advancing state-of-the-art AI research.<\/p>\n","protected":false},"featured_media":649728,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":true,"_classifai_error":"","msr_group_start":"","footnotes":""},"research-area":[13556,13562,13545],"msr-group-type":[243721],"msr-locale":[268875],"msr-impact-theme":[],"class_list":["post-647928","msr-group","type-msr-group","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-language-technologies","msr-group-type-collaboration","msr-locale-en_us"],"msr_group_start":"","msr_detailed_description":"","msr_further_details":"","msr_hero_images":[],"msr_research_lab":[199565],"related-researchers":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-projects":[],"related-events":[],"related-opportunities":[],"related-posts":[],"tab-content":[{"id":0,"name":"Phase 1 collaborations","content":"Combining Causal Reasoning and Information Theory to Empower Humans through Human-Agent Collaboration<\/b>\r\nEmre Kiciman (MSR AI), Professor Anca Dragan (BAIR), Professor Pieter Abbeel (BAIR), Stas Tiomkin (Postdoc), Yuqing Du (PhD student)<\/em>\r\n\r\nWe seek to create new collaboration strategies between a human and an artificial agent in which the agent enhances the human's capabilities through actions that increase the causal leverage, or empowerment to influence the environment, of both 
robot and human. That is, the agent will act to increase its own, as well as the human\u2019s, future options. The key motivations behind this strategy are the inherent safety considerations \u2013 the agent will not limit the human\u2019s actions \u2013 and the ability to provide goal-agnostic, seamless assistance. We will also explore assistance that transitions between goal-oriented and empowerment-based behaviors, depending on the agent\u2019s confidence in the human\u2019s goals.\r\n\r\nLearning to Collaborate with Human Players<\/b>\r\nKatja Hofmann (MSR Cambridge), Sam Devlin (MSR Cambridge), Kamil Ciosek (MSR Cambridge), Professor Anca Dragan (BAIR), Micah Carroll (PhD student)<\/em>\r\n\r\nWe study how reinforcement learning approaches can enable agents in video games to learn to collaborate with human players. Current approaches are limited in their ability to generalize to real human play, for example when human players make unexpected game moves. We start by analyzing these shortcomings and consider several directions for improving over the current state of the art, including improved human behavior modeling, incorporating human biases, and improving generalization of reinforcement learning approaches in multi-agent settings.\r\n\r\nAgnostic Reinforcement Learning<\/b>\r\nAlekh Agarwal (MSR AI), Professor Peter Bartlett (BAIR), Professor Moritz Hardt (BAIR), Juan Perdomo (PhD student)<\/em>\r\n\r\nOur goal is to advance our theoretical understanding of reinforcement learning in domains where the available set of control policies has limited expressivity relative to the complexity of the environment. A cornerstone of statistical learning theory for supervised learning is the existence of distribution agnostic performance guarantees. 
We want to understand when the agent can hope to find an approximately best policy from this class, irrespective of whether better policies exist outside the set or not.\r\n\r\nAccurate Inference with Adaptively Collected Data<\/strong>\r\nLester Mackey (MSR New England), Professor Martin Wainwright (BAIR), Koulik Khamaru (PhD student), Yash Deshpande (Postdoc)<\/em>\r\n\r\nEstimators computed from adaptively collected data do not behave like non-adaptive estimators. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. In past work, we developed a general method \u2013 W-decorrelation \u2013 for transforming the bias of adaptive linear regression estimators into variance. We now aim to expand the scope and impact of this line of work by moving beyond the linear model setting and developing more powerful procedures that exploit additional knowledge of the data collection process.\r\n\r\nInvestigations into the Complexity of Nonconvex Optimization<\/b>\r\nSebastien Bubeck (MSR AI), Professor Peter Bartlett (BAIR), Yeshwanth Cherapanamjeri (PhD student)<\/em>\r\n\r\nOver the past decade, the adoption of increasingly complex machine learning models, most notably neural networks, has enabled astonishing progress in a range of applications spanning computer vision, natural language, and speech processing. However, the use of these models also presents new algorithmic challenges. In particular, the loss functions that arise in the use of such models tend to be non-convex, while existing algorithms are designed to operate on well-behaved loss surfaces. 
We will pursue a new angle on non-convex optimization to obtain better algorithms more suited to the non-convex nature of such problems.\r\n\r\nRobustness for Deep Learning<\/b>\r\nJerry Li (MSR AI), Professor Dawn Song (BAIR), Professor Jacob Steinhardt (BAIR), Dan Hendrycks (PhD student)<\/em>\r\n\r\nThis collaboration seeks to teach AI systems right from wrong and considers the impacts and dangers of future AI systems. We will develop a suite of benchmarks that measure the ethical knowledge of current ML systems, in particular for language processing. For example, a benchmark will give a contemporary NLP model a text scenario and have it predict whether everyday people would find the scenario morally objectionable or morally permissible. The more accurate the model, the more it demonstrates ethical knowledge. When commonsense moral intuitions are ambivalent, precise theories of normative ethics are necessary. To this end, we test how well systems demonstrate knowledge of moral duties, factors determining wellbeing, and finally virtues, corresponding to classical deontological, utilitarian, and Aristotelian theories of morality. We are working toward ethical AI by modeling human values and ethical principles.\r\n\r\nEnabling Broader Learning through Simplified and Personalized Summaries<\/b>\r\nPaul Bennett (MSR AI), Tobias Schnabel (MSR AI), Professor Marti Hearst (BAIR), Philippe Laban (PhD student)<\/em>\r\n\r\nWhether it is coming up to speed on the news, learning about a new topic at university, or reading up on a passionate hobby, everyone has experienced the frustration that comes with finding a potentially great resource text \u2013 only to find the text is written for the wrong audience and so long that the investment in time may not be worth the payoff. 
We seek to use AI techniques to enable summarizing documents in a way that: (1) simplifies the text to fit the reader\u2019s background; (2) adapts to the reader\u2019s previous knowledge; (3) accounts for the reader\u2019s overall goals. Each goal targets an increasingly ambitious step forward in AI research that holds the potential to change how we learn from text documents in almost every setting."},{"id":1,"name":"Phase 2 collaborations","content":"Secure and Privacy-Preserving Federated Learning\r\n<\/b>Kim Laine (MSR), Rob Sim (MSR), Dawn Song (BAIR), Lun Wang (PhD student), Xiaoyuan Liu (PhD student)\u00a0<\/em>\r\n\r\nFederated learning (FL) proposes a powerful new distributed learning paradigm and has grown as an active research field with large-scale real-world deployment in the last several years. In FL, participants collaboratively train a model while all the data is held locally to preserve data privacy. Despite its success, FL still faces a variety of security challenges, among which inference attacks and poisoning attacks are the two most notable categories. How to ensure privacy and model integrity under these attacks remains an open question of critical importance.\r\n\r\nWe propose to further explore inference attacks and poisoning attacks in FL and design countermeasures. Specifically, we plan to explore the attack landscape under novel and stronger threat models, for example, malicious participants for inference attacks, and design and develop new defense approaches against these attacks.\r\n\r\nCausal and Interpretable Machine Learning for Medical Images\r\n<\/b>Emre Kiciman (MSR), Bin Yu (BAIR), Robert Netzorg (PhD student)\u00a0<\/em>\r\n\r\nThe goal of this collaboration is to develop methods that learn predictive and stable, if not causal, representations on data from various domains, medical and beyond, and then understand how the method uses those representations to make predictions, when those representations are not known beforehand. 
Specifically, we plan to investigate approaches to improve out-of-distribution performance or transfer learning performance in the realm of precision medicine, where it is often not clear to humans what the high-level features driving, say, a tumor (label) classification are. For doctors seeking to apply machine learning techniques to precision medicine, black-box models making predictions on spurious correlations are not sufficient. Making and communicating medical diagnoses at the individual level requires that those decisions are made on an interpretable and stable, and hopefully causal, basis. In our proposed setting, namely tumor identification from medical images, there are a multitude of deviances from previous settings: Doctors themselves develop heuristic rules for classifying tumors, images taken on different machines result in images with very different characteristics, and the learned model must be interpretable and medically meaningful for doctors to use in practice.\r\n\r\nDistributed learning: privacy and data summarization\r\n<\/b>Lester Mackey (MSR), Nika Haghtalab (BAIR), Abishek Shetty (PhD student)\u00a0<\/em>\r\n\r\nThis project will explore a wide range of multi-agent learning tasks under the lens of privacy. We will consider distributed learning (e.g., where the objective is to learn a model that performs well over the agents) and collaborative learning (e.g., where the objective is to learn a model that performs well for each agent). Our goal is to design algorithms that preserve the privacy of the data with guarantees that significantly outperform those that agents can achieve on their own.\u00a0<\/b>We plan to address these challenges by bridging between privacy and data summarization. We expect privacy and data summarization to become closely related when agents have sufficiently large data sets. That is, preserving the privacy of the data will approximately translate to creating a synthetic data set that summarizes the data effectively. 
By exploring these connections further, our project will build synergies between two well established areas and will contribute to their further progress by providing a unified perspective for the use of ML on important and sensitive data sets.\r\n\r\nRealistic Large-Scale Benchmark for Adversarial Patch\r\n<\/b>Jerry Li (MSR), David Wagner (BAIR), Chawin Sitawarin (PhD student), Nabeel Hingun (undergraduate student)\u00a0<\/em>\r\n\r\nThe goal of our study is to make machine learning models robust against patch attacks.\u00a0In particular, we\u00a0will develop the first standardized benchmark for security against patch attacks, under realistic threat models. Our benchmark will cover two important aspects often ignored in past work: (1) realistic patches that must work under multiple camera angles, lighting conditions, etc., and (2) realistic constraints on the location of the patch. Then, we will develop better patch attacks, and use them together with adversarial training to improve the defense side.\r\n\r\nVideo Representation Learning for Global and Local Features\r\n<\/b>Yale Song (MSR), Avideh Zakhor (BAIR), Franklin Wang (PhD student)\u00a0<\/em>\r\n\r\nExisting video representation learning frameworks generally learn global representations of videos, usually at the clip-level. These types of representations are generally evaluated on action recognition baselines (which have a strong bias towards global appearance information), and\u00a0are ill-suited for local tasks involving fine details and dense prediction, like action segmentation and tracking. In this work, we propose to learn representations that are optimized both for global tasks and local tasks by developing contrastive learning methods that operate at a spatiotemporally denser regime beyond the clip-level. 
Our self-supervised framework will learn global and local representations of RGB frames and of motion features such as optical flow, capturing both coarse and fine-grained representations of appearance and motion.\r\n\r\nActive Visual Planning: Handling Uncertainty in Perception, Prediction, and Planning Pipelines\r\n<\/b>Xin Wang (MSR), Joey Gonzalez (BAIR), Charles Packer (PhD student)\u00a0<\/em>\r\n\r\nOur goal is to develop an Active Visual Planner (AVP) for multi-agent planning in environments with partial observability (e.g., limited visibility). Recent work on Active Perception has studied improving perception via prediction (e.g., repositioning a LiDAR sensor to improve object detection); however, existing approaches generally assume control of a specific sensor, and do not enable a planner to plan entire future states (e.g., vehicle position and multiple sensor configurations) with respect to uncertainty in perception. Whereas Active Perception is primarily concerned with reducing perception uncertainty as an\u00a0end in itself, an\u00a0AVP will plan to improve perception only if it aids the planning objective. Prior work on limited visibility reasoning is either intractable (POMDP methods), has constraints that severely limit real-world application (game-theoretic methods), or is overly conservative and does not account for the effect of an agent\u2019s future actions on its own perception, or on other agents\u2019 future state (Forward Reachable Set methods). 
Prior work on contingency planning can explicitly reason about uncertainty in agent\u00a0behavior but\u00a0does not account for uncertainty in perception.\r\n\r\nNonconvex Optimization and Robustness of Neural Networks\r\n<\/b>Sebastien Bubeck (MSR), Peter Bartlett (BAIR), Yeshwanth Cherapanamjeri (PhD student)\u00a0<\/em>\r\n\r\nMachine learning based systems are set to play an increasing role in everyday life due to the increased abundance of\u00a0large-scale\u00a0training data and the use of sophisticated statistical models such as neural networks.\u00a0Considering\u00a0these recent trends, much recent attention has been devoted towards understanding both how robust and reliable these methods are when deployed in the real world and the computational complexity of\u00a0actually learning\u00a0them from data. In our collaboration so far, we have adopted a theoretical perspective on each of these questions and plan to explore and empirically validate them in future work.\r\n\r\nReinforcement Learning in High Dimensional Systems\r\n<\/b>Sham Kakade (MSR), Akshay Krishnamurthy (MSR), Peter Bartlett (BAIR), Juan C Pedromo (PhD student)\u00a0<\/em>\r\n\r\nThe goal of this collaboration is to explore the limits and possibilities of sequential decision making in complex, high-dimensional environments. Compared with more classical settings such as supervised learning, relatively little is known regarding the minimal assumptions, representational conditions, and algorithmic principles needed to enable sample-efficient learning in complex control systems with rich sets of actions and observations. Given recent empirical breakthroughs in robotics and game playing ([SHM+16], [MKS+15]), we believe that it is a timely moment to develop our understanding of the theoretical foundations of reinforcement learning (RL). 
In doing so, we aim to identify new algorithmic techniques and theoretical insights that may help democratize RL, making it a well-founded, mature technology that can be routinely used in practice by non-experts.\r\n\r\nTowards Human-like Attention\r\n<\/b>Xin Wang (MSR), Trevor Darrell (BAIR), Baifeng Shi (PhD student)\u00a0<\/em>\r\n\r\nTransformers have been shown to achieve stronger performance than CNN models, as well as greater robustness to distribution shift, image perturbation, and random\/adversarial noise, thanks to the self-attention mechanism. However, there is increasing evidence pointing to flaws in self-attention, for example:\r\n

      \r\n \t
    • The output of self-attention tends to converge to a rank-1 matrix, thus losing most of the information in the features.<\/li>\r\n \t
    • Self-attention tends to look at different parts of an object separately based on pixel\/patch similarity, instead of attending to the whole object based on semantics.<\/span><\/li>\r\n \t
    • Self-attention can accurately recognize patch-shuffled images, suggesting that it neglects position information even when positional encoding is added.<\/span><\/li>\r\n<\/ul>\r\nAll these phenomena make us wonder if the current self-attention mechanism is the best we could do, or even the right thing to do. Specifically, self-attention seems different from the attention mechanism in the human visual system. To this end, we propose to improve the current self-attention mechanism (ideally borrowing ideas from the human visual system) to fix its flaws and boost performance and robustness.\r\n\r\nTowards GPT-3 Style Vision-and-Language Model Scaling\r\n<\/b>Pengchuan Zhao (MSR), Trevor Darrell (BAIR)\u00a0<\/em>\r\n\r\nThe one-year goal of this project is to scale a GPT-3 style vision-and-language model that will be pre-trained over large-scale vision-and-language datasets while validating the generalization\/expandability\/debuggability of the model over a range of vision-language tasks (including phrase grounding, visual question answering, referring expression comprehension, referring expression segmentation, and instruction following).\r\n\r\nPretraining Efficient Vision-Language Transformers with View- and Region-level Supervision That Encourage Cross-modal Alignment\r\n<\/b>Pengchuan Zhao (MSR), Trevor Darrell (BAIR), Dong Huk Park (PhD student)\u00a0<\/em>\r\n\r\nThe goals of this research include (1) maintaining or reducing the number of model parameters compared to ViLT; (2) maintaining or increasing the throughput compared to ViLT; (3) outperforming ViLT and other region feature-based VLP models on various downstream tasks; and (4) comparing with MDETR on V+L grounding tasks and open-vocabulary object detection tasks.\r\n\r\nML-Based Robotic Manipulation via the Use of Language-Annotated Diverse 
Datasets\r\n<\/b>Andrey Kolobov (MSR), Sergey Levine (BAIR), Frederik Ebert (PhD student)\u00a0<\/em>\r\n\r\nThanks to advances in robotic manipulation hardware, its physical capabilities are approaching those of a human hand, potentially enabling robots to assist people in tedious or undesirable tasks from datacenter maintenance to sorting trash. However, the difficulty of constructing robust policies that enable versatile manipulation hardware to operate in weakly structured environments is still prohibitive. The focus of our project is overcoming this formidable challenge by pretraining robotic manipulation agents using diverse data that is only weakly relevant to the target task and operating environment. Our approach relies on leveraging language in task demonstrations to train a robot to induce dense reward functions for other, previously unseen tasks, and using these reward functions for learning manipulation policies via a combination of reinforcement and imitation learning.\r\n\r\nEnabling Non-Experts to Annotate Complex Logical Forms at Scale\r\n<\/b>Jason Eisner (MSR), Dan Klein (BAIR), Ruiqi Zhong (PhD student)\u00a0<\/em>\r\n\r\nWe propose to use non-expert annotators, who do not understand logical forms. We hope our research can lead to better semantic parsers with lower cost, and that it can be incorporated into tools -- such as the Semantic Machines SDK -- that allow non-experts to collect data and build their own semantic parsers. We will take an active learning approach with indirect supervision.\r\n\r\nPre-trained Representations for Language-Guided Web Navigation\r\n<\/b>Xiaodong Liu (MSR), Dan Klein (BAIR), Kevin Lin (PhD student), Cathy Chen (PhD student), Nikita Kitaev (PhD student), Jessy Lin (PhD student)\u00a0<\/em>\r\n\r\nMost existing web navigation assistants rely on text-only pretrained representations, which do not take advantage of structural information in web pages. 
However, rich structured representations of webpages are needed to\u00a0ground natural language requests more effectively\u00a0into webpage elements.\r\n\r\nTo create more effective web assistants, we propose to (a) determine a good architecture for constructing context-dependent representations of webpages and their text, (b) train this architecture in a self-supervised manner, using only raw webpages obtained from the Internet, and (c) demonstrate that the provided representations provide benefits on tasks involving webpages."}],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/647928"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-group"}],"version-history":[{"count":39,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/647928\/revisions"}],"predecessor-version":[{"id":1012353,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group\/647928\/revisions\/1012353"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/649728"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=647928"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=647928"},{"taxonomy":"msr-group-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-group-type?post=647928"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=647928"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=647928"}],"curies":[
{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}