{"id":902832,"date":"2022-12-05T09:00:00","date_gmt":"2022-12-05T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=902832"},"modified":"2022-12-01T14:09:44","modified_gmt":"2022-12-01T22:09:44","slug":"neurips-2022-seven-microsoft-research-papers-selected-for-oral-presentations","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/neurips-2022-seven-microsoft-research-papers-selected-for-oral-presentations\/","title":{"rendered":"NeurIPS 2022: Seven Microsoft Research Papers Selected for Oral Presentations"},"content":{"rendered":"\n
\"abstract<\/figure>\n\n\n\n

Microsoft is proud to be a platinum sponsor of the 36th annual Conference on Neural Information Processing Systems (NeurIPS), which is widely regarded as the world's most prestigious research conference on artificial intelligence and machine learning.

Microsoft has a strong presence at NeurIPS again this year, with more than 150 of our researchers participating in the conference and 122 of our research papers accepted. Our researchers are also taking part in 10 workshops, four competitions, and a tutorial.

In one of the workshops, AI for Science: Progress and Promises, a panel of leading researchers will discuss how artificial intelligence and machine learning have the potential to advance scientific discovery. The panel will include two Microsoft researchers: Max Welling, Vice President and Distinguished Scientist, Microsoft Research AI4Science, who will serve as moderator, and Peter Lee, Corporate Vice President, Microsoft Research and Incubations.

Of the 122 Microsoft research papers accepted for the conference, seven have been selected for oral presentations during the virtual NeurIPS experience the week of December 4. The oral presentations provide a deeper dive into each of the featured research topics.

In addition, two other Microsoft research papers received Outstanding Paper Awards for NeurIPS 2022. One of those papers, Gradient Estimation with Discrete Stein Operators, explains how researchers developed a gradient estimator that achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations, which has the potential to improve problem solving in machine learning. In the other paper, A Neural Corpus Indexer for Document Retrieval, researchers demonstrate that an end-to-end deep neural network that unifies training and indexing stages can significantly improve the recall performance of traditional document retrieval methods.


Below are the titles, authors, and abstracts of all seven Microsoft research papers chosen for oral presentations at NeurIPS, with links to additional information for those who want to explore the topics more fully:

### Uni[MASK]: Unified Inference in Sequential Decision Problems

*Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin*

**Abstract:** Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the UniMASK framework, which provides a unified way to specify models that can be trained on many different sequential decision-making tasks. We show that a single UniMASK model is often capable of carrying out many tasks with performance similar to or better than single-task models. Additionally, after fine-tuning, our UniMASK models consistently outperform comparable single-task models.
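To make the masking view concrete, here is a minimal sketch (our own illustration with hypothetical helper names, not the released UniMASK code) of how several of the tasks named above reduce to binary visibility masks over a flattened state-action sequence:

```python
import numpy as np

# Each timestep contributes one state token and one action token, flattened as
# s_0, a_0, s_1, a_1, ...; mask[i] = 1 means token i is visible to the model,
# mask[i] = 0 means the model must predict it.

def behavior_cloning_mask(T: int) -> np.ndarray:
    """All states visible; all actions must be predicted."""
    return (np.arange(2 * T) % 2 == 0).astype(int)

def forward_dynamics_mask(T: int) -> np.ndarray:
    """Everything visible except the final state, which must be predicted."""
    m = np.ones(2 * T, dtype=int)
    m[2 * (T - 1)] = 0  # hide s_{T-1}
    return m

def waypoint_mask(T: int) -> np.ndarray:
    """Only the initial state and a goal state visible; actions are inferred."""
    m = np.zeros(2 * T, dtype=int)
    m[0] = 1             # s_0 visible
    m[2 * (T - 1)] = 1   # s_{T-1} serves as the waypoint
    return m

print(behavior_cloning_mask(3))  # [1 0 1 0 1 0]
print(waypoint_mask(3))          # [1 0 0 0 1 0]
```

A single model trained on randomly sampled masks can then be queried with any of these patterns at inference time.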

Explore the topic

### K-LITE: Learning Transferable Visual Models with External Knowledge

*Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao*

**Abstract:** The new generation of state-of-the-art computer vision systems is trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, based on the broad concept coverage achieved through a large-scale data collection process. Alternatively, we argue that learning with external knowledge about images is a promising approach that leverages a much more structured source of supervision and offers sample efficiency.

In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts; In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is released at https://github.com/microsoft/klite.
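For intuition about the knowledge-augmentation step, here is a minimal sketch (our own illustration, not the released K-LITE code, which also draws on Wiktionary) of enriching a class name with a WordNet gloss before handing it to a text encoder:

```python
# Requires NLTK with the WordNet corpus: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def knowledge_augmented_prompt(class_name: str) -> str:
    """Build a text prompt enriched with a WordNet definition, if one exists."""
    synsets = wn.synsets(class_name.replace(" ", "_"))
    definition = synsets[0].definition() if synsets else ""
    prompt = f"a photo of a {class_name}"
    return f"{prompt}, {definition}" if definition else prompt

print(knowledge_augmented_prompt("goldfish"))
# e.g. "a photo of a goldfish, small golden or orange-red freshwater fishes ..."
```

The augmented prompt, rather than the bare class name, is what gets encoded on both the training and evaluation sides, which is what makes rare or unseen concepts easier to reference.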

Explore the topic

### Extreme Compression for Pre-trained Transformers Made Simple and Efficient

*Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He*

**Abstract:** Extreme compression, particularly ultra-low bit precision (binary/ternary) quantization, has been proposed to fit large NLP models on resource-constrained devices. However, to preserve the accuracy for such aggressive compression schemes, cutting-edge methods usually introduce complicated compression pipelines, e.g., multi-stage expensive knowledge distillation with extensive hyperparameter tuning. Also, they often focus less on smaller transformer models that have already been heavily compressed via knowledge distillation, and they lack a systematic study to show the effectiveness of their methods.

In this paper, we perform a comprehensive systematic study to measure the impact of many key hyperparameters and training strategies from previous work. As a result, we find that previous baselines for ultra-low bit precision quantization are significantly under-trained. Based on our study, we propose a simple yet effective compression pipeline for extreme compression.

Our simplified pipeline demonstrates that:

(1) we can skip the pre-training knowledge distillation to obtain a 5-layer BERT while achieving better performance than previous state-of-the-art methods, like TinyBERT;

(2) extreme quantization plus layer reduction is able to reduce the model size by 50x, resulting in new state-of-the-art results on GLUE tasks.
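For intuition about the "ternary" end of the spectrum discussed above, here is a minimal sketch (our own, using the well-known ternary-weight-network heuristic rather than this paper's pipeline) that snaps a weight tensor to the three values {-alpha, 0, +alpha}:

```python
import torch

def ternarize(w: torch.Tensor, threshold_ratio: float = 0.7) -> torch.Tensor:
    """Quantize weights to {-alpha, 0, +alpha} with a per-tensor scale alpha."""
    delta = threshold_ratio * w.abs().mean()     # magnitudes below delta snap to 0
    mask = (w.abs() > delta).float()             # which weights stay nonzero
    alpha = (w.abs() * mask).sum() / mask.sum()  # scale = mean magnitude of kept weights
    return alpha * torch.sign(w) * mask

w = torch.randn(768, 768)
w_q = ternarize(w)
print(w_q.unique().tolist())  # exactly three values: [-alpha, 0.0, alpha]
```

In a real pipeline this would be applied during training with a straight-through gradient estimator rather than as a one-shot post-processing step.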

Explore the topic

### On the Complexity of Adversarial Decision Making

*Dylan J Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan*

**Abstract:** A central problem in online learning and decision making, from bandits to reinforcement learning, is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision-making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result is to show, via new upper and lower bounds, that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. in the stochastic counterpart to our setting, is necessary and sufficient to obtain low regret for adversarial decision making. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or, hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results, both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György.
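For readers unfamiliar with the complexity measure, the following is a sketch of the Decision-Estimation Coefficient as introduced for the stochastic setting by Foster et al.; the notation is our paraphrase rather than this paper's, and the adversarial result above applies the same quantity to the convex hull \(\operatorname{co}(\mathcal{M})\) rather than to \(\mathcal{M}\) itself:

```latex
% Sketch of the Decision-Estimation Coefficient (stochastic setting, Foster et al.);
% notation ours. \Pi: decision space; \mathcal{M}: model class; \widehat{M}: reference
% model; f^{M}(\pi): expected payoff of decision \pi under model M; \pi_{M}: optimal
% decision for M; D^{2}_{\mathrm{H}}: squared Hellinger distance; \gamma > 0: scale.
\operatorname{dec}_{\gamma}(\mathcal{M}, \widehat{M})
  = \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}}
    \mathbb{E}_{\pi \sim p}\!\left[
      f^{M}(\pi_{M}) - f^{M}(\pi)
      - \gamma \, D^{2}_{\mathrm{H}}\!\big(M(\pi), \widehat{M}(\pi)\big)
    \right]
```

Intuitively, the infimum picks a randomized decision rule that trades off regret against how much information the chosen decisions reveal about the true model.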

Explore the topic

### Maximum Class Separation as Inductive Bias in One Matrix

*Tejaswi Kasarla, Gertjan J. Burghouts, Max van Spengler, Elise van der Pol, Rita Cucchiara, Pascal Mettes*

**Abstract:** Maximizing the separation between classes constitutes a well-known inductive bias in machine learning and a pillar of many traditional algorithms. By default, deep networks are not equipped with this inductive bias, and therefore many alternative solutions have been proposed through differential optimization. Current approaches tend to optimize classification and separation jointly: aligning inputs with class vectors and separating class vectors angularly.

This paper proposes a simple alternative: encoding maximum separation as an inductive bias in the network by adding one fixed matrix multiplication before computing the softmax activations. The main observation behind our approach is that separation does not require optimization but can be solved in closed-form prior to training and plugged into a network. We outline a recursive approach to obtain the matrix consisting of maximally separable vectors for any number of classes, which can be added with negligible engineering effort and computational overhead. Despite its simple nature, this one matrix multiplication provides real impact. We show that our proposal directly boosts classification, long-tailed recognition, out-of-distribution detection, and open-set recognition, from CIFAR to ImageNet. We find empirically that maximum separation works best as a fixed bias; making the matrix learnable adds nothing to the performance. The closed-form implementation and code to reproduce the experiments are available on GitHub.
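The recursive construction mentioned above admits a very short implementation. The sketch below is our own rendering of a standard regular-simplex recursion (assumed, not copied from the authors' repository); it produces k unit-norm class vectors in k-1 dimensions whose pairwise cosine similarity is the minimum achievable, -1/(k-1):

```python
import numpy as np

def max_separation_matrix(k: int) -> np.ndarray:
    """Return a (k-1, k) matrix whose columns are maximally separated unit vectors."""
    if k == 2:
        return np.array([[1.0, -1.0]])
    # Place one vertex at e_1 and recursively embed the remaining k-1 vertices in
    # the orthogonal subspace, scaled so every column keeps unit norm.
    sub = max_separation_matrix(k - 1)          # shape (k-2, k-1)
    d = k - 1
    top = np.concatenate(([1.0], np.full(d, -1.0 / d)))
    bottom = np.concatenate((np.zeros((d - 1, 1)),
                             np.sqrt(1.0 - 1.0 / d**2) * sub), axis=1)
    return np.vstack((top, bottom))

P = max_separation_matrix(10)   # 10 classes embedded in 9 dimensions
logits = np.random.randn(4, 9) @ P  # fixed (non-learnable) matrix before softmax
# All pairwise cosines equal -1/9:
print(np.allclose(P.T @ P - np.eye(10), -1 / 9 * (1 - np.eye(10))))  # True
```

Because the matrix is computed in closed form before training, the only change to an existing classifier is that the network outputs k-1 features which are multiplied by this fixed matrix to obtain the k class logits.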

Explore the topic

### Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis

*Tim Pearce, Jong-Hyeon Jeong, Yichen Jia, Jun Zhu*

**Abstract:** This paper considers doing quantile regression on censored data using neural networks (NNs). This adds to the survival analysis toolkit by allowing direct prediction of the target variable, along with a distribution-free characterization of uncertainty, using a flexible function approximator. We begin by showing how an algorithm popular in linear models can be applied to NNs. However, the resulting procedure is inefficient, requiring sequential optimization of an individual NN at each desired quantile. Our major contribution is a novel algorithm that simultaneously optimizes a grid of quantiles output by a single NN. To offer theoretical insight into our algorithm, we show firstly that it can be interpreted as a form of expectation-maximization, and secondly that it exhibits a desirable 'self-correcting' property. Experimentally, the algorithm produces quantiles that are better calibrated than existing methods on 10 out of 12 real datasets.
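As background for the "grid of quantiles" idea, here is a minimal sketch (ours; it uses the standard uncensored pinball loss and omits the paper's censoring adjustments and EM interpretation) of training a single network to emit several quantiles at once:

```python
import torch

def pinball_loss(pred: torch.Tensor, y: torch.Tensor, taus: torch.Tensor) -> torch.Tensor:
    """pred: (batch, Q) quantile predictions; y: (batch,); taus: (Q,) quantile levels."""
    err = y.unsqueeze(1) - pred                            # (batch, Q) residuals
    return torch.mean(torch.maximum(taus * err, (taus - 1) * err))

taus = torch.linspace(0.1, 0.9, 9)   # grid of quantile levels from one network
net = torch.nn.Sequential(torch.nn.Linear(5, 64), torch.nn.ReLU(), torch.nn.Linear(64, 9))
x, y = torch.randn(32, 5), torch.randn(32)
loss = pinball_loss(net(x), y, taus)  # one loss jointly over all quantiles
loss.backward()
```

Handling censored observations requires modifying this loss for the targets whose true value is only known to exceed the recorded time, which is exactly where the paper's algorithm departs from this naive version.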

Explore the topic

### Learning (Very) Simple Generative Models Is Hard

*Sitan Chen, Jerry Li, Yuanzhi Li*

**Abstract:** Motivated by the recent empirical successes of deep generative models, we study the computational complexity of the following unsupervised learning problem. For an unknown neural network \(F:\mathbb{R}^d\to\mathbb{R}^{d'}\), let \(D\) be the distribution over \(\mathbb{R}^{d'}\) given by pushing the standard Gaussian \(\mathcal{N}(0,\textrm{Id}_d)\) through \(F\). Given i.i.d. samples from \(D\), the goal is to output *any* distribution close to \(D\) in statistical distance.

We show under the statistical query (SQ) model that no polynomial-time algorithm can solve this problem even when the output coordinates of \(F\) are one-hidden-layer ReLU networks with \(\log(d)\) neurons. Previously, the best lower bounds for this problem simply followed from lower bounds for *supervised learning* and required at least two hidden layers and \(\mathrm{poly}(d)\) neurons [Daniely-Vardi '21, Chen-Gollakota-Klivans-Meka '22].

The key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function \(f\) with polynomially bounded slopes such that the pushforward of \(\mathcal{N}(0,1)\) under \(f\) matches all low-degree moments of \(\mathcal{N}(0,1)\).
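To make the problem setup concrete, the following sketch (our illustration only; the sizes are arbitrary, and the hidden layer is shared across output coordinates for brevity) draws i.i.d. samples from the pushforward distribution \(D = F(\mathcal{N}(0,\textrm{Id}_d))\); a learner in this setting sees only `samples`, never \(F\):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_out, width = 16, 8, 4   # width on the order of log(d) neurons

# Unknown one-hidden-layer ReLU network F.
W1 = rng.standard_normal((width, d))
W2 = rng.standard_normal((d_out, width))

def F(z: np.ndarray) -> np.ndarray:
    return np.maximum(z @ W1.T, 0.0) @ W2.T   # ReLU hidden layer, linear output

z = rng.standard_normal((10_000, d))   # standard Gaussian inputs N(0, Id_d)
samples = F(z)                         # i.i.d. samples from the pushforward D
print(samples.shape)                   # (10000, 8)
```

The hardness result says that even for networks this shallow and narrow, no polynomial-time SQ algorithm can recover a distribution close to \(D\) from such samples.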

Explore the topic