February 28, 2020

Machine Learning & Pizza

17:30-19:00

Location: Microsoft Research Cambridge

Feb 28, 2020:

Speaker: Eric Nalisnick (University of Cambridge)

Title: Dropout as a Structured Shrinkage Prior

Abstract: Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of “co-adapted” weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e., dropout). We show that multiplicative noise induces structured shrinkage priors on a network’s weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. We leverage these insights to propose a novel shrinkage framework for ResNets, terming the prior “automatic depth determination” as it is the natural analog of “automatic relevance determination” for network depth.
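
To make the scale-mixture equivalence concrete, here is a minimal numpy sketch (an illustration added for this page, not the speaker's code): multiplying a unit's pre-activation by noise is exactly the same as placing one shared random scale on that unit's row of weights, so marginalising over the noise puts a structured, group-wise shrinkage prior on the weights.

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                    # one layer's weight matrix
x = rng.normal(size=3)                         # layer input
lam = rng.gamma(shape=2.0, scale=0.5, size=4)  # multiplicative noise, one draw per unit

noisy_units = lam * (W @ x)          # noise applied to the units' pre-activations
rescaled_W = (lam[:, None] * W) @ x  # the same noise absorbed into the weight rows
assert np.allclose(noisy_units, rescaled_W)

# Marginalising over lam therefore places a scale-mixture (shrinkage) prior
# jointly on each row of W; with Bernoulli-distributed lam this is dropout.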

Speaker: Andreas Damianou (Amazon)

Title: Gaussian processes with neural network inductive biases for fast domain adaptation

Abstract: Recent advances in learning algorithms for deep neural networks allow us to train such models efficiently to obtain rich feature representations. However, when it comes to transfer learning, local optima in the high-dimensional parameter space still pose a severe problem. Motivated by this issue, we propose a framework for performing probabilistic transfer learning in the function space while, at the same time, leveraging the rich representations offered by deep neural networks. Our approach consists of linearizing neural networks to produce a Gaussian process model with covariance function given by the network’s Jacobian matrix. The result is a closed-form probabilistic model which allows fast domain adaptation with accompanying uncertainty estimation.
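
As a rough illustration of the linearisation idea (a toy sketch written for this page, using a finite-difference Jacobian and random weights standing in for a trained network), the covariance of the resulting Gaussian process is just an inner product of parameter Jacobians:

import numpy as np

def net(x, theta):
    # Tiny network with one hidden layer of 3 tanh units (scalar input/output).
    w1, b1, w2, b2 = theta[:3], theta[3:6], theta[6:9], theta[9]
    return w2 @ np.tanh(w1 * x + b1) + b2

def param_jacobian(x, theta, eps=1e-6):
    # Finite-difference Jacobian of the network output w.r.t. the parameters.
    J = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        J[i] = (net(x, theta + d) - net(x, theta - d)) / (2 * eps)
    return J

rng = np.random.default_rng(1)
theta = rng.normal(size=10)                           # stand-in for trained weights
xs = np.linspace(-2.0, 2.0, 5)
J = np.stack([param_jacobian(x, theta) for x in xs])  # one Jacobian row per input

# Linearising the network around theta gives a GP whose covariance between two
# inputs is the Jacobian inner product (unit Gaussian prior on the weight shift).
K = J @ J.T
print(np.round(K, 3))                                 # 5 x 5 kernel matrix over the inputs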

 

Jan 31, 2020:

Speaker: Marc Brockschmidt (MSR Cambridge)

Title: Editing Sequences by Copying Spans

Abstract: Neural sequence-to-sequence models are finding increasing use in the editing of documents, for example in correcting a text document or repairing source code. In this paper, we argue that existing seq2seq models (with a facility to copy single tokens) are not a natural fit for such tasks, as they have to explicitly copy each unchanged token. We present an extension of seq2seq models capable of copying entire spans of the input to the output in one step, greatly reducing the number of decisions required during inference. This extension means that there are now many ways of generating the same output, which we handle by deriving a new training objective and a variation of beam search for inference that explicitly account for this ambiguity.
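
To see why span copying shrinks the number of decoding decisions, here is a toy sketch (written for this page, not the paper's model): a greedy decomposition of a small edit into span-copy and generate actions needs three steps where token-level copying would need six.

def span_copy_actions(src, tgt):
    # Greedy decomposition of tgt into COPY(i, j) spans of src plus GEN(token)
    # steps. Many different decompositions produce the same output, which is
    # why the training objective has to marginalise over them.
    actions, t = [], 0
    while t < len(tgt):
        best = None
        for i in range(len(src)):
            l = 0
            while i + l < len(src) and t + l < len(tgt) and src[i + l] == tgt[t + l]:
                l += 1
            if l > 0 and (best is None or l > best[2]):
                best = (i, i + l, l)
        if best is not None:
            actions.append(("COPY", best[0], best[1]))
            t += best[2]
        else:
            actions.append(("GEN", tgt[t]))
            t += 1
    return actions

src = "the cat sat on the mat".split()
tgt = "the cat sat on a mat".split()
print(span_copy_actions(src, tgt))
# [('COPY', 0, 4), ('GEN', 'a'), ('COPY', 5, 6)]: 3 decisions instead of 6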

 

Speaker: Vincent Dutordoir (Prowler)

Title: Bayesian Image Classification with Deep Convolutional Gaussian Processes

Abstract: There is a lot of focus on Bayesian deep learning at the moment, with many researchers tackling this problem by building on top of neural networks and making the inference look more Bayesian. In this talk, I’m going to follow a different strategy and use a Gaussian process, which is a well-understood probabilistic method with many attractive properties, as a primitive building block to construct fully Bayesian deep learning models. We show that the accuracy of these Bayesian methods, and the quality of their posterior uncertainties, depend strongly on the suitability of the modelling assumptions made in the prior, and that Bayesian inference by itself is often not enough. This motivates the development of a novel convolutional kernel, which leads to improved uncertainty and accuracy on a range of different problems.
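
As a rough numpy sketch of the kind of convolutional kernel the talk builds on (an illustration written for this page; the squared-exponential base kernel, patch size, and plain averaging over patches are assumptions made for brevity), the covariance between two images is a base kernel averaged over all pairs of image patches, which bakes translation-related structure into the GP prior.

import numpy as np

def patches(img, size=3):
    # All overlapping size x size patches of a single-channel image, flattened.
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(h - size + 1)
                     for j in range(w - size + 1)])

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential base kernel between two sets of patch vectors.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def conv_kernel(x1, x2):
    # Convolutional kernel: the base kernel averaged over every pair of patches.
    return rbf(patches(x1), patches(x2)).mean()

rng = np.random.default_rng(2)
img_a, img_b = rng.normal(size=(5, 5)), rng.normal(size=(5, 5))
print(conv_kernel(img_a, img_b))   # scalar covariance between the two images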