May 12, 2017

New England Machine Learning Day 2017

Location: Cambridge, MA, USA

9:50 – 10:00
Opening remarks

10:00 – 10:30, Leslie Pack Kaelbling, Massachusetts Institute of Technology
Intelligent robots redux
The fields of AI and robotics have made great progress in many individual subfields, including motion planning, symbolic planning, probabilistic reasoning, perception, and learning. Our goal is to develop an integrated approach to solving very large problems that are hopelessly intractable to solve optimally. We make a number of approximations during planning, including serializing subtasks, factoring distributions, and determinizing stochastic dynamics, but regain robustness and effectiveness through a continuous state-estimation and replanning process. I will describe our initial approach to this problem, as well as recent work on improving effectiveness and efficiency through learning, and speculate a bit about the role of learning in generally intelligent robots.
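
A minimal sketch of the determinize-and-replan pattern the abstract mentions, on a toy one-dimensional navigation task. This is an illustration only, not the speaker's system; the functions plan, execute, and determinize_and_replan are hypothetical names. The planner pretends actions always succeed, while execution follows the true stochastic dynamics, and replanning at every step recovers robustness.

    import random

    def plan(state, goal):
        """Deterministic plan: assume each +1/-1 step always succeeds (toy model)."""
        step = 1 if goal > state else -1
        return [step] * abs(goal - state)

    def execute(state, action, p_fail=0.2):
        """True stochastic dynamics: the action occasionally has no effect."""
        return state if random.random() < p_fail else state + action

    def determinize_and_replan(state, goal, max_steps=50):
        for _ in range(max_steps):
            if state == goal:
                return state
            action = plan(state, goal)[0]   # replan from the current state estimate
            state = execute(state, action)  # act under the stochastic dynamics
        return state

    random.seed(0)
    print(determinize_and_replan(state=0, goal=5))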

10:35 – 11:05, Alexander Rush, Harvard University
Structured attention networks
Recent deep learning systems for NLP and related fields have relied heavily on the use of neural attention, which allows models to learn to focus on selected regions of their input or memory. The use of neural attention has proven to be a crucial component for advances in machine translation, image captioning, question answering, summarization, end-to-end speech recognition, and more. In this talk, I will give an overview of the current uses of neural attention and memory, describe how the selection paradigm has provided NLP researchers flexibility in designing neural models, and demonstrate some fun applications of this approach from our group.
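
For readers unfamiliar with the selection mechanism described above, a minimal sketch of standard soft attention follows (an illustration, not code from the talk; names and shapes are assumptions): each memory position is scored against a query, the scores are normalized with a softmax, and the output is the resulting weighted average of the memory.

    import numpy as np

    def soft_attention(query, memory):
        """query: (d,) vector; memory: (n, d) matrix of input/memory vectors."""
        scores = memory @ query                  # (n,) dot-product scores
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        context = weights @ memory               # (d,) weighted average of memory
        return context, weights

    rng = np.random.default_rng(0)
    ctx, w = soft_attention(rng.normal(size=8), rng.normal(size=(5, 8)))
    print(w.round(3), ctx.shape)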

I will then argue that selection-based attention may be an unnecessarily simplistic approach for NLP, and discuss our recent work on Structured Attention Networks [Kim et al., 2017]. These models integrate structured prediction as a hidden layer within deep neural networks to form a variant of attention that enables soft-selection over combinatorial structures, such as segmentations, labelings, and even parse trees. While this approach is inspired by structured prediction methods in NLP, building structured attention layers within a deep network is quite challenging, and I will describe the interesting dynamic programming approach needed for exact computation. Experiments test the approach on a range of NLP tasks including translation, question answering, and natural language inference, demonstrating improvements upon standard attention in performance and interpretability.
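
As a rough illustration of the dynamic programming idea (my own sketch of the general recipe, not code from Kim et al., 2017): in the simplest structured case, attention weights are posterior marginals of a linear-chain model with binary "select this position" variables, computed exactly with the forward-backward algorithm rather than an independent softmax.

    import numpy as np
    from scipy.special import logsumexp

    def chain_marginals(unary, trans):
        """unary: (n, 2) log-potentials for selecting each position (0/1);
        trans: (2, 2) log transition potentials. Returns p(z_i = 1) per position."""
        n = unary.shape[0]
        alpha = np.zeros((n, 2))
        beta = np.zeros((n, 2))
        alpha[0] = unary[0]
        for i in range(1, n):        # forward pass
            alpha[i] = unary[i] + logsumexp(alpha[i - 1][:, None] + trans, axis=0)
        for i in range(n - 2, -1, -1):  # backward pass
            beta[i] = logsumexp(trans + (unary[i + 1] + beta[i + 1])[None, :], axis=1)
        log_z = logsumexp(alpha[-1])
        return np.exp(alpha[:, 1] + beta[:, 1] - log_z)  # marginal attention weights

    rng = np.random.default_rng(0)
    print(chain_marginals(rng.normal(size=(6, 2)),
                          np.array([[0.5, -0.5], [-0.5, 0.5]])))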

11:10 – 11:40, Lester Mackey, Microsoft Research
Measuring sample quality with Stein’s method
Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD) computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein’s method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.
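
A minimal sketch of the pairwise computation the abstract describes, assuming a standard Gaussian target and an inverse multiquadric (IMQ) kernel, one of the slowly decaying kernels referred to above (an illustration of the general recipe, not the authors' code; the constants and function names are assumptions). The squared KSD is a double sum of Stein-kernel evaluations over all pairs of sample points.

    import numpy as np

    def imq_stein_kernel(x, y, c2=1.0, beta=-0.5, score=lambda z: -z):
        """Stein kernel k_0(x, y) built from k(x, y) = (c2 + ||x - y||^2)^beta.
        The default score, -z, is the gradient of log p for a standard Gaussian."""
        d = x - y
        r2 = d @ d
        base = c2 + r2
        k = base ** beta
        gx = 2 * beta * base ** (beta - 1) * d   # grad_x k(x, y)
        gy = -gx                                 # grad_y k(x, y)
        trace = (-4 * beta * (beta - 1) * base ** (beta - 2) * r2
                 - 2 * beta * len(x) * base ** (beta - 1))
        sx, sy = score(x), score(y)              # scores of the target at x and y
        return trace + gx @ sy + gy @ sx + k * (sx @ sy)

    def ksd(sample, **kw):
        """Kernel Stein discrepancy of a sample (n, d) against the target."""
        n = len(sample)
        total = sum(imq_stein_kernel(xi, xj, **kw) for xi in sample for xj in sample)
        return np.sqrt(total) / n

    rng = np.random.default_rng(0)
    exact = rng.normal(size=(100, 2))    # draws from the Gaussian target
    biased = exact + 0.7                 # an off-target sample, expected to score worse
    print(ksd(exact), ksd(biased))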

11:40 – 1:45
Lunch and posters

1:45 – 2:15, Thomas Serre, Brown University
What are the visual features underlying human versus machine vision?

2:20 – 2:50, David Sontag, Massachusetts Institute of Technology
Causal inference via deep learning

2:50 – 3:20
Coffee break

3:20 – 3:50, Roni Khardon, Tufts University
Effective variational inference in non-conjugate 2-level latent variable models

3:55 – 4:25, Tina Eliassi-Rad, Northeastern University
Learning, mining and graphs

4:30 – 5:00, Erik Learned-Miller, University of Massachusetts Amherst
Bootstrapping intelligence with motion estimation