9:50 – 10:00
Opening remarks
10:00 – 10:30, Leslie Pack Kaelbling, Massachusetts Institute of Technology
Intelligent robots redux
The fields of AI and robotics have made great progress in many individual subfields, including motion planning, symbolic planning, probabilistic reasoning, perception, and learning. Our goal is to develop an integrated approach to solving very large problems that are hopelessly intractable to solve optimally. We make a number of approximations during planning, including serializing subtasks, factoring distributions, and determinizing stochastic dynamics, but regain robustness and effectiveness through a continuous state-estimation and replanning process. I will describe our initial approach to this problem, as well as recent work on improving its effectiveness and efficiency through learning, and speculate a bit about the role of learning in generally intelligent robots.
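To make the determinize-and-replan idea concrete, here is a minimal, self-contained Python sketch. The one-dimensional corridor world, the slip probability, and every name in it (plan, execute, run, GOAL, SLIP) are invented for illustration; this is not the system described in the talk, only the general pattern of planning in a simplified deterministic model and replanning when execution diverges.

```python
import random

GOAL, N, SLIP = 9, 10, 0.2   # hypothetical 1-D corridor world

def plan(state):
    """Plan in a determinized model: pretend every move succeeds."""
    step = 1 if state < GOAL else -1
    return [step] * abs(GOAL - state)

def execute(state, action):
    """True stochastic dynamics: the move fails with probability SLIP."""
    if random.random() < SLIP:
        return state                      # slipped; no motion
    return min(max(state + action, 0), N - 1)

def run(state=0):
    steps = 0
    while state != GOAL:
        for action in plan(state):        # serialized, determinized plan
            expected = state + action
            state = execute(state, action)
            steps += 1
            if state != expected:         # state estimation detects divergence
                break                     # drop the stale plan and replan
    return steps

random.seed(0)
print("reached goal in", run(), "primitive steps")
```

The planner is deliberately wrong about the dynamics; robustness comes entirely from the outer estimate-and-replan loop.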
10:35 – 11:05, Alexander Rush, Harvard University
Structured attention networks
Recent deep learning systems for NLP and related fields have relied heavily on the use of neural attention, which allows models to learn to focus on selected regions of their input or memory. The use of neural attention has proven to be a crucial component for advances in machine translation, image captioning, question answering, summarization, end-to-end speech recognition, and more. In this talk, I will give an overview of the current uses of neural attention and memory, describe how the selection paradigm has provided NLP researchers flexibility in designing neural models, and demonstrate some fun applications of this approach from our group.

I will then argue that selection-based attention may be an unnecessarily simplistic approach for NLP, and discuss our recent work on Structured Attention Networks [Kim et al., 2017]. These models integrate structured prediction as a hidden layer within deep neural networks to form a variant of attention that enables soft selection over combinatorial structures, such as segmentations, labelings, and even parse trees. While this approach is inspired by structured prediction methods in NLP, building structured attention layers within a deep network is quite challenging, and I will describe the interesting dynamic programming approach needed for exact computation. Experiments test the approach on a range of NLP tasks including translation, question answering, and natural language inference, demonstrating improvements over standard attention in both performance and interpretability.
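As a rough illustration of the distinction the abstract draws, the sketch below contrasts selection-based attention (a single softmax over positions) with a structured variant whose weights are the exact marginals of a tiny linear-chain CRF over binary selection variables, computed by the forward-backward dynamic program. This is a simplified stand-in, not the architecture of [Kim et al., 2017]; the potentials and function names are invented for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def softmax_attention(scores):
    """Standard selection attention: one softmax over input positions."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def chain_marginal_attention(unary, trans):
    """Structured attention over a linear-chain CRF with binary selection
    variables z_1..z_T.  unary[t, j] is the log-potential for z_t = j and
    trans[i, j] the log-potential for the transition z_{t-1}=i -> z_t=j.
    Returns p(z_t = 1): soft selections that respect the chain structure."""
    T = unary.shape[0]
    alpha = np.zeros((T, 2))              # forward log-messages
    beta = np.zeros((T, 2))               # backward log-messages
    alpha[0] = unary[0]
    for t in range(1, T):
        alpha[t] = unary[t] + logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(trans + unary[t + 1] + beta[t + 1], axis=1)
    log_marg = alpha + beta               # unnormalized log-marginals
    log_marg -= logsumexp(log_marg, axis=1, keepdims=True)
    return np.exp(log_marg[:, 1])

scores = np.array([0.1, 2.0, 0.3, 1.5])
unary = np.stack([np.zeros(4), scores], axis=1)   # selecting position t earns scores[t]
trans = np.log([[0.6, 0.4], [0.4, 0.6]])          # mild bias toward contiguous selections
print(softmax_attention(scores))                  # treats positions independently
print(chain_marginal_attention(unary, trans))     # couples neighboring selections
```

The structured weights still sum softly over the input, but the transition potentials let the model prefer coherent spans rather than isolated positions, which is the flavor of soft selection over combinatorial structures described above.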
11:10 – 11:40, Lester Mackey, Microsoft Research
Measuring sample quality with Stein's method
Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD) computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and are simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.
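For concreteness, the following sketch computes a squared KSD by summing Stein-kernel evaluations over all pairs of sample points, using an inverse multiquadric base kernel (an example of the slowly decaying tails mentioned above) and a standard Gaussian target. The function name, kernel parameters, and the toy comparison are assumptions for illustration, not the paper's code.

```python
import numpy as np

def imq_ksd2(x, score, c=1.0, beta=-0.5):
    """Squared kernel Stein discrepancy with the IMQ base kernel
    k(x, y) = (c^2 + ||x - y||^2)^beta.  `score(x)` returns the rows
    grad log p(x_i) of the target's score function."""
    n, d = x.shape
    s = score(x)                                   # (n, d) scores at sample points
    diff = x[:, None, :] - x[None, :, :]           # (n, n, d) pairwise differences
    r2 = (diff ** 2).sum(-1)                       # squared distances
    base = c ** 2 + r2
    k = base ** beta                               # kernel values
    gx = 2 * beta * base[..., None] ** (beta - 1) * diff   # grad_x k; grad_y k = -gx
    trace = (-4 * beta * (beta - 1) * base ** (beta - 2) * r2
             - 2 * beta * d * base ** (beta - 1))  # tr(grad_x grad_y k)
    k0 = (np.einsum('id,jd->ij', s, s) * k         # s(x)^T s(y) k(x, y)
          - np.einsum('id,ijd->ij', s, gx)         # s(x)^T grad_y k
          + np.einsum('jd,ijd->ij', s, gx)         # s(y)^T grad_x k
          + trace)
    return k0.mean()                               # average over all pairs

rng = np.random.default_rng(0)
score = lambda x: -x                               # grad log p for a N(0, I) target
exact = rng.normal(size=(200, 1))                  # unbiased draws
biased = rng.normal(0.5, 1.0, size=(200, 1))       # off-target sampler
print(imq_ksd2(exact, score), imq_ksd2(biased, score))  # biased sample scores higher
```

Note that the discrepancy needs only the score function of the target, not its normalizing constant, which is what makes it usable as an MCMC diagnostic.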
11:40 – 1:45
Lunch and posters
1:45 – 2:15, Thomas Serre, Brown University
What are the visual features underlying human versus machine vision?

2:20 – 2:50, David Sontag, Massachusetts Institute of Technology
Causal inference via deep learning

2:50 – 3:20
Coffee break

3:20 – 3:50, Roni Khardon, Tufts University
Effective variational inference in non-conjugate 2-level latent variable models

3:55 – 4:25, Tina Eliassi-Rad, Northeastern University
Learning, mining and graphs

4:30 – 5:00, Erik Learned-Miller, University of Massachusetts Amherst
Bootstrapping intelligence with motion estimation

Posters
Each entry lists the poster title, followed by the presenting author / co-authors.
Robust and Efficient Transfer Learning using Hidden Parameter Markov Decision Processes
Sam Daulton, Harvard University / Taylor Killian, Harvard University; Finale Doshi-Velez, Harvard University; George Konidaris, Brown University

Multimodal Sparse Representation Learning for Multimedia Applications
Miriam Cha, Harvard University / Youngjune L. Gwon & H.T. Kung, Harvard University

Learning Optimized Risk Scores on Large-Scale Datasets
Berk Ustun, Massachusetts Institute of Technology / Cynthia Rudin, Duke University

Accurate structure-based drug-protein binding energy prediction with deep convolutional neural networks
Maksym Korablyov, Massachusetts Institute of Technology / Xiao Luo, Nilai Sarda, Mengyuan Sun, Tyson Chen, Lily Zhang, Ellen Shea, Erica Weng, Brian Xie, Yejin You, Ryan Hays, Shuo Gu, Collin Stultz, & Gil Alterovitz, Harvard-MIT division, Boston Children's Hospital

Kronecker Determinantal Point Processes
Zelda Mariet, Massachusetts Institute of Technology / Suvrit Sra, Massachusetts Institute of Technology

Synthesizing 3D via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks
Amir Arsalan Soltani, Massachusetts Institute of Technology / Haibin Huang, University of Massachusetts, Amherst; Jiajun Wu, Massachusetts Institute of Technology; Tejas D. Kulkarni, Google DeepMind; Joshua B. Tenenbaum, Massachusetts Institute of Technology

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu, Boston University / Abir Das, Boston University; Kate Saenko, Boston University

A Decentralized Cluster Primal Dual Splitting Method for Large-Scale Sparse Support Vector Machines with An Application to Hospitalization Prediction
Theodora S. Brisimi, Boston University / Alex Olshevsky, Ioannis Ch. Paschalidis, & Wei Shi, Boston University

SmartPlayroom: Semi-automated behavioral analysis of children with ASD in naturalistic environment
Pankaj Gupta, Brown University / Elena Tenenbaum, Stephen Sheinkopf, Thomas Serre, & Dima Amso, Brown University

Guided Proofreading of Automatic Segmentations for Connectomics
Daniel Haehn, Harvard University / Verena Kaynig-Fittkau, Harvard University; James Tompkin, Brown University; Jeff W. Lichtman & Hanspeter Pfister, Harvard University

Lie-Access Neural Turing Machines
Greg Yang, Harvard University / Alexander Rush, Harvard University

Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets
Andrea Tacchetti, Massachusetts Institute of Technology / Stephen Voinea & Georgios Evangelopoulos, Massachusetts Institute of Technology

Testing Ising Models
Gautam Kamath, Massachusetts Institute of Technology / Constantinos Daskalakis & Nishanth Dikkala, Massachusetts Institute of Technology

Mutual Information Hashing
Fatih Cakir, Boston University / Kun He, Sarah Adel Bargal, & Stan Sclaroff, Boston University

Dataflow Matrix Machines as a Model of Computations with Linear Streams
Michael Bukatin, HERE North America LLC / Jon Anthony, Boston College

A Bandit Framework for Strategic Regression
Yang Liu, Harvard University / Yiling Chen, Harvard University

Robust Budget Allocation via Continuous Submodular Functions
Matthew Staib, Massachusetts Institute of Technology / Stefanie Jegelka, Massachusetts Institute of Technology

Value Directed Exploration in Multi-Armed Bandits with Structured Priors
Bence Cserna, University of New Hampshire / Marek Petrik, Reazul Hasan Russel, & Wheeler Ruml, University of New Hampshire

Designing Neural Network Architectures Using Reinforcement Learning
Bowen Baker, Massachusetts Institute of Technology / Otkrist Gupta, Nikhil Naik, & Ramesh Raskar, Massachusetts Institute of Technology

What do Neural Machine Translation Models Learn about Morphology?
Yonatan Belinkov, Massachusetts Institute of Technology / Nadir Durrani, Fahim Dalvi, & Hassan Sajjad, Qatar Computing Research Institute; James Glass, Massachusetts Institute of Technology

Message-passing algorithms for synchronization problems
Amelia Perry, Massachusetts Institute of Technology / Alexander S. Wein, Massachusetts Institute of Technology; Afonso S. Bandeira, New York University; Ankur Moitra, Massachusetts Institute of Technology

Non-detection in spiked matrix models
Alex Wein, Massachusetts Institute of Technology / Amelia Perry, Massachusetts Institute of Technology; Afonso Bandeira, New York University Courant; Ankur Moitra, Massachusetts Institute of Technology

Coarse-to-Fine Attention Models for Document Summarization
Jeffrey Ling, Harvard University / Alexander Rush, Harvard University

TensorFlow Debugger: Debugging Dataflow Graphs for Machine Learning
Shanqing Cai, Google / Eric Breck, Eric Nielsen, Michael Salib, & D. Sculley, Google

Computational Prediction of Neoantigens for Personalized Cancer Vaccines
Michael Rooney, Neon Therapeutics (formerly at Broad, MIT) / Jenn Abelin, Neon Therapeutics (formerly at Broad); Derin Keskin, Dana–Farber Cancer Institute; Sisi Sarkizova, Harvard; Nir Hacohen & Steve Carr, Broad Institute; Cathy Wu, Dana–Farber Cancer Institute

On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits
Shahin Shahrampour, Harvard University / Mohammad Noshad & Vahid Tarokh, Harvard University

Bayesian Group Decisions: Algorithms and Complexity
Amin Rahimian, University of Pennsylvania/MIT Institute for Data, Systems, and Society / Ali Jadbabaie & Elchanan Mossel, Massachusetts Institute of Technology

Node Embedding for Network Community Discovery
Christy Lin, Boston University / Prakash Ishwar, Boston University; Weicong Ding, Technicolor

Max-value Entropy Search for Efficient Bayesian Optimization
Zi Wang, Massachusetts Institute of Technology / Stefanie Jegelka, Massachusetts Institute of Technology

Network Analysis Identifies Regions of Chromosome Interactions in the Genome
Anastasiya Belyaeva, Massachusetts Institute of Technology / Caroline Uhler, Massachusetts Institute of Technology; Saradha Venkatachalapathy, GV Shivashankar, & Mallika Nagarajan, National University of Singapore

SoundNet: Learning Sound Representations from Unlabeled Video
Carl Vondrick, Massachusetts Institute of Technology / Yusuf Aytar & Antonio Torralba, Massachusetts Institute of Technology

Recursive Sampling for the Nystrom Method
Christopher Musco, Massachusetts Institute of Technology

Robust Statistics in High Dimensions, Revisited
Jerry Li, Massachusetts Institute of Technology / Ilias Diakonikolas, University of Southern California; Gautam Kamath, Massachusetts Institute of Technology; Daniel M. Kane, University of California, San Diego; Ankur Moitra, Massachusetts Institute of Technology; Alistair Stewart, University of Southern California

From Patches to Images: A Nonparametric Generative Model
Geng Ji, Brown University / Mike Hughes, Harvard University; Erik Sudderth, Brown University/University of California, Irvine

Nucleotide-level Modeling of Genetic Regulation with Large Receptive Fields using Dilated Convolutions
Ankit Gupta, Harvard University / Alexander Rush, Harvard University

Predicting the Quality of Short Narratives from Social Media
Tong Wang, University of Massachusetts Boston / Ping C., University of Massachusetts Boston; Albert L., Disney Research