April 23, 2013

Machine Learning Summit 2013

13:30–17:00 GMT

Location: Paris, France

Plenary Sessions

  • Speaker: Andrew Blake, Microsoft Research

    The world of computer science and artificial intelligence can indulge in a bit of cautious celebration. There are several examples of machines that have the gift of sight, even if to a degree that is primitive on the scale of human or animal abilities. Machines can navigate using vision, separate objects from their background, and recognise a variety of objects, including the limbs of the human body. These abilities are valuable spin-offs in their own right, but they are also part of an extended adventure in understanding the nature of intelligence.

    One open question is whether intelligent systems will turn out to depend more on theories and models, or simply on largely amorphous networks trained on data at ever greater scale. In vision systems this often boils down to a choice between two paradigms: analysis-by-synthesis versus empirical recognisers. Each approach has its strengths, and one can speculate about how deeply the two may eventually be integrated.

  • Speaker: Judea Pearl, UCLA

    The development of graphical models and the logic of counterfactuals has had a marked effect on the way scientists treat problems involving cause-effect relationships. Practical problems requiring causal information, which were long regarded as either metaphysical or unmanageable, can now be solved using elementary mathematics. Moreover, problems that were thought to be purely statistical are beginning to benefit from analysing their causal roots.

    I will review concepts, principles and mathematical tools that were found useful in this transformation, and will demonstrate their applications in several data-intensive sciences. These include questions of confounding control, policy analysis, misspecification tests, mediation, heterogeneity, selection bias, missing data and the integration of findings from diverse studies.

    The following topics will be emphasised:

    • The three-layer causal hierarchy: association, intervention and counterfactuals.
    • What mathematics can tell us about “external validity” or “generalising across domains”.
    • What causal analysis tells us about recovery from selection bias and missing data.
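
    To make the gap between the first two layers of the hierarchy (association and intervention) concrete, here is a minimal Python sketch. It assumes a hypothetical toy model with binary variables and a single observed confounder Z that influences both X and Y; all probabilities are invented for illustration. It compares the observational quantity P(Y=1 | X=x) with the interventional quantity P(Y=1 | do(X=x)), the latter computed with the standard back-door adjustment formula, the sum over z of P(Y=1 | X=x, Z=z) P(Z=z).

```python
from itertools import product

# Hypothetical toy model (illustrative numbers only): Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z=z)
P_X1_given_Z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y=1 | X=x, Z=z)
                 (1, 0): 0.3, (1, 1): 0.7}


def joint(x, y, z):
    """P(X=x, Y=y, Z=z) under the factorisation P(z) P(x|z) P(y|x,z)."""
    px = P_X1_given_Z[z] if x == 1 else 1 - P_X1_given_Z[z]
    py = P_Y1_given_XZ[(x, z)] if y == 1 else 1 - P_Y1_given_XZ[(x, z)]
    return P_Z[z] * px * py


def p_y1_given_x(x):
    """Association: P(Y=1 | X=x), obtained by conditioning the joint."""
    num = sum(joint(x, 1, z) for z in (0, 1))
    den = sum(joint(x, y, z) for y, z in product((0, 1), (0, 1)))
    return num / den


def p_y1_do_x(x):
    """Intervention: P(Y=1 | do(X=x)) via back-door adjustment over Z."""
    return sum(P_Y1_given_XZ[(x, z)] * P_Z[z] for z in (0, 1))


for x in (0, 1):
    print(f"P(Y=1 | X={x}) = {p_y1_given_x(x):.3f}, "
          f"P(Y=1 | do(X={x})) = {p_y1_do_x(x):.3f}")
```

    With these made-up numbers the observational contrast P(Y=1 | X=1) - P(Y=1 | X=0) is 0.62 - 0.18 = 0.44, while the interventional contrast is 0.50 - 0.30 = 0.20: conditioning on X alone mixes the effect of X with the influence of the confounder Z.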

  • Speaker: Hermann Hauser, Amadeus Capital Partners

    Data is accumulating at such a rate that there are no longer enough qualified humans to analyse it. Machine learning is needed to make data useful in the many sectors that are drowning in it; examples abound in healthcare, genomics, oil exploration and marketing. There have been five distinct waves of computing, all of which had the human at the centre of the industry. The internet of things will change this: most communication will be between machines, and to make those machines useful to us again, machine learning will be needed. This puts machine learning at the centre of the sixth wave of computing.

  • Chair: Jeannette Wing, Microsoft Research
    Panellists: Eric Horvitz, Microsoft Research; Michel Cosnard, Inria; Iain Buchan, University of Manchester; Lionel Tarassenko, University of Oxford

Session Abstracts

  • Chair: John Bronskill, Microsoft Research
    Speaker: John Winn, Microsoft Research

    When faced with a machine learning problem, a common approach is to try to transform the data so that it can be solved using a standard tool, such as a classifier. Since each problem has unique characteristics, such a transformation will typically be imperfect and can lead to poor performance. Using model-based machine learning (MBML), we instead create a customised model which is exactly matched to the problem being solved. No transformation of the data is needed and the behaviour of the resulting machine learning algorithm can be finely tuned to the needs of the application. This talk will be a guided tour of model-based machine learning – what it is, what it can do and where it has already been put into practice.

  • Chair: Léon Bottou, Microsoft Research
    Speakers: Francis Bach, Inria – École Normale Supérieure; Anatoli Juditsky, Joseph Fourier University; Alekh Agarwal, Microsoft Research

    Classical statisticians used to rely on paper and pencil for both data collection and computation. Transferring the computation to electronic computers led to machine learning methods able to manage the scarcity of manually collected data relative to the capacity of the computationally feasible models. During the last decade, networked and pervasive computers have dramatically changed the data collection process. New machine learning algorithms and technologies are required as the volume of data grows faster than the available computational power.

  • Chair: Laurent Massoulié, Microsoft Research-Inria Joint Centre
    Speakers: Avrim Blum, Carnegie Mellon University; Amos Storkey, University of Edinburgh; Peter Key, Microsoft Research

    The growth in connectedness enabled by technology is creating situations where users or software agents are confronted by large or very large systems, for example on-line trading systems, on-line markets, sponsored search and on-line games. We would like to design large systems that both work well and are efficient, but first we need to understand how users react to such systems. How should an agent reason about a large system, and what types of behaviour are “best” for the agent? Three promising approaches to this question are: mean-field games, which assume that an agent reacts to some average statistic of the other agents’ actions; regret minimisation based on local updating; and machine learning markets, where prediction markets offer a different model for aggregating information. This session seeks to explore and learn from the possible connections between these approaches. The aim is to stimulate cross-disciplinary discussion and research, with talks that take different approaches acting as catalysts.
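
    As an illustration of regret minimisation based on local updating, here is a minimal Python sketch of the Hedge (multiplicative weights) algorithm, one classic instance of the idea and not necessarily the method presented in this session. Each expert's weight is updated using only its own loss, yet the algorithm's expected cumulative loss stays within sqrt(T ln N / 2) of the best single expert in hindsight when losses lie in [0, 1] and the learning rate is eta = sqrt(8 ln N / T). The loss sequence below is hypothetical.

```python
import math


def hedge(loss_matrix, eta):
    """Hedge / multiplicative weights over N experts.

    loss_matrix[t][i] is the loss (in [0, 1]) of expert i at round t.
    Returns (expected cumulative loss of the algorithm,
             cumulative loss of the best single expert in hindsight).
    """
    n = len(loss_matrix[0])
    weights = [1.0] * n
    alg_loss = 0.0
    for losses in loss_matrix:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of following an expert drawn from probs this round.
        alg_loss += sum(p * l for p, l in zip(probs, losses))
        # Local update: each weight is scaled using only its own expert's loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best = min(sum(row[i] for row in loss_matrix) for i in range(n))
    return alg_loss, best


T, N = 1000, 3
# Hypothetical deterministic losses, for illustration only.
losses = [[1.0 if t % 3 == 0 else 0.0, 0.5, float(t % 2)] for t in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
alg, best = hedge(losses, eta)
bound = math.sqrt(T * math.log(N) / 2)
print(f"regret = {alg - best:.2f}, theoretical bound = {bound:.2f}")
```

    The tuned learning rate assumes the horizon T is known in advance; a doubling trick or a time-varying rate is the usual workaround when it is not.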

  • Chair: Yann LeCun, New York University
    Speakers: Samy Bengio, Google; Fei-Fei Li, Stanford University; Padmanabhan Anandan, Microsoft Research India

    The Internet is increasingly spawning challenging machine learning applications that could benefit from being formulated as supervised learning tasks with millions of categories or labels; examples include photo and video annotation and Wikipedia article categorisation. The talks in the session will introduce, motivate and present state-of-the-art techniques for learning with millions of categories and labels from the perspective of specific applications in computational advertising, computer vision and web search. The session should be of interest to researchers working on multi-class and multi-label classification, large-scale learning, optimisation, distributed machine learning and semi-supervised learning, as well as to domain experts in computational advertising and computer vision.

  • Chair: John Bronskill, Microsoft Research
    Speaker: John Guiver, Microsoft Research

    This tutorial will expand on the theme of model-based machine learning by looking at how to go about designing and building a model in practice. We’ll start off with data visualisation and analysis which, coupled with knowledge of how the data was collected, will give us a rich understanding of the data. This will inform the choices we make in constructing initial models. As we think more about the data and evaluate our initial models, we will get fresh ideas for how to extend and improve our model. The tutorial will concentrate on one particular data set which will allow the audience the time to understand the data and to interactively make suggestions for modelling choices.

  • Chair: Pushmeet Kohli, Microsoft Research
    Speakers: Foster Provost, New York University; Sharad Goel, Microsoft Research; Elad Yom-Tov, Microsoft Research

    We are seeing an unprecedented increase in the time people spend interacting with machines and computational systems. These interactions take the form of time spent on social networking websites like Facebook, activities on search engines such as Bing or Google, or actions performed on mobile devices, and they produce a lot of data about the user. Machine learning can leverage this data not only to understand the behaviour and preferences of users, but also to build intelligent interactive systems that are more effective and easier to use. In this session, we will hear about the key challenges and opportunities in this space.

  • Chair: Silvia Chiappa, Microsoft Research
    Speakers: David Page, University of Wisconsin-Madison; Antonio Criminisi, Microsoft Research; Bert Kappen, Radboud University

    This session will consist of three talks covering the use of advanced Machine Learning techniques to analyse clinical, genetic and medical image data.

    The first talk will focus on the application of Machine Learning for predicting clinical events from electronic medical records. Specifically, it will discuss how to build predictive models of diseases and other health care events such as myocardial infarction, atrial fibrillation and adverse drug events from clinical and genetic data.

    The second talk will focus on the use of Machine Learning in two different areas of genetics: the use of Bonaparte for DNA matching with application to forensics; and an approach to learning non-linear interactions in genome-wide association studies with application to psychiatric disease.

    The third talk will focus on the use of Machine Learning in combination with medical expertise for automatic analysis of patients’ medical images with application to computer-aided diagnosis, personalised medicine and efficient data management.

  • Chair: Sebastian Nowozin, Microsoft Research
    Speakers: Carsten Rother, Microsoft Research; Tomáš Werner, Czech Technical University; Bill Freeman, Massachusetts Institute of Technology

    Interpreting images and extracting high-level semantic information about natural scenes is commonly formulated as an inference problem in a graphical model. In all but the most trivial situations, model specification, inference and parameter estimation are challenging, and we have to turn to approximations that are computationally efficient. This session is about the application of graphical models to computer vision problems.

  • Chair: Léon Bottou, Microsoft Research
    Speakers: Isabelle Guyon, ChaLearn; Thomas Richardson, University of Washington; Léon Bottou, Microsoft Research

    Machine learning systems in the real world are never without a purpose; they take actions, such as displaying a specific ad on a specific web page or actuating the controls of a self-driving vehicle. The true performance of the system depends on both the short- and long-term consequences of these actions. Unfortunately, discovering statistical correlations in the training data is not sufficient to predict the consequences of known actions in new contexts. Machine learning techniques must therefore integrate causal inference and causal discovery techniques. Much important work on multi-armed bandits and reinforcement learning represents a first step in this direction.
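
    The multi-armed bandit setting captures the point above in its simplest form: the learner only observes the consequence (reward) of the action it actually takes, so it must choose actions partly to learn about them. Below is a minimal Python sketch of UCB1, one standard bandit algorithm (not necessarily one discussed in this session), run against simulated Bernoulli arms whose success probabilities are hypothetical. Each arm's score is its empirical mean reward plus an uncertainty bonus sqrt(2 ln t / n_i) that shrinks as the arm is pulled more often.

```python
import math
import random


def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k      # n_i: number of pulls of arm i
    sums = [0.0] * k      # total reward collected from arm i
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once to initialise its estimate
        else:
            # Optimism under uncertainty: empirical mean plus confidence bonus.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        # Only the chosen action's consequence is observed.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical arms, e.g. three ads with unknown click-through rates.
counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

    Over a long horizon most pulls should concentrate on the best arm, while the bonus term keeps the weaker arms sampled just often enough to rule them out.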

  • Chair: Thore Graepel, Microsoft Research
    Speakers: Andy Gordon, Microsoft Research; Vikash Mansinghka, Massachusetts Institute of Technology; Avi Pfeffer, Charles River Analytics; Christopher Ré, University of Wisconsin-Madison

    Probabilistic programming constitutes the most universal and ambitious paradigm for machine learning and inference to date. New probabilistic programming languages and frameworks such as Church, Infer.NET, IBAL and Markov Logic Networks have been proposed to empower the new probabilistic programmer.

    In this session, we are planning to discuss the following questions: What progress has been made up to this point? Which probabilistic programming frameworks look most promising for which applications? What are the great research challenges that need to be addressed before probabilistic programming can go mainstream? What are the killer apps for probabilistic programming?
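
    The core idea behind probabilistic programming is that a model is written as an ordinary program that makes random choices, and inference means conditioning the program's executions on observed data. The sketch below illustrates this in plain Python; it is not the API of Church, Infer.NET, IBAL or any other framework named above. It uses rejection sampling, the simplest and least efficient inference strategy, on a hypothetical two-coin model: given that at least one of two fair coins came up heads, how likely is it that the first one did? The exact answer is 2/3.

```python
import random


def model(rng):
    """A tiny generative program: flip two fair coins."""
    a = rng.random() < 0.5
    b = rng.random() < 0.5
    return a, b


def rejection_query(num_samples, seed=0):
    """Estimate P(a | a or b) by running the program many times and keeping
    only the executions consistent with the observation (rejection sampling)."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(num_samples):
        a, b = model(rng)
        if a or b:           # condition on the observed evidence
            accepted += 1
            hits += a        # True counts as 1, False as 0
    return hits / accepted


print(rejection_query(100_000))  # should be close to 2/3
```

    Real frameworks replace rejection sampling with far more efficient engines (for example message passing in Infer.NET or MCMC in Church), which is one of the research challenges the session raises.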

  • Chair: Ronan Collobert, Idiap Research Institute
    Speakers: Steve Renals, University of Edinburgh; Hermann Ney, RWTH Aachen University; Alex Acero, Microsoft Research

  • Chair: Yoram Bachrach, Microsoft Research
    Speakers: Emre Kiciman, Microsoft Research; Eyal Amir, University of Illinois at Urbana-Champaign; David Parkes, Harvard University

    Data from interactions between humans and results gathered through crowdsourcing can be excellent sources of information. However, to gain knowledge from such information one must aggregate and analyse it, and the tasks must be structured so as to make the best use of the work. We will discuss techniques and approaches from artificial intelligence and machine learning that give insight into such domains.
