Faculty Summit 2017
July 17, 2017 - July 18, 2017

Faculty Summit 2017: The Edge of AI

Location: Redmond, WA

Technology Showcase

  • Presenter: Dan Fay

    [Full Video]

    Project Catapult connects FPGAs together through a network to create a hyperscale, reconfigurable accelerator fabric. See how to use the Project Catapult cluster at the Texas Advanced Computing Center (TACC) for research. Apply for access at aka.ms/catapult-academic.

  • Presenter: Lucas Joppa

    [Video Abstract]

    Understanding the land cover types and locations within specific regions enables effective environmental conservation. With sufficiently high spatial and temporal resolution, scientists and planners can identify which natural resources are at risk and the level of risk. This information helps inform decisions about how and where to focus conservation efforts. Current land cover products don’t meet these spatial and temporal requirements. Microsoft AI for Earth Program’s Land Cover Classification Project will use deep learning algorithms to deliver a scalable Azure pipeline for turning high-resolution US government images into categorized land cover data at regional and national scales. The first application of the platform will produce a land cover map for the Puget Sound watershed. This watershed is Microsoft’s own backyard and one of the nation’s most environmentally and economically complex and dynamic landscapes.

  • Presenter: Linjun Yang

    [Video Abstract | Full Video]

    Visual search, also known as search by image, is a new way of searching for information using an image or part of an image as the query. Just as text search connects keyword queries to knowledge on the web, the ultimate goal of visual search is to connect camera-captured images to web knowledge. Bing has been continuously improving its visual search feature, which is now available on Bing desktop, mobile, and apps, as well as in the Edge browser. It can be used not only to search for similar images but also for task completion, such as finding similar products while shopping. Bing image search now also features image annotation and object detection to further improve the user experience. This demo shows these techniques and the scenarios for which they were developed.

  • Presenter: Kate Kelly

    [Full Video]

    From ferry schedules to dinner reservations, Cortana is the digital assistant designed to help people get things done. Cortana will eventually be everywhere people need assistance—on the phone, PC, Xbox One, and other places like the home and car. Cortana is part of the Microsoft portfolio of intelligent products and services, and current research is designed to take it beyond voice search to create an assistant that is truly intelligent.

  • Presenter: Anna Roth

    [Video Abstract | Full Video]

    This demo shows how the Custom Vision Service can be applied to many AI vision applications. For example, a client who needs a custom image classifier can submit a few images of the target objects, and a model is trained and deployed at the touch of a button. Microsoft Office is also using the Custom Vision Service to automatically caption images in PowerPoint.
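
    As a rough illustration, the Python sketch below queries a classifier of this kind over REST once it has been deployed. The endpoint URL, project identifier, header names, and response shape shown here are placeholders rather than the service's documented contract; consult the Custom Vision portal for the actual values.

      import requests

      # Hypothetical values -- substitute the endpoint and key shown for your
      # project in the Custom Vision portal.
      PREDICTION_URL = "https://<region>.api.cognitive.microsoft.com/customvision/<version>/Prediction/<project-id>/image"
      PREDICTION_KEY = "<your-prediction-key>"

      def classify(image_path):
          """Send a local image to the trained classifier and return its predicted tags."""
          with open(image_path, "rb") as f:
              response = requests.post(
                  PREDICTION_URL,
                  headers={
                      "Prediction-Key": PREDICTION_KEY,
                      "Content-Type": "application/octet-stream",
                  },
                  data=f.read(),
              )
          response.raise_for_status()
          # Assumed response shape: a JSON object with a list of candidate
          # tags and their probabilities.
          return response.json().get("predictions", [])

      if __name__ == "__main__":
          for prediction in classify("example.jpg"):
              print(prediction)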

  • Presenter: Mike Seltzer

    [Video Abstract | Full Video]

    Two of the most important components of a speech recognition system are the acoustic model and the language model. The models behind Microsoft’s speech recognition engine have been optimized for certain usage scenarios, such as interacting with Cortana on a smartphone, searching the web by voice, or sending text messages to a friend. But if a user has specific needs, such as recognizing domain-specific vocabulary or understanding accented speech, then the acoustic and language models need to be customized. This demo shows the benefits of customizing acoustic and language models to improve the accuracy of speech recognition for lectures. Using the Custom Speech Service (a Cognitive Service), the demo shows how the technology can tune speech recognition for specific topics and lecturers.
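
    From client code, using a customized model typically amounts to sending audio to a deployment-specific recognition endpoint. The sketch below is a minimal, assumption-laden illustration of that pattern in Python; the endpoint URL, key handling, and response field are placeholders, not the documented Custom Speech Service API.

      import requests

      # Placeholder values: a deployment of a customized acoustic/language
      # model typically exposes its own endpoint URL and subscription key.
      CUSTOM_ENDPOINT = "https://<region>.stt.speech.microsoft.com/<custom-deployment>/recognize"
      SUBSCRIPTION_KEY = "<your-subscription-key>"

      def transcribe(wav_path):
          """Send a WAV clip to the customized recognition endpoint and return the transcript."""
          with open(wav_path, "rb") as f:
              response = requests.post(
                  CUSTOM_ENDPOINT,
                  headers={
                      "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
                      "Content-Type": "audio/wav",
                  },
                  data=f.read(),
              )
          response.raise_for_status()
          # Assumed response shape: {"DisplayText": "..."}
          return response.json().get("DisplayText", "")

      if __name__ == "__main__":
          print(transcribe("lecture_clip.wav"))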

  • Presenter: Gang Hua

    [Video Abstract | Full Video]

    This demo presents several applications of Microsoft’s recent work in artistic style transfer for images and videos. One technology, called StyleBank, provides an explicit representation for visual styles with a feedforward deep network that cleanly separates the content and style of an image. This framework can render stylized videos online, achieving more stable results than previous approaches. In addition, the Deep Image Analogy technique takes a pair of images and transfers the visual attributes of one to the other, enabling a wide variety of artistic effects.

  • Presenter: Guihong Cao

    [Video Abstract | Full Video]

    Searching within web documents on mobile devices is difficult and unnatural: Ctrl+F finds only exact matches, and the search results are hard to see. DeepFind takes a step toward solving this problem by allowing users to search within web documents using natural language queries and by displaying snippets from the document that answer the user’s questions.

    Users can interact with DeepFind on bing.com, m.bing.com, and the Bing iOS App in two different ways: as an overlay experience, which encourages exploration and follow-up questions, or as a rich carousel of document snippets integrated directly into the search engine results pages, which proactively answers the user’s question.

  • Presenter: David Baumert

    [Video Abstract]

    This demonstration uses Softbank’s Pepper robot as testbed hardware to show a set of human-collaboration activities based on Microsoft Cognitive Services and other Microsoft Research technologies.

    As both a research and prototype-engineering effort, the project implements and learns from concepts such as Brooks’ subsumption architecture, distributing the robot’s brain activities among the local device for reflex functions, the local facility infrastructure for recognition functions, and remote API services hosted in the cloud for cognitive functions. The implementation is designed to be machine-independent and relevant to any robot requiring human-collaboration capabilities. This approach has also enabled new investigations, such as non-verbal communication through body movements expressed and documented using Labanotation, making it possible for a robot to process conversations with humans and automatically generate life-like, meaningful physical behaviors to accompany its spoken words.

  • Presenter: Nilesh Bhide

    [Video Abstract | Full Video]

    As we move into the world of messaging apps, bots, and the botification of content, users are shifting from keyword searches to relying on bots and assistants for their information-seeking needs. Bing has built InfoBots, a set of AI- and Bing-powered QnA capabilities that bots can leverage to help users with those needs. InfoBots QnA capabilities are tuned to answer information-seeking questions over a wide variety of content, from open-domain web content to specific vertical domains. InfoBots supports conversational QnA, using multi-turn question and answer understanding to answer natural-language questions. These capabilities have applications in both consumer and enterprise contexts.

  • Presenter: Silviu-Petru Cucerzan

    [Video Abstract]

    This demo shows how InstaFact brings the information and intelligence of the Satori knowledge graph into Microsoft’s productivity software. InstaFact can automatically complete factual information in the text a user is writing or can verify the accuracy of facts in text. It can infer the user’s needs based on data correlations and simple natural-language clues. It can expose in simple ways the data and structure Satori harvests from the Web, and let users populate their text documents and spreadsheets with up-to-date information in just a couple of clicks.

  • Presenter: Yan Xia

    [Video Abstract | Full Video]

    When traveling to China, it’s best to know at least a bit of the language. The mobile app Learn Chinese can help travelers enjoy a better journey. Learn Chinese teaches interactively, using speech and natural language processing technology: an AI teacher corrects the user’s Chinese pronunciation and wording through conversations in various scenarios, such as shopping, seeing a doctor, or having dinner in a restaurant. It’s a more natural way of learning a language, propelled by AI techniques.

  • Presenter: Mahmoud Adada

    [Video Abstract]

    Maluuba’s vision is to build literate machines. The research team has built deep learning models that can process unstructured written text and answer questions against it. The demo showcases Maluuba’s machine reading comprehension (MRC) system by ingesting a 400-page automotive manual and answering users’ questions about it in real time. The long-term vision is to apply MRC technology to all types of user manuals, such as those for cars, home appliances, and more.
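
    Maluuba’s system uses learned neural readers; as a much simpler stand-in, the Python sketch below illustrates only the generic “find the passage most relevant to the question” step that any manual-QA pipeline needs, scoring passages by term overlap rather than with a trained model. It is an illustration of the general pattern, not Maluuba’s method.

      import re
      from collections import Counter

      def tokenize(text):
          """Lowercase word tokens; a stand-in for real preprocessing."""
          return re.findall(r"[a-z0-9]+", text.lower())

      def best_passage(question, passages):
          """Return the passage whose vocabulary overlaps most with the question.

          Only the retrieval step is shown; an MRC model would additionally
          extract the exact answer span from the chosen passage.
          """
          q_tokens = Counter(tokenize(question))
          def score(passage):
              p_tokens = Counter(tokenize(passage))
              return sum(min(q_tokens[w], p_tokens[w]) for w in q_tokens)
          return max(passages, key=score)

      if __name__ == "__main__":
          manual = [
              "Check the tire pressure monthly and before long trips.",
              "To pair a phone, enable Bluetooth and select the vehicle from the device list.",
          ]
          print(best_passage("How do I connect my phone over Bluetooth?", manual))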

  • Presenter: Alicia Edelman Pelton

    [Video Abstract | Full Video]

    Building machine learning (ML) models is an involved process requiring ML experts, engineers, and labelers. The demand for models for common-sense tasks far exceeds the supply of “teachers” who can build them. We approach this problem by allowing domain experts to apply what we call Machine Teaching (MT) principles. These include mining domain knowledge, concept decomposition, ideation, debugging, and semantic data exploration.

    PICL is a toolkit that originated from the MT vision. It enables teachers with no ML expertise to build classifiers and extractors. The underlying SDK enables system designers and engineers to build customized experiences for their problem domain. In PICL, teachers can bring their own dataset, search or sample items to label using active learning strategies, label these items, create or edit features, monitor model performance, and review and debug errors, all in one place.
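
    The sampling loop described above follows the familiar active-learning pattern. The sketch below shows one generic variant of that pattern (uncertainty sampling with scikit-learn); it is not PICL’s implementation, and the helper names are invented for illustration.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def uncertainty_sampling(model, unlabeled_X, batch_size=10):
          """Pick the unlabeled items the current model is least sure about."""
          probs = model.predict_proba(unlabeled_X)
          margins = np.abs(probs[:, 1] - 0.5)          # small margin = uncertain
          return np.argsort(margins)[:batch_size]

      def teaching_loop(labeled_X, labeled_y, unlabeled_X, label_fn, rounds=5):
          """Alternate between fitting a classifier and asking the teacher for labels."""
          model = LogisticRegression()
          for _ in range(rounds):
              model.fit(labeled_X, labeled_y)
              picks = uncertainty_sampling(model, unlabeled_X)
              new_y = [label_fn(unlabeled_X[i]) for i in picks]   # teacher labels these
              labeled_X = np.vstack([labeled_X, unlabeled_X[picks]])
              labeled_y = np.concatenate([labeled_y, new_y])
              unlabeled_X = np.delete(unlabeled_X, picks, axis=0)
          return model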

  • Presenter: Kelly Freed

    [Video Abstract | Full Video]

    Microsoft Pix helps every photographer take better pictures. Because it incorporates AI behind the lens, it can tweak settings, select the best shots, and enhance them on the fly. It’s designed to help take the guesswork out of getting great photos, so amateur photographers enjoy the moment, instead of struggling to capture it!

  • Presenter: Chris Wendt

    [Video Abstract | Full Video]

    Microsoft Translator live enables users to hold translated conversations across two or more languages, with up to 100 participants at the same time, using PowerPoint, iOS, Android, Windows, and web endpoints. Businesses, retail stores, and organizations around the world need to interact with customers who don’t speak the same language as the service providers, and Microsoft Translator live addresses these needs.

  • Presenter: Ashley Feniello

    [Video Abstract | Full Video]

    This demo shows our work on a mobile robot that gives directions to visitors. The robot currently navigates Microsoft Building 99, escorting and interacting with visitors and generally providing a social presence in the building. It uses Microsoft’s Platform for Situated Intelligence and Windows components for human interaction, as well as a robot operating system running under Linux for robot control, localization, and navigation.

  • Presenter: Ivan Tarapov

    [Video Abstract | Full Video]

    Project InnerEye is a new AI product aimed at improving the productivity of oncologists, radiologists, and surgeons when working with radiological images. The project focuses on the treatment of tumors and the monitoring of cancer progression in temporal studies. InnerEye builds on many years of research in computer vision and machine learning. It employs decision forests (as already used in Kinect and HoloLens) to help radiation oncologists and radiologists deliver better care, more efficiently and consistently, to their cancer patients.

  • Presenter: Katja Hofmann

    [Video Abstract | Full Video]

    Project Malmo is an open-source AI experimentation platform that supports fundamental AI research. With the platform, Microsoft provides an experimentation environment in which promising approaches can be systematically and easily compared and which fosters collaboration between researchers. Project Malmo is built on top of Minecraft, which is particularly appealing because its design is open-ended, collaborative, and creative. Project Malmo focuses in particular on collaborative AI: developing AI agents that can learn to collaborate with other agents, including humans, to help them achieve their goals. To foster research in this area, Microsoft recently ran the Malmo Collaborative AI Challenge, in which more than 80 teams of students worldwide competed to develop new algorithms that facilitate collaboration. This demo presents the results of the challenge task, shows selected agents, and illustrates how new tasks and agents can be easily implemented.
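
    For readers who want to try the platform, the sketch below follows the pattern from Malmo’s public Python tutorials; exact class and method names may differ between releases, and the mission XML (which defines the world, agents, and rewards) is assumed to be provided in a file rather than reproduced here.

      import time
      import MalmoPython  # shipped with the Project Malmo releases

      # A mission is defined by an XML document; assume one is saved to disk.
      with open("mission.xml") as f:
          mission_xml = f.read()

      agent_host = MalmoPython.AgentHost()
      mission = MalmoPython.MissionSpec(mission_xml, True)
      record = MalmoPython.MissionRecordSpec()

      agent_host.startMission(mission, record)

      # Wait for the mission to start, then issue continuous movement commands.
      world_state = agent_host.getWorldState()
      while not world_state.has_mission_begun:
          time.sleep(0.1)
          world_state = agent_host.getWorldState()

      while world_state.is_mission_running:
          agent_host.sendCommand("move 1")   # a trivial policy: keep walking forward
          time.sleep(0.5)
          world_state = agent_host.getWorldState()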

  • Presenter: Mihai Jalobeanu

    Time: 1:15 PM–2:00 PM & 2:15 PM–3:00 PM

    Location: Lassen

    Engineering general-purpose interactive AI systems that are efficient, robust, transparent, and maintainable is still a challenging task. Such systems need to integrate multiple competencies, deal with large amounts of streaming data, and react quickly to an uncertain environment. They often combine human-authored components with machine-learned, non-deterministic components, which further amplifies the challenges. In this technology showcase, we demonstrate a platform under development at Microsoft Research that aims to provide a foundation for developing this class of complex, multimodal, integrative-AI systems. The framework provides a runtime that enables efficient, parallel, coordinated computation over streaming data; a set of tools for visualization, data analytics, and machine learning; and a chassis for pluggable AI components that enables the rapid development of situated interactive systems. The showcase gives a short introduction to and demonstration of various aspects of the framework. The session runs twice for 45 minutes, starting at 1:15 PM and 2:15 PM, so that participants can also visit other technology showcases.
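
    The framework itself is a .NET runtime; purely as a conceptual illustration of “parallel, coordinated computation over streaming data” (and not the framework’s actual API), the Python sketch below runs two processing stages in parallel over a timestamped stream and then joins their outputs by originating time, the way a coordinated downstream component would consume them.

      from concurrent.futures import ThreadPoolExecutor

      # A toy timestamped stream, e.g. (originating_time, sensor_reading).
      stream = [(t, float(t) * 0.1) for t in range(5)]

      def detect_faces(sample):
          t, value = sample
          return t, f"faces({value:.1f})"      # stand-in for a vision component

      def recognize_speech(sample):
          t, value = sample
          return t, f"words({value:.1f})"      # stand-in for a speech component

      with ThreadPoolExecutor() as pool:
          faces = dict(pool.map(detect_faces, stream))
          words = dict(pool.map(recognize_speech, stream))

      # Join the two result streams on originating time.
      for t in sorted(faces):
          print(t, faces[t], words[t])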

  • Presenter: Ying Wang

    [Video Abstract]

    Zo is a sophisticated machine conversationalist with the personality of a 22-year-old with #friendgoals. She hangs out on Kik and Facebook and is always interested in a casual conversation with her growing crowd of human friends. Zo is an open-domain chatbot, and her breadth of knowledge is vast. She can chime into a conversation with context-specific facts about things like celebrities, sports, or finance, but she also has empathy, a sense of humor, and a healthy helping of sass. Using sentiment analysis, she can adapt her phrasing and responses based on positive or negative cues from her human counterparts. She can tell jokes, read your horoscope, challenge you to rhyming competitions, and much more. Beyond the content itself, the phrasing of her conversations must sound natural, idiomatic, and human in both text and voice modalities. Zo’s “mind” is a sophisticated array of machine learning (ML) techniques working in sequence and in parallel to produce a unique, entertaining, and at times amazingly human conversational experience. This demo shows some of Zo’s latest capabilities and how the team has achieved them.