{"id":579955,"date":"2019-04-30T05:59:59","date_gmt":"2019-04-30T12:59:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=579955"},"modified":"2019-04-30T13:06:31","modified_gmt":"2019-04-30T20:06:31","slug":"toward-emotionally-intelligent-artificial-intelligence","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/toward-emotionally-intelligent-artificial-intelligence\/","title":{"rendered":"Toward Emotionally Intelligent Artificial Intelligence"},"content":{"rendered":"
<\/a><\/p>\n Recent successes in machine intelligence hinge on the core computational ability to efficiently search through billions of possibilities in order to make decisions. Successful sequences of such decisions often suggest that computation is catching up to, or even surpassing, human intelligence. Human intelligence, on the other hand, is highly generalizable, adaptive, and robust, and it exhibits characteristics that current state-of-the-art machine intelligence systems simply are not yet capable of producing. For example, humans are able to plan far in advance based on anticipated outcomes, even in the presence of many unknown variables. Human intelligence shines in scenarios involving other humans and living beings, and it consistently demonstrates reasoning and meta-reasoning abilities. Human intelligence is also sympathetic, empathetic, kind, nurturing, and, importantly, able to relinquish and redefine the goals of a mission for the benefit of a greater good. While almost all work in machine intelligence focuses on \u201chow\u201d, the hallmark of human intelligence is the ability to ask \u201cwhat\u201d and \u201cwhy\u201d.<\/p>\n Our hypothesis is that emotional intelligence is key to unlocking the emergence of machines that are not only more general, robust, and efficient, but that are also aligned with the values of humanity. The affective mechanisms in humans allow us to accomplish tasks that are far too difficult to program into or teach current machines. For example, our sympathetic and parasympathetic responses allow us to stay safe and to be aware of danger. Our ability to recognize affect in others and imagine ourselves in their situations makes us far more effective at making appropriate decisions and navigating a complex world. Drives and affective states such as hunger, curiosity, surprise, and joy enable us to regulate our own behavior and also to determine the goals that we wish to achieve. 
And finally, our ability to express our own internal state is an excellent way to signal to others and possibly influence their decision making.<\/p>\n\t\t\t Consequently, it has been hypothesized<\/a> that building such emotional intelligence into a computational framework would, at minimum, require the following capabilities:<\/p> \t<\/div>\n\t\t Historically, research on building emotionally intelligent machines has primarily taken the human-machine collaboration point of view and has mostly focused on the first three capabilities. For example, the earliest work<\/a> on affect recognition began almost three decades ago, when physiological sensors, cameras, microphones, and other instruments were used to detect a host of affective responses. While there is much debate about how consistently and universally people express emotions on their faces and in other physiological signals, and whether these really reflect how they feel inside, researchers have successfully built algorithms that identify useful signals in the noisy world of human expressions and have demonstrated that these signals are consistent with socio-cultural norms<\/a>.<\/p>\n The ability to take appropriate actions based on the internal cognitive state of a human is imperative for an emotionally intelligent agent. Applications<\/a> such as automatic tutoring systems<\/a>, mental and physical health support, and tools for improving productivity lie at the forefront of what is being pursued. The recent line of work on sequential decision making, such as contextual bandits, is steadily making gains in this rich area. Our own work<\/a>, for example, shows how a system sensitive to the affective aspects of managing a diet could help subjects make good decisions.<\/p>\n Expression of affect has been explored in computing for many decades now. Even simple signals (for example, light, color, and sound) can convey and provoke rich emotion. 
In \u201cNeural TTS Stylization with Adversarial and Collaborative Games<\/a>\u201d (co-authored with Shuang Ma and Yale Song<\/a>), to be presented at the Seventh International Conference on Learning Representations (ICLR 2019<\/a>), we propose a new machine learning approach to synthesizing realistic, expressive, human-sounding speech. This architecture challenges the model to generate realistic-sounding speech that is faithful to the textual content while maintaining an easily controllable dial for independently changing the expressed emotion. Our model achieves state-of-the-art results across multiple tasks, including style transfer (content and style swapping), emotion modeling, and identity transfer (fitting a new speaker\u2019s voice). An open-source implementation is available with the paper.<\/p>\n