{"id":579955,"date":"2019-04-30T05:59:59","date_gmt":"2019-04-30T12:59:59","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=579955"},"modified":"2019-04-30T13:06:31","modified_gmt":"2019-04-30T20:06:31","slug":"toward-emotionally-intelligent-artificial-intelligence","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/toward-emotionally-intelligent-artificial-intelligence\/","title":{"rendered":"Toward Emotionally Intelligent Artificial Intelligence"},"content":{"rendered":"
<\/a><\/p>\n Recent successes in machine intelligence hinge on the core computational ability to efficiently search through billions of possibilities in order to make decisions. Successful sequences of such decisions often suggest that computation is catching up to, or even surpassing, human intelligence. Human intelligence, on the other hand, is highly generalizable, adaptive, and robust, and it exhibits characteristics that current state-of-the-art machine intelligence systems simply are not yet capable of producing. For example, humans are able to plan far in advance based on anticipated outcomes, even in the presence of many unknown variables. Human intelligence shines in scenarios involving other humans and living beings, and it consistently demonstrates reasoning and meta-reasoning abilities. Human intelligence is also sympathetic, empathetic, kind, nurturing, and, importantly, able to relinquish and redefine the goals of a mission for the benefit of a greater good. While almost all work in machine intelligence focuses on \u201chow\u201d, the hallmark of human intelligence is the ability to ask \u201cwhat\u201d and \u201cwhy\u201d.<\/p>\n Our hypothesis is that emotional intelligence is key to unlocking the emergence of machines that are not only more general, robust, and efficient, but that are also aligned with the values of humanity. The affective mechanisms in humans allow us to accomplish tasks that are far too difficult to program into or teach current machines. For example, our sympathetic and parasympathetic responses allow us to stay safe and to be aware of danger. Our ability to recognize affect in others and imagine ourselves in their situations makes us far more effective at making appropriate decisions and navigating a complex world. Drives and affective states such as hunger, curiosity, surprise, and joy enable us to regulate our own behavior and also to determine the goals that we wish to achieve. 
And finally, our ability to express our own internal state is an excellent way to signal to others and possibly influence their decision making.<\/p>\n\t\t\t Consequently, it has been hypothesized<\/a> that building such emotional intelligence into a computational framework would, at minimum, require the following capabilities:<\/p> \t<\/div>\n\t\t Historically, research on building emotionally intelligent machines has primarily taken the human-machine collaboration point of view and has mostly focused on the first three capabilities. For example, the earliest work<\/a> on affect recognition began almost three decades ago, when physiological sensors, cameras, microphones, and other instruments were used to detect a host of affective responses. While there is much debate about how consistently and universally people express emotions on their faces and in other physiological signals, and whether these really reflect how they feel inside, researchers have successfully built algorithms that identify useful signals in the noisy world of human expressions and have demonstrated that these signals are consistent with socio-cultural norms<\/a>.<\/p>\n The ability to take appropriate actions based on the internal cognitive state of a human is imperative for an emotionally intelligent agent. Applications<\/a> such as automatic tutoring systems<\/a>, mental and physical health support, and tools for improving productivity lie at the forefront of what is being pursued. The recent line of work on sequential decision making, such as contextual bandits, is steadily making gains in this rich area. Our own work<\/a>, for example, shows how a system sensitive to the affective aspects of managing a diet could help subjects make good decisions.<\/p>\n Expression of affect has been explored in computing for many decades now. Even simple signals (for example, light, color, and sound) can convey and provoke rich emotion. 
In \u201cNeural TTS Stylization with Adversarial and Collaborative Games<\/a>\u201d (co-authored with Shuang Ma and Yale Song<\/a>), to be presented at the Seventh International Conference on Learning Representations (ICLR 2019<\/a>), we propose a new machine learning approach to synthesizing realistic, expressive, human-sounding speech. This architecture challenges the model to generate realistic-sounding speech that is faithful to the textual content while maintaining an easily controllable dial for independently changing the expressed emotion. Our model achieves state-of-the-art results across multiple tasks, including style transfer (content and style swapping), emotion modeling, and identity transfer (fitting a new speaker\u2019s voice). An open-source implementation is available with the paper.<\/p>\n