{"id":347972,"date":"2017-01-06T08:44:47","date_gmt":"2017-01-06T16:44:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=347972"},"modified":"2018-10-16T19:58:30","modified_gmt":"2018-10-17T02:58:30","slug":"sample-efficient-deep-reinforcement-learning-dialog-control","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/sample-efficient-deep-reinforcement-learning-dialog-control\/","title":{"rendered":"Sample-efficient Deep Reinforcement Learning for Dialog Control"},"content":{"rendered":"<p>Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL).  For RL, a policy gradient approach is natural, but is sample inefficient.  In this paper, we present 3 methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL.  The key idea is to maintain a second RNN which predicts the value of the current policy, and to apply experience replay to both networks.  On two tasks, these methods reduce the number of dialogs\/episodes required by about a third, vs. standard policy gradient methods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13545],"msr-publication-type":[193718],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-347972","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_publishername":"arxiv","msr_edition":"","msr_affiliation":"","msr_published_date":"2016-12-18","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2016-1134","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"347975","msr_publicationurl":"https:\/\/arxiv.org\/abs\/1612.06000","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"main","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2017\/01\/main.pdf","id":347975,"label_id":0},{"type":"url","title":"https:\/\/arxiv.org\/abs\/1612.06000","viewUrl":false,"id":false,"label_id":0}],"msr_related_uploader":"","msr_attachments":[{"id":0,"url":"https:\/\/arxiv.org\/abs\/1612.06000"}],"msr-author-ordering":[{"type":"text","value":"Kavosh Asadi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"jawillia","user_id":32190,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=jawillia"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[390593,395930],"msr_project":[393245,377990,295931,171313],"publication":[],"video":[],"download":[],"msr_publication_type":"techreport","related_content":{"projects":[{"ID":393245,"post_title":"Conversational Intelligence","post_name":"conversational-intelligence","post_type":"msr-project","post_date":"2017-07-05 10:01:45","post_modified":"2017-11-15 13:39:25","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/conversational-intelligence\/","post_excerpt":"Intelligent agents that can handle human language play a growing role in personalized, ubiquitous computing and the everyday use of devices. Agents need to be able to communicate and collaborate with humans in ways that are seamless and natural, and to be able to learn new behaviors, concepts, and relationships as first-class operations. In other words, our devices need to be able to converse with us. In this project, Microsoft Research AI teams are interested&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/393245"}]}},{"ID":377990,"post_title":"Deep Reinforcement Learning for Goal-Oriented Dialogues","post_name":"deep-reinforcement-learning-goal-oriented-dialogue","post_type":"msr-project","post_date":"2017-04-18 11:51:36","post_modified":"2019-08-19 10:03:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/deep-reinforcement-learning-goal-oriented-dialogue\/","post_excerpt":"Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems, at SLT 2018. [Proposal] All the data, source code and schedule information will be updated here. This project aims to develop intelligent dialogue agents to help users effectively accomplish tasks via natural language conversation. A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), natural language generation (NLG), and dialogue management (DM) that consists of state tracking and policy learning. Our research focus is&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990"}]}},{"ID":295931,"post_title":"Chatbots and\u00a0Conversation As A Platform (CAAP)","post_name":"chatbots-conversation-platform-caap","post_type":"msr-project","post_date":"2016-09-21 23:16:41","post_modified":"2017-06-05 12:48:54","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/chatbots-conversation-platform-caap\/","post_excerpt":"At\u00a0Microsoft Build 2016 event, Microsoft CEO Satya Nadella said\u00a0that chatbots, as next big thing, will have\u00a0\u201cas profound an impact as previous shifts we\u2019ve had.\u201d\u00a0The past paradigm shifts include graphical user interface, the web browser and the touchscreen. Conversations As\u00a0A platform(CAAP) has\u00a0the promise of making booking a flight or buying a new shirt as easy as sending a text message,\u00a0with the potential to make computing more\u00a0accessible to users\u00a0on mobile devices. This group has been worked on&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/295931"}]}},{"ID":171313,"post_title":"Dialog and Conversational Systems Research","post_name":"dialog-and-conversational-systems-research","post_type":"msr-project","post_date":"2014-03-14 09:46:35","post_modified":"2017-07-11 15:34:26","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/dialog-and-conversational-systems-research\/","post_excerpt":"Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunctions with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world. Projects Spoken Language Understanding","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171313"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/347972","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/347972\/revisions"}],"predecessor-version":[{"id":399428,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/347972\/revisions\/399428"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=347972"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=347972"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=347972"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=347972"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=347972"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=347972"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=347972"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=347972"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=347972"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=347972"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=347972"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=347972"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=347972"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=347972"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=347972"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=347972"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}