{"id":377990,"date":"2017-04-18T11:51:36","date_gmt":"2017-04-18T18:51:36","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=377990"},"modified":"2019-08-19T10:03:33","modified_gmt":"2019-08-19T17:03:33","slug":"deep-reinforcement-learning-goal-oriented-dialogue","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/deep-reinforcement-learning-goal-oriented-dialogue\/","title":{"rendered":"Deep Reinforcement Learning for Goal-Oriented Dialogues"},"content":{"rendered":"

Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems, at SLT 2018. [Proposal<\/a>] All the data, source code and schedule information will be updated here<\/a>.<\/p>\n

This project aims to develop intelligent dialogue agents that help users accomplish tasks effectively through natural language conversation. A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), dialogue management (DM), which consists of state tracking and policy learning, and natural language generation (NLG). Our research focuses on deep reinforcement learning approaches to dialogue management in goal-oriented settings such as movie ticket booking, trip planning, and sales assistance.<\/p>\n

\"\"

Composite Task Completion Dialogue System<\/p><\/div>\n

User Simulator<\/strong><\/a>
\nTraining reinforcement learning agents is challenging because they need an environment to interact with. We therefore developed a user simulator for both training and evaluation [
Li et al. 2016<\/a>].<\/p>\n

Infobot<\/strong><\/a>
\nWe developed the first end-to-end reinforcement learning agent with differentiable knowledge base access [
Dhingra et al. ACL 2017<\/a>], and the first end-to-end dialogue policy trained with both supervised and reinforcement learning [Williams et al. 2016<\/a>].<\/p>\n

Task-completion bot<\/strong><\/a>
\nWe developed an end-to-end learning framework for task-completion neural dialogue systems [
Li et al. IJCNLP 2017<\/a>]. We also developed BBQ Networks (Bayes-by-Backprop Q-Networks), which perform efficient exploration for dialogue policy learning [Lipton et al. 2017<\/a>], as well as efficient actor-critic methods that substantially reduce the sample complexity of end-to-end learning of LSTM-based dialogue policies [Asadi et al. 2016<\/a>].<\/p>\n

Composite Task-completion bot<\/strong>
\nWe developed a composite task-completion dialogue system based on hierarchical reinforcement learning, which learns dialogue policies that operate at different temporal scales, and demonstrated a significant improvement over flat deep reinforcement learning in both simulation and human evaluation [
Peng et al. EMNLP 2017<\/a>]. (The source code will be released soon.<\/em>)<\/p>\n


<\/h1>\n","protected":false},"excerpt":{"rendered":"

Microsoft Dialogue Challenge: Building End-to-End Task-Completion Dialogue Systems, at SLT 2018. [Proposal] All the data, source code and schedule information will be updated here. This project aims to develop intelligent dialogue agents to help users effectively accomplish tasks via natural language conversation. A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-377990","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2016-04-18","related-publications":[482721,500372,493946,437235,438609,454764,369872,377222,379010,376403,418055,372953,340424,347972,339806,294722,294719,305843,552729,591757,502184,506330,508631],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Jianfeng Gao","user_id":32246,"people_section":"Research Team","alias":"jfgao"},{"type":"guest","display_name":"Kavosh Asadi","user_id":399587,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Yun-Nung (Vivian) Chen","user_id":398510,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Bhuwan Dhingra","user_id":398498,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Zachary Lipton","user_id":398495,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Baolin 
Peng","user_id":398489,"people_section":"Past Interns & Visitors","alias":""},{"type":"guest","display_name":"Da Tang","user_id":398501,"people_section":"Past Interns & Visitors","alias":""}],"msr_research_lab":[199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":35,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990\/revisions"}],"predecessor-version":[{"id":604194,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/377990\/revisions\/604194"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=377990"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=377990"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=377990"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=377990"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=377990"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}