{"id":851928,"date":"2022-06-23T09:00:00","date_gmt":"2022-06-23T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=851928"},"modified":"2022-08-17T08:54:40","modified_gmt":"2022-08-17T15:54:40","slug":"godel-combining-goal-oriented-dialog-with-real-world-conversations","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/godel-combining-goal-oriented-dialog-with-real-world-conversations\/","title":{"rendered":"GODEL: Combining goal-oriented dialog with real-world conversations"},"content":{"rendered":"\n
\"Diagram<\/figure>\n\n\n\n

They make restaurant recommendations, help us pay bills, and remind us of appointments. Many people have come to rely on virtual assistants and chatbots to perform a wide range of routine tasks. But what if a single dialog agent, the technology behind these language-based apps, could perform all these tasks and then take the conversation further? In addition to providing on-topic expertise, such as recommending a restaurant, it could engage in a conversation about the history of the neighborhood or a recent sports game, and then bring the conversation back on track. What if the agent\u2019s responses continually reflected the latest world events? And what if it could do all of this without the need for any additional work by the designer?<\/p>\n\n\n\n

With GODEL<\/a>, this may not be far off. GODEL stands for G<\/strong>rounded O<\/strong>pen D<\/strong>ialogue<\/strong> L<\/strong>anguage Model, and it ushers in a new class of pretrained language models that enable both task-oriented and social conversation and are evaluated by the usefulness of their responses.<\/p>\n\n\n\n

Pretrained language models are among the engines that power conversational AI, the technology that underlies these dialog agents. They can be either task-oriented (\u201cgive me a job, and I\u2019ll do it\u201d) or open-domain, engaging in conversation without a specified outcome, a mode also known as chit-chat. GODEL combines both capabilities, giving dialog agents the ability to generate responses based not just on the context of the conversation but also on external information: content that was not part of the dataset when the model was trained. This includes both structured content, such as information stored in databases, and unstructured content, such as restaurant reviews, Wikipedia articles, and other publicly available material found on the web. This is how a simple task-based query about restaurant recommendations can evolve into a dialog about ingredients, food, and even cooking techniques, the kind of winding path that real-world conversations take.<\/p>\n\n\n\n

In 2019, the Deep Learning<\/a> and Natural Language Processing<\/a> groups at Microsoft Research released DialoGPT<\/a>, the first large-scale pretrained language model designed specifically for dialog. This helped make conversational AI more accessible and easier to work with, and it enabled the research community to make considerable progress in this area. With GODEL, our goal is to help further this progress by empowering researchers and developers to create dialog agents that are unrestricted in the types of queries they can respond to and the sources of information they can draw from. We also worked to ensure those responses are useful to the person making the query.    <\/p>\n\n\n\n

In our paper, \u201cGODEL: Large-Scale Pre-training for Goal-Directed Dialog<\/a>,\u201d we describe the technical details underlying GODEL, and we have made the code available on GitHub<\/a>.<\/p>\n\n\n\n

\n
Read the paper<\/a><\/div>\n\n\n\n
Download the code<\/a><\/div>\n<\/div>\n\n\n\n
<\/div>\n\n\n\n

A grounded model<\/h2>\n\n\n\n

One of GODEL\u2019s key features is the flexibility it gives users in defining their model\u2019s grounding<\/em>, the sources from which their dialog agents retrieve information. This flexibility underlies GODEL\u2019s versatility across diverse conversational settings. If someone were to inquire about a local restaurant, for example, GODEL would be able to provide specific and accurate responses even though that venue may not have been included in the data used to train it. Responses would vary depending on whether the grounding information is empty, a snippet of a document, a search result (unstructured text), or information drawn from a database about the restaurant (structured text). In each case, however, the response would be appropriate and useful.<\/p>\n\n\n\n

In addition to specificity, grounded generation helps keep models up to date, as the grounding text can incorporate information that was not available when the model was trained. For example, even if GODEL were trained before the 2022 Winter Olympics, it could still provide details on those games and a list of winners by drawing on grounding text, even though all of its training data predates the event.<\/p>\n\n\n\n\t
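As a rough illustration of the grounding idea described above, the sketch below assembles a single model input from an instruction, the dialog history, and optional grounding text. The `[CONTEXT]` and `[KNOWLEDGE]` markers, the `EOS` turn separator, the function name `build_model_input`, and the sample review are all assumptions made for illustration; the released code defines the actual input format.

```python
from typing import List


def build_model_input(instruction: str, dialog: List[str], grounding: str = '') -> str:
    '''Serialize an instruction, dialog turns, and optional grounding into one string.

    The marker strings used here are illustrative assumptions, not a confirmed format.
    '''
    # Join dialog turns with a separator so turn boundaries survive flattening.
    context = ' EOS '.join(turn.strip() for turn in dialog)
    query = f'{instruction.strip()} [CONTEXT] {context}'
    # Empty grounding is allowed: the model then relies on the context alone.
    if grounding.strip():
        query += f' [KNOWLEDGE] {grounding.strip()}'
    return query


if __name__ == '__main__':
    dialog = ['Can you recommend a place for dinner nearby?']
    review = 'Hypothetical review: the tasting menu changes weekly.'
    # Same dialog, first without grounding and then with an unstructured snippet.
    print(build_model_input('Instruction: respond helpfully.', dialog))
    print(build_model_input('Instruction: respond helpfully.', dialog, review))
```

Keeping the grounding as a separate, optional segment means it can be refreshed or swapped at inference time without retraining, which is the property the paragraph above relies on.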


Broad application of GODEL<\/h2>\n\n\n\n

Another main feature of GODEL is its wide range of dialog applications. While its predecessor, DialoGPT, and other prior pretrained models for dialog have mostly focused on social bots, GODEL can be applied to a variety of dialog types, including task-oriented dialog, question answering, and grounded chit-chat. Within a single conversation, GODEL can produce reasonable responses to a variety of query types, including general questions and requests for specific actions.<\/p>\n\n\n\n

In addition, GODEL\u2019s responses have been evaluated for their helpfulness. In our paper<\/a>, we show that evaluation is more reliable on goal-directed datasets, and that people generally agree on which responses are better when asked to judge their utility toward achieving certain goals. With this evaluation setup, we compared our model against several strong baselines and state-of-the-art approaches and found that GODEL is superior in both human and automatic evaluation, as indicated in Figure 1. The paper<\/a> also describes extensive experiments against other state-of-the-art pretrained language models and demonstrates that performance gains are even larger in these cases.<\/p>\n\n\n\n

\"Two<\/a>
Figure 1: These charts illustrate GODEL\u2019s performance against T5, a pretrained model that performed best in our evaluation. They compare the aggregate performance of models fine-tuned from GODEL against that of models fine-tuned from T5. They show that GODEL performs much better in human evaluations and makes appreciable gains in the automatic evaluation. The test set for these experiments combines a variety of dialog genres, including task-oriented dialog, conversational question-answering, and grounded chit-chat.<\/figcaption><\/figure>\n\n\n\n

The following examples illustrate different dialog scenarios where GODEL uses a variety of sources to respond to identical user queries. <\/p>\n\n\n\n\n\n

This example illustrates how GODEL responds in an open-ended scenario in which the user asks a question that is completely unrelated to the initial question. Although the new question is off topic, GODEL responds appropriately while trying to bring the conversation back on track.<\/p>\n\n\n\n

\"Figure<\/a><\/figure>\n\n\n\n\n\n

This example illustrates how GODEL responds in a task-oriented setting in which the model is connected to the components of a traditional goal-oriented dialog system, such as a database. In this case, the relevant environment contains structured information: a database returning two restaurants relevant to the current conversation.<\/p>\n\n\n\n
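One way to picture structured grounding is to flatten the database results into text before they are passed to the model. The sketch below does this for two hypothetical restaurant records. The function name `records_to_grounding`, the field names, and the separators are assumptions for illustration only; this post does not specify how GODEL serializes structured content.

```python
from typing import Dict, List


def records_to_grounding(records: List[Dict[str, str]]) -> str:
    '''Flatten database rows into one text segment per record.'''
    segments = []
    for record in records:
        # Sort keys so the same record always serializes the same way.
        fields = [f'{key}: {record[key]}' for key in sorted(record)]
        segments.append('; '.join(fields))
    # Separate records with a visible delimiter so they stay distinguishable.
    return ' | '.join(segments)


if __name__ == '__main__':
    # Hypothetical rows standing in for a database query result.
    results = [
        {'name': 'Restaurant A', 'area': 'north', 'food': 'italian'},
        {'name': 'Restaurant B', 'area': 'centre', 'food': 'italian'},
    ]
    print(records_to_grounding(results))
```

A deterministic field order keeps the grounding text stable across calls, which makes responses easier to compare when the same query is repeated.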

\"Figure<\/a><\/figure>\n\n\n\n\n\n

This example illustrates how GODEL responds in a task-oriented setting in which traditional components of task-oriented dialog systems are not available. In this case, GODEL retrieves a restaurant review via a search engine. The response reflects both the context of the conversation and a snippet of the retrieved text, a restaurant review.  <\/p>\n\n\n\n

\"Figure<\/a><\/figure>\n\n\n\n\n\n

This example illustrates how GODEL responds in a question-answering scenario, where the user asks a general question and the context provides the dialog agent with the words it needs to search for the relevant information on the web.<\/p>\n\n\n\n

\"Figure<\/a><\/figure>\n\n\n\n\n\n

GODEL available as open source<\/h2>\n\n\n\n

To advance research, we believe it is crucial to make code and models publicly available, and we have released GODEL as fully open source<\/a>. We have made three versions of GODEL available: base, large, and extra-large. We are also including the code needed to retrain all pretrained models and to fine-tune models for specific tasks, using the CoQA dataset, intended for conversational question-answering; the Wizard of Wikipedia and Wizard of the Internet datasets, aimed at information-seeking chats; and MultiWOZ, for task-completion dialogs.<\/p>\n\n\n\n

We hope GODEL helps numerous academic research teams advance the field of conversational AI with innovative dialog models while eliminating the need for significant GPU resources. We plan to continuously improve GODEL and make more models available to the research community. Please visit our project page<\/a> to learn more about the GODEL project and new releases.<\/p>\n\n\n\n

Acknowledgements<\/h2>\n\n\n\n

We would like to thank our colleagues at Microsoft Research who contributed to this work and blog post: Bill Dolan, Pengcheng He, Elnaz Nouri, and Clarisse Simoes Ribeiro.<\/p>\n","protected":false},"excerpt":{"rendered":"

They make restaurant recommendations, help us pay bills, and remind us of appointments. Many people have come to rely on virtual assistants and chatbots to perform a wide range of routine tasks. But what if a single dialog agent, the technology behind these language-based apps, could perform all these tasks and then take the conversation […]<\/p>\n","protected":false},"author":37583,"featured_media":851937,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-851928","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144736,144931],"related-projects":[599811],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Michel Galley","user_id":32887,"display_name":"Michel Galley","author_link":"Michel Galley<\/a>","is_active":false,"last_first":"Galley, Michel","people_section":0,"alias":"mgalley"},{"type":"user_nicename","value":"Lars Liden","user_id":32612,"display_name":"Lars Liden","author_link":"Lars Liden<\/a>","is_active":false,"last_first":"Liden, Lars","people_section":0,"alias":"laliden"},{"type":"user_nicename","value":"Chris 
Brockett","user_id":31423,"display_name":"Chris Brockett","author_link":"Chris Brockett<\/a>","is_active":false,"last_first":"Brockett, Chris","people_section":0,"alias":"chrisbkt"},{"type":"guest","value":"zhou-yu","user_id":"852018","display_name":"Zhou Yu","author_link":"Zhou Yu<\/a>","is_active":true,"last_first":"Yu, Zhou","people_section":0,"alias":"zhou-yu"},{"type":"user_nicename","value":"Jianfeng Gao","user_id":32246,"display_name":"Jianfeng Gao","author_link":"Jianfeng Gao<\/a>","is_active":false,"last_first":"Gao, Jianfeng","people_section":0,"alias":"jfgao"}],"msr_type":"Post","featured_image_thumbnail":"\"Diagram","byline":"","formattedDate":"June 23, 2022","formattedExcerpt":"They make restaurant recommendations, help us pay bills, and remind us of appointments. Many people have come to rely on virtual assistants and chatbots to perform a wide range of routine tasks. But what if a single dialog agent, the technology behind these language-based apps,…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/851928"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=851928"}],"version-history":[{"count":22,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/851928\/revisions"}],"predecessor-version":[{"id":870555,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/851928\/revisions\/870555"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/851937"}],"wp:a
ttachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=851928"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=851928"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=851928"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=851928"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=851928"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=851928"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=851928"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=851928"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=851928"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=851928"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=851928"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}