{"id":599811,"date":"2019-07-25T16:02:45","date_gmt":"2019-07-25T23:02:45","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=599811"},"modified":"2021-04-06T14:02:04","modified_gmt":"2021-04-06T21:02:04","slug":"large-scale-pretraining-for-response-generation","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/large-scale-pretraining-for-response-generation\/","title":{"rendered":"DialoGPT"},"content":{"rendered":"
The DialoGPT project establishes a foundation for building versatile open-domain chatbots that deliver engaging, natural conversational responses across a wide range of topics, tasks, and information requests, without resorting to heavy hand-crafting.
Until recently, such versatile conversational AI systems seemed elusive. The advent of large-scale transformer-based pretraining methods such as GPT-2 and BERT is changing that. The empirical success of pretraining in other areas of natural language processing has inspired researchers to apply it to conversational AI, often to good effect (for example, HuggingFace's transfer learning model). However, such models are trained on conventional written text, which is often not representative of how people actually interact. With the dual goal of attaining the topical versatility afforded by scale and a more conversationally interactive tone, DialoGPT takes transformer-based pretraining one step further, leveraging massive amounts of publicly available colloquial text data.

DialoGPT adapts pretraining techniques to response generation using hundreds of gigabytes of colloquial data. Like GPT-2, DialoGPT is formulated as an autoregressive (AR) language model and uses a multi-layer transformer as its architecture. Unlike GPT-2, which is trained on general text data, DialoGPT draws on 147M multi-turn dialogues extracted from Reddit discussion threads. Our implementation is based on the HuggingFace pytorch-transformers library and OpenAI GPT-2. We have released a public GitHub repo for DialoGPT, which contains a data extraction script, model training code, and model checkpoints for pretrained small (117M), medium (345M), and large (762M) models; a minimal usage sketch is given below. We hope this release will foster exploration of large-scale pretraining for response generation by the conversational AI research community.

Our assumption has been that the DialoGPT approach should capture the joint distribution of source/prompt and target/response pairs in conversational flow with good granularity. In practice, this is what we observe: sentences generated by DialoGPT are diverse and contain information specific to the source prompt, analogous to the outputs that GPT-2 generates. We evaluated the model on a public benchmark dataset (DSTC-7) and on a new 6k multi-reference test set extracted from Reddit postings. Our experiments show state-of-the-art performance in terms of automatic evaluation (including relevance and diversity metrics). Results of evaluation using human judges suggest that DialoGPT responses may approach human-level response quality in a single-turn Turing test. Generated examples may be seen here.
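The sketch below illustrates how a released DialoGPT checkpoint can be used for multi-turn response generation with the Hugging Face transformers library, with dialogue turns concatenated and separated by the end-of-text token. The model ID microsoft/DialoGPT-medium, the sampling settings, and the example user turns are illustrative assumptions rather than part of the release itself.

```python
# A minimal sketch of multi-turn response generation with a DialoGPT checkpoint.
# Assumes the Hugging Face "transformers" library and the "microsoft/DialoGPT-medium"
# model ID; settings and example turns are illustrative, not the official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Dialogue turns are concatenated into one token sequence, separated by the
# end-of-text token; the model autoregressively continues the sequence to
# produce the next response.
history_ids = None
for turn in ["Does money buy happiness?", "What is the best way to buy happiness?"]:
    new_ids = tokenizer.encode(turn + tokenizer.eos_token, return_tensors="pt")
    input_ids = new_ids if history_ids is None else torch.cat([history_ids, new_ids], dim=-1)
    history_ids = model.generate(
        input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,   # sampling tends to yield more diverse responses
        top_k=50,
        top_p=0.95,
    )
    # Decode only the newly generated tokens (the bot's reply for this turn).
    response = tokenizer.decode(history_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
    print("Bot:", response)
```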
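On the diversity side of automatic evaluation, a distinct-n style metric (the ratio of unique n-grams to total n-grams over a set of generated responses) is one common measure. The sketch below is an illustrative implementation under that assumption, not the project's evaluation script; the function name and sample responses are made up for the example.

```python
# A rough sketch of a distinct-n style diversity metric: the fraction of unique
# n-grams across a set of generated responses. Illustrative only; not the
# project's evaluation code.
from collections import Counter

def distinct_n(responses, n=2):
    """Return the ratio of unique n-grams to total n-grams over all responses."""
    ngrams = Counter()
    for response in responses:
        tokens = response.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

if __name__ == "__main__":
    sample = ["i do not know", "i am not sure about that", "money cannot buy happiness"]
    print("Dist-1:", round(distinct_n(sample, n=1), 3))
    print("Dist-2:", round(distinct_n(sample, n=2), 3))
```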