{"id":599811,"date":"2019-07-25T16:02:45","date_gmt":"2019-07-25T23:02:45","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=599811"},"modified":"2021-04-06T14:02:04","modified_gmt":"2021-04-06T21:02:04","slug":"large-scale-pretraining-for-response-generation","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/large-scale-pretraining-for-response-generation\/","title":{"rendered":"DialoGPT"},"content":{"rendered":"
The DialoGPT project establishes a foundation for building versatile open-domain chatbots that deliver engaging, natural conversational responses across a wide range of topics, tasks, and information requests, without resorting to heavy hand-crafting.
Until recently, such versatile conversational AI systems seemed elusive. The advent of large-scale transformer-based pretraining methods such as GPT-2 and BERT is changing that. The empirical success of pretraining in other areas of natural language processing has inspired researchers to apply it to conversational AI, often to good effect (for example, HuggingFace's transfer learning model). However, such models are trained on conventional written text, which is often not representative of how people actually interact. With the dual goal of attaining the topical versatility afforded by scale and a more conversationally interactive tone, DialoGPT takes transformer-based pretraining one step further, leveraging massive amounts of publicly available colloquial text data.

DialoGPT adapts pretraining techniques to response generation using hundreds of gigabytes of colloquial data. Like GPT-2, DialoGPT is formulated as an autoregressive (AR) language model and uses a multi-layer transformer as its architecture. Unlike GPT-2, which is trained on general text data, DialoGPT draws on 147M multi-turn dialogues extracted from Reddit discussion threads. Our implementation is based on the HuggingFace pytorch-transformers library and OpenAI GPT-2. We have released a public GitHub repo for DialoGPT, which contains a data extraction script, model training code, and model checkpoints for pretrained small (117M), medium (345M), and large (762M) models; a minimal usage sketch is given below. We hope this release will foster exploration of large-scale pretraining for response generation by the conversational AI research community.

Our assumption has been that the DialoGPT approach should capture the joint distribution of source/prompt and target/response pairs in conversational flow with good granularity. In practice, this is what we observe: sentences generated by DialoGPT are diverse and contain information specific to the source prompt, analogous to the outputs that GPT-2 generates. We evaluated the model on a public benchmark dataset (DSTC-7) and on a new 6k multi-reference test set extracted from Reddit postings. Our experiments show state-of-the-art performance in terms of automatic evaluation (including relevance and diversity metrics). Results of evaluation using human judges suggest that DialoGPT responses may approach human-level response quality in a single-turn Turing test. Generated examples may be seen here.
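The sketch below illustrates how a released DialoGPT checkpoint can be used for multi-turn response generation with the Hugging Face transformers library, with dialogue turns concatenated and separated by the end-of-text token. The model ID microsoft/DialoGPT-medium, the sampling settings, and the example user turns are illustrative assumptions rather than part of the release itself.

```python
# A minimal sketch of multi-turn response generation with a DialoGPT checkpoint.
# Assumes the Hugging Face "transformers" library and the "microsoft/DialoGPT-medium"
# model ID; settings and example turns are illustrative, not the official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Dialogue turns are concatenated into one token sequence, separated by the
# end-of-text token; the model autoregressively continues the sequence to
# produce the next response.
history_ids = None
for turn in ["Does money buy happiness?", "What is the best way to buy happiness?"]:
    new_ids = tokenizer.encode(turn + tokenizer.eos_token, return_tensors="pt")
    input_ids = new_ids if history_ids is None else torch.cat([history_ids, new_ids], dim=-1)
    history_ids = model.generate(
        input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,   # sampling tends to yield more diverse responses
        top_k=50,
        top_p=0.95,
    )
    # Decode only the newly generated tokens (the bot's reply for this turn).
    response = tokenizer.decode(history_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
    print("Bot:", response)
```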
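On the diversity side of automatic evaluation, a distinct-n style metric (the ratio of unique n-grams to total n-grams over a set of generated responses) is one common measure. The sketch below is an illustrative implementation under that assumption, not the project's evaluation script; the function name and sample responses are made up for the example.

```python
# A rough sketch of a distinct-n style diversity metric: the fraction of unique
# n-grams across a set of generated responses. Illustrative only; not the
# project's evaluation code.
from collections import Counter

def distinct_n(responses, n=2):
    """Return the ratio of unique n-grams to total n-grams over all responses."""
    ngrams = Counter()
    for response in responses:
        tokens = response.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

if __name__ == "__main__":
    sample = ["i do not know", "i am not sure about that", "money cannot buy happiness"]
    print("Dist-1:", round(distinct_n(sample, n=1), 3))
    print("Dist-2:", round(distinct_n(sample, n=2), 3))
```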