<\/a>Figure 2: The above illustrates one context and its multiple responses in the latent space induced by SpaceFusion. Distance and direction from the predicted response vector given the context roughly match the relevance and diversity, respectively.<\/p><\/div>\n
For example, as illustrated in Figure 2, given a context\u2014in this case, \u201cAnyone want to start this game?\u201d\u2014the positive responses \u201cI\u2019d love to play it\u201d and \u201cYes, I do\u201d are arranged along the same direction. The negative ones\u2014\u201cI\u2019m not interested in the game\u201d and \u201cNo, I don\u2019t\u201d\u2014are mapped on a line in another direction. Diversity in responses is achieved by exploring the latent space along different directions. Furthermore, the distance in the latent space corresponds to the relevancy. Responses farther away from the context\u2014\u201cYes, I do\u201d and \u201cNo, I don\u2019t\u201d\u2014are usually generic, while those closer are more relevant to the specific context: \u201cI\u2019m not interested in the game\u201d and \u201cWhen will you?\u201d<\/p>\n
SpaceFusion disentangles the criteria of relevancy and diversity and represents them in two independent dimensions\u2014direction and distance\u2014making it easier to jointly optimize both. Our empirical experiments and human evaluation have shown that SpaceFusion performs better in these two criteria compared to competitive baselines.<\/p>\n
Learning a shared latent space<\/h3>\n
So, how exactly does SpaceFusion align different latent spaces?<\/p>\n
The idea is quite intuitive: For each pair of points from two different latent spaces, we first minimize their distance in the shared latent space and then encourage a smooth transition between them. This is done by adding two novel regularization terms\u2014distance term and smoothness term\u2014to the objective function.<\/p>\n
Taking conversation as the example, the distance term measures the Euclidean distance between a point from the S2S latent space, which is mapped from the context and represents the predicted response, and the points from the AE latent space, which correspond to its target responses. Minimizing such distance encourages the S2S model to map the context to a point close to and surrounded by its responses in the shared latent space, as illustrated by Figure 2.<\/p>\n
The smoothness term measures the likelihood of generating the target response from a random interpolation between the point mapped from the context and the one mapped from the response. By maximizing this likelihood, we encourage a smooth transition of the meaning of the generated responses as we move away from the context. This allows us to explore the neighborhood of the prediction point made by the S2S and thus generate diverse responses that are relevant to the context.<\/p>\n
With these two novel regularizations added in the objective function, we put the distance and smoothness constraints on the learning of latent space, so the training will not only focus on the performance on each latent space, but also try to align them together by adding these desired structures. Our work focused on conversational models, but we expect that SpaceFusion can align the latent spaces learned by other models trained on different datasets. This makes it possible to bridge different abilities and knowledge domains learned by each specific AI system and is a baby step toward a more general intelligence.<\/p>\n","protected":false},"excerpt":{"rendered":"
A palette makes it easy for painters to arrange and mix paints of different colors as they create art on the canvas before them. Having a similar tool that could allow AI to jointly learn from diverse data sources such as those for conversations, narratives, images, and knowledge could open doors for researchers and scientists […]<\/p>\n","protected":false},"author":38022,"featured_media":584938,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[243622],"tags":[],"research-area":[13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-584920","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-human-language-technologies","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[171447],"related-events":[589690],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"","byline":"Xiang Gao","formattedDate":"May 8, 2019","formattedExcerpt":"A palette makes it easy for painters to arrange and mix paints of different colors as they create art on the canvas before them. Having a similar tool that could allow AI to jointly learn from diverse data sources such as those for conversations, narratives,…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/584920"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38022"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=584920"}],"version-history":[{"count":10,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/584920\/revisions"}],"predecessor-version":[{"id":585061,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/584920\/revisions\/585061"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/584938"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=584920"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=584920"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=584920"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=584920"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=584920"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=584920"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=584920"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=584920"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=584920"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=584920"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=584920"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}