{"id":212091,"date":"2016-01-21T11:53:38","date_gmt":"2016-01-21T11:53:38","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/dual-embedding-space-model-desm\/"},"modified":"2019-08-19T10:21:11","modified_gmt":"2019-08-19T17:21:11","slug":"dual-embedding-space-model-desm","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/dual-embedding-space-model-desm\/","title":{"rendered":"Dual Embedding Space Model (DESM)"},"content":{"rendered":"
\n

The Dual Embedding Space Model<\/em> (DESM) is an information retrieval model that uses two word embeddings, one for query words and one for document words. It takes into account the vector similarity between each query word vector and all document word vectors.<\/p>\n<\/div>\n

\n

A key challenge for information retrieval is to model document aboutness<\/em>. The traditional approach uses term frequency, with more occurrences of a query word indicating that the document is more likely to be about that word. DESM uses multiple document words as aboutness evidence for each query term. For example, for the query term \u201cAlbuquerque\u201d the two passages of text below are indistinguishable according to term frequency, each having one occurrence. Our approach considers the presence of related terms such as \u201cpopulation\u201d and \u201cmetropolitan\u201d, which is evidence that passage (a) is about Albuquerque while passage (b) merely mentions Albuquerque.<\/p>\n

\"desm_aboutness\"<\/p>\n

Here we generate our dual embeddings using the well-known tool word2vec<\/span><\/a>. In most word2vec studies, word embeddings are taken from the model\u2019s input matrix only (IN). In this work we also use the output matrix (OUT) embeddings. In the table below, the IN vector for \u201cYale\u201d is close to the IN vector for \u201cHarvard\u201d (IN-IN), but its nearest neighbour in OUT space is \u201cFaculty\u201d (IN-OUT). The single embedding approaches (IN-IN and OUT-OUT) tend to group words of the same type (typical<\/em>), whereas the dual embedding approach (IN-OUT) groups words that occur together in the training data (topical<\/em>).<\/p>\n

\"desm-nearestneighbours\"<\/span><\/span><\/p>\n

The DESM approach of performing all-pairs comparison with dual embeddings yields positive results on information retrieval testbeds. More details can be found in the publications listed below.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"

The Dual Embedding Space Model (DESM) is an information retrieval model that uses two word embeddings, one for query words and one for document words. It takes into account the vector similarity between each query word vector and all document word vectors. A key challenge for information retrieval is to model document aboutness. The traditional […]<\/p>\n","protected":false},"featured_media":244490,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13555],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-212091","msr-project","type-msr-project","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-research-area-search-information-retrieval","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"1\/21\/2016","related-publications":[215405,215414],"related-downloads":[234677],"related-videos":[],"related-groups":[267093],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Bhaskar Mitra","user_id":31257,"people_section":"Group 1","alias":"bmitra"},{"type":"user_nicename","display_name":"Nick Craswell","user_id":33088,"people_section":"Group 1","alias":"nickcr"},{"type":"user_nicename","display_name":"Rich Caruana","user_id":33365,"people_section":"Group 
1","alias":"rcaruana"}],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212091"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212091\/revisions"}],"predecessor-version":[{"id":604206,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/212091\/revisions\/604206"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/244490"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=212091"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=212091"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=212091"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=212091"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=212091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}