{"id":464856,"date":"2018-02-07T07:50:14","date_gmt":"2018-02-07T15:50:14","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=464856"},"modified":"2018-02-07T07:50:14","modified_gmt":"2018-02-07T15:50:14","slug":"microsoft-researchers-unlock-black-box-network-embedding","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-researchers-unlock-black-box-network-embedding\/","title":{"rendered":"Microsoft researchers unlock the black box of network embedding"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-464862 size-full\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_1000x400.jpg\" alt=\"\" width=\"1000\" height=\"400\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_1000x400.jpg 1000w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_1000x400-300x120.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_1000x400-768x307.jpg 768w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/p>\n<p>At the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.wsdm-conference.org\/2018\">ACM Conference on Web Search and Data Mining 2018<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, my team will introduce research that, for the first time, provides a theoretical explanation of popular methods used to automatically map the structure and characteristics of networks, known as network embedding. We then use this theoretical explanation to present a new network embedding method that performs as well as or better than existing methods.<\/p>\n<p>Networks are fundamental ways of representing knowledge and relating to the world. As humans, we think through association; we naturally draw links between concepts or entities. People are linked through family relationships or through collaborations. Diseases are linked to treatments. Works of art are linked to their creators. Wikipedia represents the interconnectedness of human knowledge.<\/p>\n<p>As humans, we have great capacity to understand knowledge, notice similarities and make connections and inferences. However, our capacity is limited by the amount of information we can process. Computers, on the other hand, can process tremendous amounts of information, but their ability to understand knowledge and make inferences is limited. That\u2019s because humans think in symbols and computers think in math. Recent advances in computer science such as word and network embedding address this problem by using methods that translate discrete symbols into continuous representations such as high-dimensional vectors that computers can process algebraically.<\/p>\n<p>Network embedding methods enable us to map the structure and characteristics of a network algorithmically, without human intervention. Mapping can be used to predict missing links, to identify missing entities and to make inferences that inform, for example, recommender systems on e-commerce websites.<\/p>\n<p>Existing methods of network embedding work empirically, but so far, scientists have considered them a black box: We lacked understanding of how or why these methods work. <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/2761896323\">Our recent research<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&#8212;a collaboration between Microsoft Research and Tsinghua University&#8212;contributes to the theoretical understanding of four popular network-embedding methods, including <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/2154851992\">DeepWalk<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/2366141641\">node2vec<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, as well as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/1888005072\">LINE<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/2145658888\">PTE<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which were developed at Microsoft Research Asia.<\/p>\n<p>Our research bridges the gap between the empirical performance and theoretical mechanism of network embedding. We prove that four popular network-embedding models are &#8212;in theory&#8212; implicitly performing <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/42355184\">matrix factorizations<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> over a network. More excitingly, the theoretical analysis also provides the closed forms of those factorized matrices, most of which are related to the network\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/#\/detail\/115178988\">Laplacian matrix<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. We show that these models share a common practice: they are unified into a general matrix-factorization framework. As such, this work lays the theoretical foundation for network embedding, yielding a better understanding of latent representation learning over networks.<\/p>\n<p>Theoretical understanding is important because it enables us to go beyond empirical attempts. Once we understand why and how something works, we are better able to predict how it will work in the future and can use it more efficiently. Based on the new theoretical understanding we derived from four existing methods, we proposed a new algorithm, NetMF, that uses matrix factorization explicitly to create network embeddings. The new algorithm performed at comparative or better levels than existing methods in our tests, providing further support for our theoretical explanation.<\/p>\n<p>We use network embedding in the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/microsoft-academic-graph\/\">Microsoft Academic Graph<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> \u2013 the largest knowledge graph of research publications in existence, accessible via the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/academic-knowledge\/\">Academic Knowledge API<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/academic.microsoft.com\/\">Microsoft Academic<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> website. As machines map the structure and meaning of entities in the graph and their relationships, they can begin making inferences. For example, they can infer that two papers are quite similar in content, or two authors do similar work. In a way, this vector representation is like your genome sequence. If it is similar to someone else\u2019s, you must be related. With network embedding, we can find academic twins that perhaps were not aware of each other. We can help researchers discover scholarship they might not have come across otherwise, and bridge gaps between traditional disciplinary divisions.<\/p>\n<p><em>The research team included Jiezhong Qiu (a summer intern at Microsoft Research), Yuxiao Dong, Hao Ma, Kuansan Wang from Microsoft Research, and Jian Li and Jie Tang from Tsinghua University.<\/em><\/p>\n<div id=\"attachment_464706\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-464706\" class=\"wp-image-464706 size-large\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_Mihaela-Math-1024x692.jpg\" alt=\"\" width=\"1024\" height=\"692\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_Mihaela-Math-1024x692.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_Mihaela-Math-300x203.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_Mihaela-Math-768x519.jpg 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-464706\" class=\"wp-caption-text\">Figure 1: The closed forms of the matrices that are implicitly approximated and factorized by popular network-embedding models.<\/p><\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At the ACM Conference on Web Search and Data Mining 2018, my team will introduce research that, for the first time, provides a theoretical explanation of popular methods used to automatically map the structure and characteristics of networks, known as network embedding. We then use this theoretical explanation to present a new network embedding method [&hellip;]<\/p>\n","protected":false},"author":37074,"featured_media":464868,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[194474,194460],"tags":[],"research-area":[13563,13555],"msr-region":[256048],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-464856","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-visulalization","category-search-and-information-retrieval","msr-research-area-data-platform-analytics","msr-research-area-search-information-retrieval","msr-region-global","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[171464],"related-events":[],"related-researchers":[],"msr_type":"Post","featured_image_thumbnail":"<img width=\"480\" height=\"280\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_480x280.jpg\" class=\"img-object-cover\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_480x280.jpg 480w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2018\/02\/WSDM_480x280-300x175.jpg 300w\" sizes=\"(max-width: 480px) 100vw, 480px\" \/>","byline":"Kuansan Wang","formattedDate":"February 7, 2018","formattedExcerpt":"At the ACM Conference on Web Search and Data Mining 2018, my team will introduce research that, for the first time, provides a theoretical explanation of popular methods used to automatically map the structure and characteristics of networks, known as network embedding. We then use&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/464856"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37074"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=464856"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/464856\/revisions"}],"predecessor-version":[{"id":464874,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/464856\/revisions\/464874"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/464868"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=464856"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=464856"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=464856"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=464856"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=464856"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=464856"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=464856"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=464856"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=464856"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=464856"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=464856"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}