{"id":163869,"date":"2012-11-01T00:00:00","date_gmt":"2012-11-01T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/fennel-streaming-graph-partitioning-for-massive-scale-graphs\/"},"modified":"2018-11-09T01:23:41","modified_gmt":"2018-11-09T09:23:41","slug":"fennel-streaming-graph-partitioning-for-massive-scale-graphs","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/fennel-streaming-graph-partitioning-for-massive-scale-graphs\/","title":{"rendered":"Fennel: Streaming Graph Partitioning for Massive Scale Graphs"},"content":{"rendered":"
\n

Graph partitioning is a key problem to enable efficient solving of a wide range of computational tasks and querying over large-scale graph data, such as computing node centralities using iterative computations, and personalized recommendations. In this work, we introduce a unifying framework for graph partitioning which enables a well principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementation. We show that many previously proposed methods are special instances of this framework, we derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant benefits over previous approaches, using a large set of real-world and synthetic graphs.<\/p>\n

Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be overall comparable to the de-facto standard offline software METIS, and it even outperforms it on numerous real-world graphs. For instance, for the Twitter graph with more than 1.4 billion of edges, our method partitions the graph in about 40 minutes achieving a balanced partition that cuts as few as 6.8% of edges, whereas it took more than 8.5 hours by METIS to produce a balanced partition that cuts 11.98% of edges. Furthermore, modularity–a popular measure for community detection [Girvan and Newman, 2002; Newman and Girvan, 2004; Newman, 2006]–is also a special instance of our framework. We establish the first rigorous approximation algorithm, achieving a guarantee of O(log(k)\/k) for partitioning into k clusters.<\/p>\n

Finally, we evaluate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform, and observe significant gains in terms of the communication cost and runtime.<\/p>\n<\/div>\n

<\/p>\n","protected":false},"excerpt":{"rendered":"

Graph partitioning is a key problem to enable efficient solving of a wide range of computational tasks and querying over large-scale graph data, such as computing node centralities using iterative computations, and personalized recommendations. In this work, we introduce a unifying framework for graph partitioning which enables a well principled design of scalable, streaming graph […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-163869","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us"],"msr_publishername":"ACM","msr_edition":"","msr_affiliation":"","msr_published_date":"2014-11-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"ACM","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"205778","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/MSR-TR-2012-213.pdf","id":"205778","title":"MSR-TR-2012-213.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":205778,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/MSR-TR-2012-213.pdf"}],"msr-author-ordering":[{"type":"text","value":"Charalampos E. Tsourakakis","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Christos Gkantsidis","user_id":31424,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Christos Gkantsidis"},{"type":"user_nicename","value":"Bozidar Radunovic","user_id":31286,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Bozidar Radunovic"},{"type":"user_nicename","value":"Milan Vojnovic","user_id":32922,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Milan Vojnovic"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[171035],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":171035,"post_title":"Big Data Analytics","post_name":"big-data-analytics","post_type":"msr-project","post_date":"2012-10-18 14:31:33","post_modified":"2019-08-19 14:57:01","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/big-data-analytics\/","post_excerpt":"We conduct research in the area of algorithms and systems for processing massive amounts of data. Our work aims at pushing the boundary\u00a0of\u00a0computer science\u00a0in the area of algorithms and systems for large-scale computations. Our mission is to\u00a0achieve\u00a0major technological breakthroughs\u00a0in order to facilitate\u00a0new\u00a0systems and services relying on efficient processing of big data. Research Areas Database queries - How can we efficiently resolve database queries on massive amounts of input data? Here the input data may be…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171035"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/163869"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/163869\/revisions"}],"predecessor-version":[{"id":549435,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/163869\/revisions\/549435"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=163869"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=163869"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=163869"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=163869"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=163869"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=163869"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=163869"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=163869"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=163869"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=163869"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=163869"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=163869"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=163869"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=163869"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=163869"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=163869"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}