{"id":571077,"date":"2019-03-02T19:54:47","date_gmt":"2019-03-03T03:54:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=571077"},"modified":"2021-03-18T10:49:08","modified_gmt":"2021-03-18T17:49:08","slug":"instalytics-cluster-filesystem-co-design-for-big-data-analytics","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/instalytics-cluster-filesystem-co-design-for-big-data-analytics\/","title":{"rendered":"INSTalytics: Cluster Filesystem Co-design for Big-data Analytics"},"content":{"rendered":"
We present the design, implementation, and evaluation of INSTalytics, a co-designed stack of a cluster file system and the compute layer, for efficient big-data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle. To achieve this, INSTalytics uses compute-awareness to customize the three-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables INSTalytics to preserve the same recovery cost and availability as traditional replication. INSTalytics also uses compute-awareness to expose a new sliced-read API that improves performance of joins by enabling multiple compute nodes to read slices of a data block efficiently via co-ordinated request scheduling and selective caching at the storage nodes. We have built a prototype implementation of INSTalytics in a production analytics stack, and we show that recovery performance and availability is similar to physical replication, while providing significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.<\/p>\n","protected":false},"excerpt":{"rendered":"
We present the design, implementation, and evaluation of INSTalytics, a co-designed stack of a cluster file system and the compute layer, for efficient big-data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193715],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246784,247552,251500,253867,246691,246793,253861,253864,251488,248284],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-571077","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us","msr-field-of-study-analytics","msr-field-of-study-big-data","msr-field-of-study-block-data-storage","msr-field-of-study-co-design","msr-field-of-study-computer-science","msr-field-of-study-distributed-computing","msr-field-of-study-file-system","msr-field-of-study-filter-signal-processing","msr-field-of-study-joins","msr-field-of-study-scheduling-computing"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-2-4","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"ACM Transactions on Storage","msr_volume":"15","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"4","msr_organization":"","msr_how_published":"","msr_notes":"Invited Paper: USENIX FAST 2019 Special Section","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/03\/instalytics-fast19.pdf","id":"571080","title":"instalytics-fast19","label_id":"243132","label":0},{"type":"doi","viewUrl":"false","id":"false","title":"10.1145\/3369738","label_id":"243106","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":571080,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2019\/03\/instalytics-fast19.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Muthian Sivathanu","user_id":36320,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Muthian Sivathanu"},{"type":"text","value":"Midhul Vuppalapati","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Bhargav Gulavani","user_id":31219,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Bhargav Gulavani"},{"type":"user_nicename","value":"Kaushik Rajan","user_id":32574,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kaushik Rajan"},{"type":"user_nicename","value":"Jyoti Leeka","user_id":40066,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jyoti Leeka"},{"type":"text","value":"Jayashree Mohan","user_id":0,"rest_url":false},{"type":"text","value":"Piyus Kedia","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199562],"msr_event":[],"msr_group":[],"msr_project":[600474,557094],"publication":[],"video":[],"download":[],"msr_publication_type":"article","related_content":{"projects":[{"ID":600474,"post_title":"Micro Co-design","post_name":"micro-codesign","post_type":"msr-project","post_date":"2020-02-23 21:23:12","post_modified":"2020-02-23 21:23:13","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/micro-codesign\/","post_excerpt":"The massive scale of cloud infrastructure services enables, and often necessitates, vertical co-design of the infrastructure stack by cloud providers. Micro\u00a0co-design is a\u00a0Minimally invasive,\u00a0Cheap, and retro-fittable approach to co-design that extracts efficiency out of existing software infrastructure layers. It does this by making lightweight changes to generic software interfaces. We are exploring micro co-design in the context of four systems: Instalytics, Astra, Gandiva, and Quiver, aimed at improving the efficiency and functionality of key infrastructure…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/600474"}]}},{"ID":557094,"post_title":"Instalytics: Storage for Big Data","post_name":"instalytics-storage-for-big-data","post_type":"msr-project","post_date":"2018-12-14 02:37:43","post_modified":"2019-11-19 09:21:44","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/instalytics-storage-for-big-data\/","post_excerpt":"Instalytics (Intelligent Store-powered Analytics) is a vertically integrated infrastructure stack that enables efficient big data analytics in large-scale data centers, by careful co-design of the storage layer (cluster file system) with the compute layer (query engine and job scheduler).","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/557094"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/571077"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/571077\/revisions"}],"predecessor-version":[{"id":734767,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/571077\/revisions\/734767"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=571077"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=571077"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=571077"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=571077"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=571077"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=571077"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=571077"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=571077"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=571077"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=571077"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=571077"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=571077"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=571077"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=571077"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=571077"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=571077"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}