{"id":557094,"date":"2018-12-14T02:37:43","date_gmt":"2018-12-14T10:37:43","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=557094"},"modified":"2019-11-19T09:21:44","modified_gmt":"2019-11-19T17:21:44","slug":"instalytics-storage-for-big-data","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/instalytics-storage-for-big-data\/","title":{"rendered":"Instalytics: Storage for Big Data"},"content":{"rendered":"

<p>Instalytics (<strong>In<\/strong>telligent <strong>St<\/strong>ore-powered <strong>Analytics<\/strong>) is a vertically integrated infrastructure stack that enables efficient big data analytics in large-scale data centers by careful co-design of the storage layer (cluster file system) with the compute layer (query engine and job scheduler).<\/p>\n

<p>As an example of what such co-design makes possible, Instalytics amplifies the well-known benefits of data partitioning in analytics systems: instead of the traditional partitioning on a single dimension, Instalytics enables data to be partitioned simultaneously on four different dimensions at the same storage cost, so that a larger fraction of queries can benefit from partition filtering and from joins that avoid a network shuffle. To achieve this, Instalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability; a new heterogeneous replication layout lets Instalytics preserve the same recovery cost and availability as traditional replication. As another example of compute-awareness, the file system in Instalytics exposes a new <em>sliced-read<\/em> API that improves join performance by enabling multiple compute nodes to read slices of a data block efficiently, through coordinated request scheduling and selective caching at the storage nodes.<\/p>\n","protected":false},"excerpt":{"rendered":"

<p>Instalytics (Intelligent Store-powered Analytics) is a vertically integrated infrastructure stack that enables efficient big data analytics in large-scale data centers by careful co-design of the storage layer (cluster file system) with the compute layer (query engine and job scheduler).<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13547],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-557094","msr-project","type-msr-project","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"2017-03-01","related-publications":[571077],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Kaushik Rajan","user_id":32574,"people_section":"Section name 
1","alias":"krajan"}],"msr_research_lab":[199562],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/557094"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":8,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/557094\/revisions"}],"predecessor-version":[{"id":621915,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/557094\/revisions\/621915"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=557094"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=557094"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=557094"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=557094"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=557094"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}