{"id":319952,"date":"2016-11-11T12:23:04","date_gmt":"2016-11-11T20:23:04","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=319952"},"modified":"2018-10-16T20:16:23","modified_gmt":"2018-10-17T03:16:23","slug":"quill-efficient-transferable-rich-analytics-scale","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/quill-efficient-transferable-rich-analytics-scale\/","title":{"rendered":"Quill: Efficient, Transferable, and Rich Analytics at Scale"},"content":{"rendered":"
This paper introduces Quill (stands for a quadrillion tuples per day), a library and distributed platform for relational and temporal analytics over large datasets in the cloud. Quill exposes a new abstraction for parallel datasets and computation, called ShardedStreamable. This abstraction provides the ability to express ef\ufb01cient distributed physical query plans that are transferable, i.e., movable from of\ufb02ine to real-time and vice versa. ShardedStreamable decouples incremental query logic speci\ufb01cation, a small but rich set of data movement operations, and keying; this allows Quill to express a broad space of plans with complex querying functionality, while leveraging existing temporal libraries such as Trill. Quill\u2019s layered architecture provides a careful separation of responsibilities with independently useful components, while retaining high performance. We built Quill for the cloud, with a master-less design where a language-integrated client library directly communicates and coordinates with cloud workers using off-the-shelf distributed cloud components such as queues. Experiments on up to 400 cloud machines, and on datasets up to 1TB, \ufb01nd Quill to incur low overheads and outperform SparkSQL by up to orders-of-magnitude for temporal and 6X for relational queries, while supporting a rich space of transferable, programmable, and expressive distributed physical query plans.<\/p>\n","protected":false},"excerpt":{"rendered":"
This paper introduces Quill (stands for a quadrillion tuples per day), a library and distributed platform for relational and temporal analytics over large datasets in the cloud. Quill exposes a new abstraction for parallel datasets and computation, called ShardedStreamable. This abstraction provides the ability to express ef\ufb01cient distributed physical query plans that are transferable, i.e., […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-319952","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"International Conference on Very Large Databases (PVLDB Vol. 9, Issue. 14)","msr_affiliation":"","msr_published_date":"2016-09-01","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"319955","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"p1623-chandramouli","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/p1623-chandramouli.pdf","id":319955,"label_id":0},{"type":"file","title":"shrink-vldb17","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/shrink-vldb17.pdf","id":345575,"label_id":0}],"msr_related_uploader":"","msr_attachments":[{"id":345575,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/shrink-vldb17.pdf"},{"id":319955,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/11\/p1623-chandramouli.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"badrishc","user_id":31166,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=badrishc"},{"type":"text","value":"Raul Castro Fernandez","user_id":0,"rest_url":false},{"type":"text","value":"Jonathan Goldstein","user_id":0,"rest_url":false},{"type":"text","value":"Ahmed Eldawy","user_id":0,"rest_url":false},{"type":"text","value":"Abdul Quamar","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[957177],"msr_project":[171207,170875],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":171207,"post_title":"Trill","post_name":"trill","post_type":"msr-project","post_date":"2013-09-19 14:35:28","post_modified":"2019-07-16 08:47:44","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/trill\/","post_excerpt":"Trill is a high-performance open-source in-memory incremental analytics library. It can handle both real-time and offline data, and is based on a temporal data and query model. Trill can be used as a streaming engine, a lightweight in-memory relational engine, and as a progressive query processor (for early query results on partial data). You can learn more about Trill from the publications below, or from our slides\u00a0here pdf | pptx. Trill is now open-source, and…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/171207"}]}},{"ID":170875,"post_title":"Streams","post_name":"streams","post_type":"msr-project","post_date":"2011-11-21 13:31:30","post_modified":"2017-06-19 10:26:41","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/streams\/","post_excerpt":"In the streams research project, we propose novel architectures, efficient processing techniques, models, and applications to support time-oriented queries over real-time and offline data streams. Our current focus in the project centers around Trill, a high-performance streaming analytics engine that is now used across Microsoft. Our currect focus areas include efficient query processing, scale-out, resiliency, streaming state management, and unstructured data support.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/170875"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/319952"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/319952\/revisions"}],"predecessor-version":[{"id":525636,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/319952\/revisions\/525636"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=319952"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=319952"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=319952"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=319952"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=319952"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=319952"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=319952"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=319952"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=319952"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=319952"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=319952"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=319952"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=319952"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=319952"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=319952"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=319952"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}