{"id":474279,"date":"2018-03-16T15:02:38","date_gmt":"2018-03-16T22:02:38","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=474279"},"modified":"2018-03-16T15:21:41","modified_gmt":"2018-03-16T22:21:41","slug":"cloudviews-materialized-views-big-data-workloads","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/cloudviews-materialized-views-big-data-workloads\/","title":{"rendered":"CloudViews: Materialized Views for Big Data Workloads"},"content":{"rendered":"

<p>Analytics workloads in Microsoft’s clusters are generated by users who submit SCOPE scripts. Each script is compiled into a job, which is then executed on the cluster. An analysis of these workloads suggests significant overlap between the sub-graphs of different jobs; that is, sub-graphs belonging to different jobs compute the <em>same<\/em> result. This overlap suggests that view materialization, in which a shared result is computed once and reused by later jobs, can improve performance. However, materializing views at massive scale, where the system processes hundreds of thousands of jobs per day, poses interesting challenges. This project has two parts:<\/p>\n