Analytics workloads in Microsoft’s clusters are generated by users submitting SCOPE scripts, which are compiled down to jobs, which are then executed on the clusters. An analysis of these workloads suggests that there is significant overlap between the sub-graphs of the different jobs. That is, the sub-graphs of different jobs compute the same result. This suggests that we can build upon view materialization to improve performance. However, doing view materialization at a massive scale, where the system processes hundreds of thousands of jobs per day, poses interesting challenges. This project has two parts:
- An algorithmic component where we develop algorithms for automatically identifying which views to materialize.
- A systems-building component where we implement our algorithms into the SCOPE runtime system and deploy them in Microsoft’s clusters.
People
Alekh Jindal
Senior Scientist
Cloud and Information Services Lab