Scope playback: self-validation in the cloud

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pp. 3

Publication

The last decade witnessed the emergence of various distributed storage and computation systems for cloud-scale data processing. Scope is a distributed computation platform targeted at a variety of data analysis and data mining applications, powering Bing and other online services at Microsoft. Scope combines the benefits of traditional parallel databases and MapReduce execution engines to allow easy programmability. It features a SQL-like declarative scripting language with .NET extensions, and delivers massive scalability and high performance through advanced optimization. Scope currently operates over tens of thousands of machines and processes over a million jobs per month. Such a massive data computation platform presents new challenges and opportunities for efficient and effective testing and validation. Traditional approaches to testing database systems are not always sufficient, due to several factors. Model-based query generation typically fails to cover user-defined code, which is very common in Scope scripts. Additionally, rapid release cycles in the platform-as-a-service environment require tools that quickly identify potential regressions, predict the impact of breaking changes, and provide massive test coverage in a short amount of time. In this paper, we describe a test automation tool, called Scope Playback, that addresses these new requirements. Scope Playback leverages the Scope system itself in two important ways. First, it exploits data about every job submitted to production clusters, which the Scope system stores automatically. Second, the testing process itself is implemented as a Scope script, and so benefits automatically from transparent and massive computation parallelism. Scope Playback currently serves as a crucial validation technique and helps ensure product quality during Scope release cycles.
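The playback idea summarized above can be illustrated with a minimal sketch. It is not the Scope Playback implementation: the abstract does not say what is compared or how, so the sketch assumes that each stored job carries its script text and that a "compiler" is any function mapping a script to a textual plan. The names `JobRecord`, `PlaybackResult`, `play_back`, and `play_back_all` are hypothetical, and a Python thread pool merely stands in for the parallelism that the real tool obtains by running as a Scope script.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical stand-in for the stored data about one production job."""
    job_id: str
    script: str


@dataclass(frozen=True)
class PlaybackResult:
    job_id: str
    status: str  # "match", "changed", "broken", or "skipped"
    detail: str = ""


# Assumption: a compiler maps script text to a plan string and raises on failure.
Compiler = Callable[[str], str]


def play_back(job: JobRecord, baseline: Compiler, candidate: Compiler) -> PlaybackResult:
    """Replay one stored job against a baseline build and a candidate build."""
    try:
        expected = baseline(job.script)
    except Exception as exc:
        # The job does not compile on the baseline either, so it says
        # nothing about the candidate build.
        return PlaybackResult(job.job_id, "skipped", str(exc))
    try:
        actual = candidate(job.script)
    except Exception as exc:
        # Compiled before, fails now: a potential breaking change.
        return PlaybackResult(job.job_id, "broken", str(exc))
    if actual != expected:
        # Still compiles, but the output differs: flag for review.
        return PlaybackResult(job.job_id, "changed")
    return PlaybackResult(job.job_id, "match")


def play_back_all(jobs: Iterable[JobRecord], baseline: Compiler,
                  candidate: Compiler, workers: int = 4) -> List[PlaybackResult]:
    """Replay many jobs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: play_back(job, baseline, candidate), jobs))
```

Under these assumptions, counting the "broken" and "changed" results over a sample of stored jobs gives a rough estimate of how many production scripts a candidate release would affect.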