Computation Reuse via Fusion in Amazon Athena

2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 1610-1620

Publication

Amazon Athena is a serverless, interactive query service for efficiently analyzing large volumes of data stored in Amazon S3 using ANSI SQL. Some design choices in the engine, especially those concerning the streaming of intermediate results, can lead to suboptimal execution of query patterns that contain common expressions. In this paper, we build upon recent work and introduce new optimizations in Athena that handle some common-expression scenarios without materializing intermediate results or duplicating work. We describe commonalities and differences with previous work and provide experimental results on TPC-DS data that validate our approach.
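As a rough illustration of what handling common expressions "without materializing intermediate results or duplicating work" can mean, the toy sketch below compares two filtered aggregates computed with separate scans against a single fused scan that uses conditional aggregation. It is not Athena's optimizer or the paper's algorithm: the table, the `ScanCounter` class, and the functions `unfused` and `fused` are hypothetical names invented for this example.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory table standing in for data stored in S3.
ROWS: List[Dict[str, int]] = [
    {"year": 2000, "amount": 10},
    {"year": 2001, "amount": 20},
    {"year": 2000, "amount": 5},
    {"year": 2002, "amount": 7},
]


class ScanCounter:
    """Wraps a list of rows and counts how many times it is scanned."""

    def __init__(self, rows: List[Dict[str, int]]) -> None:
        self.rows = rows
        self.scans = 0

    def scan(self) -> Iterator[Dict[str, int]]:
        # The counter is incremented when iteration over the rows starts.
        self.scans += 1
        yield from self.rows


def unfused(table: ScanCounter) -> Tuple[int, int]:
    """Two subqueries over the same table, each with its own scan."""
    total_2000 = sum(r["amount"] for r in table.scan() if r["year"] == 2000)
    total_2001 = sum(r["amount"] for r in table.scan() if r["year"] == 2001)
    return total_2000, total_2001


def fused(table: ScanCounter) -> Tuple[int, int]:
    """One scan; each filter becomes a condition on its own aggregate,
    similar in spirit to SUM(CASE WHEN year = 2000 THEN amount END) in SQL."""
    total_2000 = 0
    total_2001 = 0
    for r in table.scan():
        if r["year"] == 2000:
            total_2000 += r["amount"]
        if r["year"] == 2001:
            total_2001 += r["amount"]
    return total_2000, total_2001


if __name__ == "__main__":
    separate, single = ScanCounter(ROWS), ScanCounter(ROWS)
    print(unfused(separate), "scans:", separate.scans)
    print(fused(single), "scans:", single.scans)
```

Both functions return the same totals, but the fused version reads the table once and never stores an intermediate result, which is the kind of saving the abstract refers to.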