PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!

  • Remmelt Ammerlaan,
  • Gilbert Antonius,
  • Marc Friedman,
  • H M Sajjad Hossain,
  • Alekh Jindal,
  • Peter Orenberg,
  • Hiren Patel,
  • Shi Qiao,
  • Vijay Ramani,
  • Lucas Rosenblatt,
  • Abhishek Roy,
  • Irene Shaffer,
  • Soundarajan Srinivasan

VLDB 2022

Modern data processing systems require optimization at massive scale, and using machine learning to optimize these systems (ML-for-systems) has shown promising results. Unfortunately, ML-for-systems is prone to overgeneralization: models fail to capture the large variety of workload patterns and tend to improve performance for some subsets of the workload while regressing it for others. In this paper, we introduce a performance safeguard system, called PerfGuard, that designs pre-production experiments for deploying ML-for-systems. Instead of searching the entire space of query plans (a well-known intractable problem), we focus on query plan deltas (a significantly smaller space). PerfGuard formalizes these differences and correlates plan deltas with important feedback signals, such as execution cost. We describe the deep learning architecture and the end-to-end pipeline in PerfGuard, which can be applied to general relational databases. We show that this architecture improves on baseline models and that our pipeline identifies key query plan components as major contributors to plan disparity. Offline experimentation shows PerfGuard to be a promising approach, with many opportunities for future improvement.
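The plan-delta idea can be illustrated with a small sketch. The Python snippet below is a minimal, hypothetical illustration, not PerfGuard's actual API: the names PlanNode, operator_counts, and plan_delta are our own. It diffs two query plan trees into a compact delta, the kind of feature a learned model could correlate with a feedback signal such as execution cost.

```python
# Hypothetical sketch of the plan-delta idea: rather than modeling whole
# query plans, diff the pre- and post-change plans and keep only the
# (much smaller) delta. Names are illustrative, not PerfGuard's API.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    op: str                               # e.g. "HashJoin", "Filter"
    children: list = field(default_factory=list)

def operator_counts(plan: PlanNode) -> Counter:
    """Flatten a plan tree into a bag of operator labels."""
    counts = Counter([plan.op])
    for child in plan.children:
        counts += operator_counts(child)
    return counts

def plan_delta(before: PlanNode, after: PlanNode) -> dict:
    """Signed difference of operator counts: the plan delta."""
    b, a = operator_counts(before), operator_counts(after)
    return {op: a[op] - b[op] for op in set(b) | set(a) if a[op] != b[op]}

# Example: an ML-driven optimizer swaps a sort-merge join for a hash join.
before = PlanNode("SortMergeJoin", [PlanNode("Scan"), PlanNode("Scan")])
after = PlanNode("HashJoin", [PlanNode("Scan"), PlanNode("Scan")])
print(plan_delta(before, after))  # {'SortMergeJoin': -1, 'HashJoin': 1}
# A learned model would map such deltas to a predicted cost change and
# flag queries whose predicted regression exceeds a safety threshold.
```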