PRETZEL: opening the black box of machine learning prediction serving systems

Yunseong Lee; Alberto Scolari; Byung-Gon Chun; Marco Domenico Santambrogio; Matteo Interlandi; Markus Weimer

Operating Systems Design and Implementation | October 2018

Published by USENIX Association

Machine learning models are often composed of pipelines of transformations. While this design allows single model components to be executed efficiently at training time, prediction serving has different requirements, such as low latency, high throughput, and graceful performance degradation under heavy load. Current prediction serving systems treat models as black boxes, forgoing prediction-time-specific optimizations in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system that introduces a novel white box architecture enabling both end-to-end and multi-model optimizations. In experiments with production-like model pipelines, PRETZEL improves performance along several dimensions: compared to state-of-the-art approaches, on average it reduces 99th percentile latency by 5.5× and memory footprint by 25×, and increases throughput by 4.7×.
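To make the black box versus white box distinction concrete, the following is a minimal sketch of one kind of multi-model optimization that a white box view could permit: when the serving layer can see the individual operators of each pipeline, identical operators can be instantiated once and shared across models. Every name here (`OperatorSpec`, `OperatorStore`, `build`, `load_pipeline`, `predict`) is hypothetical and chosen for illustration; none of it is taken from PRETZEL's actual API or implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class OperatorSpec:
    """Hypothetical description of one pipeline stage (hashable, so it can be a dict key)."""
    kind: str
    params: Tuple[Any, ...] = ()


class OperatorStore:
    """Keeps a single instance per distinct OperatorSpec across all loaded pipelines."""

    def __init__(self) -> None:
        self._instances: Dict[OperatorSpec, Callable] = {}

    def get(self, spec: OperatorSpec, factory: Callable[[OperatorSpec], Callable]) -> Callable:
        # Build the operator only the first time this exact spec is seen.
        if spec not in self._instances:
            self._instances[spec] = factory(spec)
        return self._instances[spec]

    def __len__(self) -> int:
        return len(self._instances)


def build(spec: OperatorSpec) -> Callable:
    """Toy operator factory; a real system would load trained parameters here."""
    if spec.kind == "lowercase":
        return lambda text: text.lower()
    if spec.kind == "tokenize":
        sep = spec.params[0]
        return lambda text: text.split(sep)
    if spec.kind == "count_vocab":
        vocab = spec.params
        return lambda tokens: [tokens.count(word) for word in vocab]
    if spec.kind == "linear":
        weights = spec.params
        return lambda feats: sum(w * x for w, x in zip(weights, feats))
    raise ValueError(f"unknown operator kind: {spec.kind}")


def load_pipeline(store: OperatorStore, specs: List[OperatorSpec]) -> List[Callable]:
    """Resolve each stage through the store so equal stages are shared."""
    return [store.get(spec, build) for spec in specs]


def predict(pipeline: List[Callable], value: Any) -> Any:
    """Run the input through every stage in order."""
    for op in pipeline:
        value = op(value)
    return value


# Two models with the same featurization stages but different final weights.
featurize = [
    OperatorSpec("lowercase"),
    OperatorSpec("tokenize", (" ",)),
    OperatorSpec("count_vocab", ("good", "bad")),
]
store = OperatorStore()
model_a = load_pipeline(store, featurize + [OperatorSpec("linear", (1.0, -1.0))])
model_b = load_pipeline(store, featurize + [OperatorSpec("linear", (0.5, -2.0))])
```

In this toy setup the two pipelines have eight stages in total, but the store holds only five operator instances, because the three featurization stages are shared. A black box deployment, where each model is an opaque unit, would have no way to detect that overlap.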