Towards QoS-awareness and Improved Utilization of Spatial Multitasking GPUs

Wei Zhang; Quan Chen; Ningxin Zheng; Weihao Cui; Kaihua Fu; Minyi Guo

Towards QoS-awareness and Improved Utilization of Spatial Multitasking GPUs

Wei Zhang ,
Quan Chen ,
Ningxin Zheng ,
Weihao Cui ,
Kaihua Fu ,
Minyi Guo

IEEE Transactions on Computers | March 2021 , pp. 1-1

Download BibTex

Datacenters use GPUs to provide the significant computing throughput required by emerging user-facing services. The diurnal user access pattern of user-facing services provides a strong incentive to co-located applications for better GPU utilization, and prior work has focused on enabling co-location on multicore processors and traditional non-preemptive accelerators. However, current GPUs are evolving towards spatial multitasking and introduce a new set of challenges to eliminate QoS violations. We propose C-Laius, a runtime system that carefully allocates the computation resource to co-located applications for maximizing the throughput of batch applications while guaranteeing the required QoS of user-facing services. C-Laius not only allows co-locating one user-facing application with multiple batch applications, but also supports the condition of multiple user-facing applications with batch applications. In the case of a single co-located user-facing application, our evaluation on an Nvidia RTX 2080Ti GPU shows that C-Laius improves the utilization of spatial multitasking accelerators by 20.8%, while achieving the 99%-ile latency target for user-facing services. As to the case of multiple co-located user-facing applications, C-Laius ensures no violation of QoS while improving the accelerator utilization by 35.9% on average.