Fast Local Page-Tables for Virtualized NUMA Servers with vMitosis
- Ashish Panwar
- Reto Achermann
- Arkaprava Basu
- Abhishek Bhattacharjee
- K. Gopinath
- Jayneel Gandhi
2021 Architectural Support for Programming Languages and Operating Systems
Published by Association for Computing Machinery
Increasing heterogeneity in the memory system mandates careful data placement to hide the non-uniform memory access (NUMA) effects on applications. However, NUMA optimizations over the past decades have predominantly focused on application data, largely ignoring the placement of kernel data structures because of their small memory footprint; this is evident in typical OS designs that pin kernel objects in memory. In this paper, we show that careful placement of kernel data structures is gaining importance in the context of page-tables: sub-optimal placement of page-tables causes severe slowdowns (up to 3.1x) on virtualized NUMA servers.
In response, we present vMitosis, a system for explicit management of two-level page-tables (the guest and extended page-tables) on virtualized NUMA servers. vMitosis enables faster address translation by migrating and replicating page-tables. It supports two prevalent virtualization configurations: one in which the hypervisor exposes the NUMA architecture to the guest OS, and one in which that information is hidden from the guest OS. vMitosis is implemented in Linux/KVM, and our evaluation on a recent 1.5TiB 4-socket server shows that it effectively eliminates NUMA effects on 2D page-table walks, resulting in a speedup of 1.8-3.1x for Thin (single-socket) and 1.06-1.6x for Wide (multi-socket) workloads.
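To make concrete why page-table placement matters for the 2D page-table walks mentioned above, the following is a minimal back-of-the-envelope sketch and not part of vMitosis. It assumes the commonly cited worst case for nested paging, in which every guest page-table pointer is a guest-physical address that itself needs a walk of the extended page-table, and it ignores TLBs and page-walk caches. The function names and the latency parameters are hypothetical and chosen only for illustration.

```python
def walk_references(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Worst-case memory references for one 2D (nested) page-table walk.

    Assumption: each guest level costs one host walk (host_levels references)
    plus one reference to read the guest entry, and the final guest-physical
    address needs one more host walk. No TLB or page-walk-cache hits.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


def walk_cost_ns(refs: int, remote_fraction: float,
                 local_ns: float, remote_ns: float) -> float:
    """Estimated walk latency when a fraction of the references is remote.

    local_ns and remote_ns are caller-supplied, hypothetical DRAM latencies.
    """
    per_ref = (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns
    return refs * per_ref


if __name__ == "__main__":
    refs = walk_references()  # 4-level guest and 4-level extended page-tables
    print("references per 2D walk:", refs)
    # Illustrative latencies only; these are not measurements from the paper.
    print("all-local walk (ns):", walk_cost_ns(refs, 0.0, 100.0, 200.0))
    print("all-remote walk (ns):", walk_cost_ns(refs, 1.0, 100.0, 200.0))
```

Under these assumptions, a single walk can touch many page-table entries, so the share of entries that sit on a remote socket directly scales the translation cost. That is the effect migration and replication of page-tables aim to remove.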