Mariposa: Measuring SMT Instability in Automated Program Verification
- Yi Zhou ,
- Jay Bosamiya ,
- Yoshiki Takashima ,
- Jessica Li ,
- Marijn Heule ,
- Bryan Parno
Program verification has been successfully applied to increasingly large and complex systems. Much of this recent success can be attributed to the automation provided by dispatching verification condition queries via SMT solvers. However, multiple teams anecdotally report that this style of automated verification is plagued by proof instability, where semantically irrelevant changes to the query can have large effects on the SMT solver’s response.In this work, we present Mariposa, a tool to detect and quantity instability. To better understand the status quo of instability, we apply Mariposa to a set of 17,043 SMT queries from six existing program verification projects. We discover that SMT solver upgrades often make projects less stable, and that the most recent SMT solver version is unstable on 2.6% of the queries. For individual projects, the unstable ratio can grow to 5.0%. Based on our experimental results, we curate the Mariposa benchmark, which we hope will help measure and incentivize stability improvements in SMT-based program verification.