QURE: AI-assisted and Automatically Verified UDF Inlining

Proceedings of the ACM on Management of Data (SIGMOD 2025) | , Vol 3

Related File | Related File

User-defined functions (UDFs) extend the capabilities of SQL by improving code reusability and encapsulating complex logic, but can hinder the performance due to optimization and execution inefficiencies. Prior approaches attempt to address this by rewriting UDFs into native SQL, which is then inlined into the SQL queries that invoke them. However, these approaches are either limited to simple pattern matching or require the synthesis of complex verification conditions from procedural code, a process that is brittle and difficult to automate. This limits coverage and makes the translation approaches less extensible to unseen procedural constructs. In this work, we present QURE, a framework that (1) leverages large language models (LLMs) to translate UDFs to native SQL, and (2) introduces a novel formal verification method to establish equivalence between the UDF and its translation. QURE uses the semantics of SQL operators to automate the derivation of verification conditions, in turn resulting in broad coverage and high extensibility. We model a large set of imperative constructs, particularly those common in Python and Pandas UDFs, in an intermediate verification language, allowing for the verification of their SQL translation. In our empirical evaluation of Python and Pandas UDFs, equivalence is successfully verified for 88% of UDF-SQL pairs, with LLMs correctly translating 84.8% of these UDFs. The remaining UDFs lack semantically equivalent SQL. Executing the translated UDFs achieves median performance improvements of 23.7x on single-node clusters and 12.5x on 12-node clusters compared to the original UDFs, while also significantly reducing out-of-memory errors.

Python and Pandas UDFs used for evaluating QURE: QURE Benchmark.zip (opens in new tab)