Welding Natural Language Queries to Analytics IRs with LLMs

From the recent momentum behind translating natural language to SQL (nl2sql) to commercial offerings such as Copilot for Microsoft Fabric, Large Language Models (LLMs) are poised to have a major impact on data analytics. In this paper, we show that LLMs can translate natural language analytics queries directly into the custom intermediate query representations (IRs) of modern data analytics systems. This has the direct benefit of making IRs more accessible to end users, but, interestingly, it can also improve translation accuracy and end-to-end performance, especially when the query semantics are captured better by the IR than by SQL. We build an LLM-based pipeline, nl2weld, for one instance of this flow: translating natural language queries to the Weld IR using GPT-4. nl2weld is carefully designed to harness the self-reflection and instruction-following capabilities of GPT-4, providing it with domain-specific instructions and feedback from the Weld compiler. We evaluate nl2weld on a subset of the Spider benchmark and compare it against the gold-standard SQL queries and DIN-SQL, a state-of-the-art nl2sql system. We report a comparable accuracy of 77.4% on the dataset, and also demonstrate examples on which nl2weld produces code that is 1.5–4× faster than the gold standard and DIN-SQL.
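
The abstract describes a pipeline in which the LLM drafts Weld code and is then given compiler feedback to correct itself. The following is a minimal sketch of such a feedback loop, not the actual nl2weld implementation: the `llm` and `compile_weld` callables, the prompt wording, and the retry budget are all illustrative assumptions.

```python
from typing import Callable, Optional


def translate_to_weld(
    question: str,
    schema: str,
    llm: Callable[[str], str],                      # assumed wrapper around a GPT-4 chat call
    compile_weld: Callable[[str], Optional[str]],   # assumed: returns an error message, or None on success
    max_rounds: int = 3,
) -> str:
    """Translate a natural-language question into Weld IR, using compiler feedback for self-reflection."""
    # Domain-specific instructions prepended to the prompt (illustrative content only).
    prompt = (
        "You are translating analytics questions into the Weld IR.\n"
        f"Tables and columns:\n{schema}\n"
        f"Question: {question}\n"
        "Return only Weld code."
    )
    program = llm(prompt)

    for _ in range(max_rounds):
        error = compile_weld(program)
        if error is None:
            return program  # compiles cleanly; accept the candidate
        # Self-reflection step: show the model its own code together with the compiler error.
        reflection_prompt = (
            f"The following Weld program failed to compile:\n{program}\n"
            f"Compiler error:\n{error}\n"
            "Fix the program and return only the corrected Weld code."
        )
        program = llm(reflection_prompt)

    return program  # best effort after exhausting the retry budget
```

In this sketch the loop terminates as soon as the candidate program compiles; the paper's pipeline may apply additional checks (e.g., schema-aware validation), which are omitted here.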