{"id":980586,"date":"2023-10-30T10:21:17","date_gmt":"2023-10-30T17:21:17","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=980586"},"modified":"2024-01-16T06:40:01","modified_gmt":"2024-01-16T14:40:01","slug":"welding-natural-language-queries-to-analytics-irs-with-llms","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/welding-natural-language-queries-to-analytics-irs-with-llms\/","title":{"rendered":"Welding Natural Language Queries to Analytics IRs with LLMs"},"content":{"rendered":"
From the recent momentum behind translating natural language to SQL (nl2sql) to commercial product offerings such as Co-Pilot for Microsoft Fabric, Large Language Models (LLMs) are poised to have a big impact on data analytics. In this paper, we show that LLMs can be used to convert natural language analytics queries directly into the custom intermediate query representations (IRs) of modern data analytics systems. This has the direct benefit of making IRs more accessible to end users. More interestingly, it can also improve translation accuracy and end-to-end performance, especially when the query semantics are better captured in the IR than in SQL. We build an LLM-based pipeline, nl2weld, for one instance of this flow: it translates natural language queries to the Weld IR using gpt-4. nl2weld is carefully designed to harness the self-reflection and instruction-following capabilities of gpt-4 by providing it with various forms of feedback, such as domain-specific instructions and feedback from the Weld compiler. We evaluate nl2weld on a subset of the Spider benchmark and compare it against the gold-standard SQL and DIN-SQL, a state-of-the-art nl2sql system. We report a comparable accuracy of 77.4% on this subset, and we also demonstrate examples on which nl2weld produces code that is 1.5\u20134\u00d7 faster than the gold standard and DIN-SQL.<\/p>\n","protected":false},"excerpt":{"rendered":"
From the recent momentum behind translating natural language to SQL (nl2sql), to commercial product offerings such as Co-Pilot for Microsoft Fabric, Large Language Models (LLMs) are poised to have a big impact on data analytics. In this paper, we show that LLMs can be used to convert natural language analytics queries directly to custom intermediate […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-980586","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-1-14","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/theme\/systems\/","label_id":"252679","label":0}],"msr_related_uploader":"","msr_attachm
ents":[],"msr-author-ordering":[{"type":"user_nicename","value":"Kaushik Rajan","user_id":32574,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kaushik Rajan"},{"type":"user_nicename","value":"Aseem Rastogi","user_id":36021,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Aseem Rastogi"},{"type":"user_nicename","value":"Akash Lal","user_id":30905,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Akash Lal"},{"type":"user_nicename","value":"Sampath Rajendra","user_id":43107,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sampath Rajendra"},{"type":"text","value":"Krithika Subramanian","user_id":0,"rest_url":false},{"type":"text","value":"Krut Patel","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199562,199565],"msr_event":[],"msr_group":[144939],"msr_project":[967329,967350],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967329,"post_title":"Domain Specialization","post_name":"domain-specialization","post_type":"msr-project","post_date":"2023-10-16 02:14:29","post_modified":"2024-01-12 08:47:20","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/domain-specialization\/","post_excerpt":"Scaling performance beyond Moore's law. Domain specialization is expected to play a big role in how computer systems evolve in the future. With the end of Moore's law, we are already seeing CPU, GPU, and domain-specific hardware evolving rapidly. The next decade is therefore expected to see big changes in how we develop, compile, and run software. 
This project focuses on data systems, a class of systems in which performance scaling becomes increasingly important as data sizes grow. First, we believe that domain-specific compilers will play a crucial strategic role in helping software leverage the changing hardware landscape. Such compilers will be multi-layered and will progressively lower computation through multiple intermediate abstractions, performing domain-specific optimizations at the higher layers and specializing code to the hardware at the lower layers. We have been working on two such domain-specific compilers in the data domain. Second, new hardware-specific algorithms need…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967329"}]}},{"ID":967350,"post_title":"AI-Driven Software Engineering","post_name":"967350","post_type":"msr-project","post_date":"2023-09-12 02:52:36","post_modified":"2023-10-09 10:27:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/967350\/","post_excerpt":"Using AI to help every developer build better software faster. Generative AI is transforming the way software is built. We conduct research at the forefront of this transformation. We design ML models, algorithms, and platforms to improve developer productivity and software reliability. 
Check out the publications tab to learn more.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967350"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980586"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980586\/revisions"}],"predecessor-version":[{"id":999141,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980586\/revisions\/999141"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=980586"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=980586"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=980586"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=980586"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=980586"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=980586"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=980586"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=980586"},{"taxonomy":"msr-do
wnload-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=980586"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=980586"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=980586"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=980586"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=980586"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=980586"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=980586"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=980586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}