In recent years, machine learning, and in particular large language models (LLMs), has shown great promise at automating several aspects of programming and software development, such as coding, testing, integration, static analysis, and verification. In this project, we leverage and extend large language models with ideas grounded in programming languages and correctness to develop trusted AI agents for all aspects of programming, enabling reliable software development.
Current work explores various directions:
AI for program specifications
We study the problem of user-intent formalization (see the high-level deck) to derive formal specifications from informal requirements, and its applications to developing correct software. Results include:
- Test (oracle) generation to find functional bugs in TOGA
- Test-driven interactive intent formalization for improving the accuracy and explainability of code generation in TiCoder (a minimal sketch of this loop appears after this list)
- Generating program specifications from natural language as in nl2postcond
- Symbolically testing specifications for correctness using program verifiers, as in Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages
- Instantiating these ideas for security-critical domains, such as verified parser generation from RFC descriptions in 3DGen.
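To make the test-driven loop concrete, here is a minimal sketch of a TiCoder-style interaction. The helpers `gen_code`, `gen_test`, and `user_approves` are hypothetical placeholders, not TiCoder's actual API: the LLM proposes code candidates and tests, the user approves or rejects each test as a proxy for their intent, and approved tests prune the candidate pool.

```python
from typing import Callable, List

def formalize_intent(
    description: str,
    gen_code: Callable[[str], List[str]],   # LLM: NL description -> candidate programs
    gen_test: Callable[[str], str],         # LLM: NL description -> candidate test (source)
    user_approves: Callable[[str], bool],   # user: does this test reflect my intent?
    rounds: int = 3,
) -> List[str]:
    """Prune LLM-generated code candidates using user-approved tests."""
    candidates = gen_code(description)
    approved: List[str] = []
    for _ in range(rounds):
        test = gen_test(description)
        if not user_approves(test):
            continue  # a rejected test also carries intent signal; omitted for brevity
        approved.append(test)
        # Keep only candidates that pass every approved test so far.
        candidates = [c for c in candidates if runs_clean(c, approved)]
    return candidates

def runs_clean(code: str, tests: List[str]) -> bool:
    """Execute a candidate and the tests in a fresh namespace (no sandboxing here)."""
    env: dict = {}
    try:
        exec(code, env)
        for t in tests:
            exec(t, env)
        return True
    except Exception:
        return False
```

The approved tests do double duty: they select among candidates and leave behind a human-readable record of the formalized intent.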
AI for proof synthesis and proof-oriented programming
We explore program proof automation using neural techniques, including (a) improving LLMs' ability to generate correct loop invariants by neurally ranking LLM-generated invariants in iRank (see the sketch below), and (b) creating a large-scale dataset and fine-tuned models for proof automation for proof-oriented programming (PoPAI) in languages such as F*.
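As a rough illustration of the ranking idea in iRank, here is a minimal sketch, assuming a learned scoring function `score` and a verifier callback `verifier_check` (both hypothetical stand-ins; iRank's actual ranker is a trained neural model, and the check would call out to a deductive verifier).

```python
from typing import Callable, List, Tuple

def rank_then_verify(
    problem: str,
    invariants: List[str],
    score: Callable[[str, str], float],          # learned ranker (hypothetical)
    verifier_check: Callable[[str, str], bool],  # deductive verifier call (hypothetical)
) -> Tuple[str, int]:
    """Try candidate invariants in ranked order; return the first one the
    verifier accepts, plus the number of verifier calls spent."""
    ranked = sorted(invariants, key=lambda inv: score(problem, inv), reverse=True)
    for calls, inv in enumerate(ranked, start=1):
        if verifier_check(problem, inv):
            return inv, calls
    raise RuntimeError("no candidate invariant verified")
```

The benefit is purely in ordering: a better ranker moves a verifying invariant toward the front of the list, so fewer expensive verifier calls are wasted on incorrect candidates.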
Others
Previous work includes fine-tuning large language models on test execution data to predict runtime faults of generated code in CodeRanker, and improving search-based test generation with LLMs in CodaMOSA (sketched below).
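As a rough sketch of the CodaMOSA idea (all helper names below are hypothetical; CodaMOSA itself is built on top of the Pynguin test-generation framework): when mutation-based search stops improving coverage, ask an LLM for a test targeting a still-uncovered function and fold it back into the search population.

```python
import random
from typing import Callable, List, Set

def search_with_llm_escape(
    seed_tests: List[str],
    all_targets: Set[str],                      # every coverage target in the module
    coverage: Callable[[List[str]], Set[str]],  # test suite -> covered targets
    mutate: Callable[[str], str],               # classic search mutation operator
    llm_test_for: Callable[[str], str],         # LLM: uncovered target -> test source
    budget: int = 100,
    stall_limit: int = 10,
) -> List[str]:
    """Search-based test generation with LLM-assisted escape from coverage plateaus."""
    suite = list(seed_tests)  # assumes at least one seed test
    best, stall = len(coverage(suite)), 0
    for _ in range(budget):
        # Ordinary search step: mutate a random test, keep it if coverage grows.
        candidate = mutate(random.choice(suite))
        if len(coverage(suite + [candidate])) > best:
            suite.append(candidate)
            best, stall = len(coverage(suite)), 0
        else:
            stall += 1
        # Plateau: ask the LLM for a test aimed at something still uncovered.
        if stall >= stall_limit:
            uncovered = sorted(all_targets - coverage(suite))
            if uncovered:
                suite.append(llm_test_for(uncovered[0]))
                best = max(best, len(coverage(suite)))
            stall = 0
    return suite
```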