Recent innovations in generative large language models (LLMs) have made their applications and use cases ubiquitous. This has led to large-scale deployments of these models on complex, expensive, and power-hungry AI accelerators, most commonly GPUs. These developments make the efficiency of LLM training and inference an important challenge.
In the Azure Research – Systems group, we are working to improve Azure infrastructure across hardware, power, and serving.
Check the publications tab for our work so far!