News & features
Research Focus: Week of September 23, 2024
Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Time-series forecasting is a technique used to predict future values based on previously…
Research Focus: Week of April 15, 2024
In this issue: New research on appropriate reliance on generative AI; Power management opportunities for LLMs in the cloud; LLMLingua-2 improves task-agnostic prompt compression; Enhancing COMET to embrace under-resourced African languages.
Splitwise improves GPU usage by splitting LLM inference phases
Esha Choukse, Chaojie Zhang, Íñigo Goiri, Aashaka Shah, Saeed Maleki, Rodrigo Fonseca, and Ricardo Bianchini
Growing LLM use places new demands on cloud GPU capacity. Splitwise addresses this by running the two phases of LLM inference, prompt computation and token generation, on separate machines, achieving higher throughput within a limited power budget.
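To make phase splitting concrete, here is a minimal Python sketch, assuming a toy cluster with one pool of prompt machines and one pool of token machines. The class names (`Request`, `KVCache`, `Machine`, `PhaseSplitScheduler`) are hypothetical, round-robin placement stands in for Splitwise's actual scheduling policy, and the transfer of the KV cache between machines is only noted in a comment rather than modeled.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Request:
    req_id: int
    prompt_tokens: int
    output_tokens: int


@dataclass
class KVCache:
    # Hypothetical stand-in for the attention key/value state built during prefill.
    req_id: int
    num_tokens: int


@dataclass
class Machine:
    name: str
    phase: str  # "prompt" or "token"
    log: list = field(default_factory=list)

    def prefill(self, req: Request) -> KVCache:
        # Prompt phase: all prompt tokens are processed together in one pass.
        assert self.phase == "prompt", "prefill must run on a prompt machine"
        self.log.append(("prefill", req.req_id, req.prompt_tokens))
        return KVCache(req.req_id, req.prompt_tokens)

    def decode(self, req: Request, cache: KVCache) -> KVCache:
        # Token phase: one new token per step, each step extending the KV cache.
        assert self.phase == "token", "decode must run on a token machine"
        for _ in range(req.output_tokens):
            cache.num_tokens += 1
        self.log.append(("decode", req.req_id, req.output_tokens))
        return cache


class PhaseSplitScheduler:
    """Hypothetical scheduler: round-robin within each phase-specific pool."""

    def __init__(self, prompt_pool, token_pool):
        self._prompt = cycle(prompt_pool)
        self._token = cycle(token_pool)

    def serve(self, req: Request) -> KVCache:
        cache = next(self._prompt).prefill(req)
        # A real system would ship the KV cache to the token machine here.
        return next(self._token).decode(req, cache)


# Usage: two prompt machines feed a single token machine.
prompt_pool = [Machine("p0", "prompt"), Machine("p1", "prompt")]
token_pool = [Machine("t0", "token")]
sched = PhaseSplitScheduler(prompt_pool, token_pool)

first = sched.serve(Request(1, prompt_tokens=512, output_tokens=64))
second = sched.serve(Request(2, prompt_tokens=128, output_tokens=32))
print(first.num_tokens, second.num_tokens)  # 576 160
```

Separating the compute-intensive prompt phase from the memory-intensive token phase is what allows each pool to be provisioned for its own workload; the sketch shows only the routing, not those hardware or power decisions.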