Esha Choukse

Principal Researcher

Awards | ACM SIGMICRO

Esha Choukse receives 2025 SIGMICRO Early Career Award

November 7, 2025

Choukse was recognized for her foundational contributions to hardware memory compression and to sustainable and efficient datacenter systems.

Microsoft Research Blog

Research Focus: Week of September 23, 2024

September 25, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Time-series forecasting is a technique used to predict future values based on previously…

Microsoft Research Blog

Research Focus: Week of April 15, 2024

April 17, 2024

In this issue: New research on appropriate reliance on generative AI; Power management opportunities for LLMs in the cloud; LLMLingua-2 improves task-agnostic prompt compression; Enhancing COMET to embrace under-resourced African languages.

An example of the generative LLM inference process and its two phases. The initial prompt is “Which is better, pizza or burger?”; the prompt phase processes it and generates the first output token, “Pizza”. The token generation phase then generates the tokens “is”, “better”, and “.”. The prompt phase (1) processes all input tokens in parallel to generate the first output token, (2) is compute intensive, and (3) accounts for a smaller part of the end-to-end latency. The token generation phase (1) is serialized, (2) is memory intensive, and (3) tends to account for the majority of the end-to-end latency.

Microsoft Research Blog

Splitwise improves GPU usage by splitting LLM inference phases

January 4, 2024 | Esha Choukse, Chaojie Zhang, Íñigo Goiri, Aashaka Shah, Saeed Maleki, Rodrigo Fonseca, and Ricardo Bianchini

Expanded LLM use creates new demands on cloud GPU capacity. Splitwise presents an efficient solution by separating the two essential phases of LLM inference, achieving higher throughput within a limited power budget.
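
The two inference phases described in the figure caption above can be illustrated with a minimal toy sketch. It is not the Splitwise implementation: the canned output list, the `KVCache` class, and the functions `prompt_phase`, `token_phase`, and `generate` are all hypothetical names introduced here, and no real model is involved. The sketch only shows the shape of the computation: one pass over the whole prompt that yields the first token, followed by a serialized loop that yields one token per step, with per-request state handed from the first phase to the second.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a model: the output is fixed in advance so the
# example stays self-contained. A real LLM would compute each token.
CANNED_OUTPUT = ["Pizza", "is", "better", "."]


@dataclass
class KVCache:
    """Toy per-request state that grows as tokens are processed."""
    tokens: list = field(default_factory=list)


def prompt_phase(prompt_tokens):
    """Handle every input token in a single pass and emit the first output token."""
    cache = KVCache(tokens=list(prompt_tokens))  # whole prompt taken at once
    first_token = CANNED_OUTPUT[0]
    cache.tokens.append(first_token)
    return first_token, cache


def token_phase(cache, prompt_len):
    """Emit the remaining tokens one at a time; each step extends the cache."""
    generated = []
    while True:
        produced = len(cache.tokens) - prompt_len  # output tokens so far
        if produced >= len(CANNED_OUTPUT):
            break  # toy end-of-sequence condition
        next_token = CANNED_OUTPUT[produced]
        cache.tokens.append(next_token)
        generated.append(next_token)
    return generated


def generate(prompt):
    """Run both phases in sequence, passing the cache from one to the other."""
    prompt_tokens = prompt.split()
    first_token, cache = prompt_phase(prompt_tokens)
    # Assumption for illustration: the cache hand-off below marks the point
    # at which the two phases could be run separately.
    rest = token_phase(cache, len(prompt_tokens))
    return [first_token] + rest


if __name__ == "__main__":
    print(generate("Which is better, pizza or burger?"))
    # prints ['Pizza', 'is', 'better', '.']
```

Because `token_phase` needs only the cache and the prompt length, the sketch makes the boundary between the two phases explicit, which is the property a phase-splitting design relies on.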
