About
I am a Researcher in the Azure Research - Systems team. I currently lead the efficient AI research project, which focuses on the power, energy, and thermal bottlenecks of GenAI deployment in the cloud and on datacenter sustainability.
For publications, visit the publications tab.
Most recent news:
- September 2024: We published a preprint of Mnemosyne, our work on 10-million-token long-context LLM inference, at https://arxiv.org/abs/2409.17264
- September 2024: Our paper on input-dependent power consumption of GPUs was accepted by the Sustainable computing workshop at SC’24!
- September 2024: Our joint paper with AMD on Optimizing GPU data center power was accepted by APCCAS 2024!
- August 2024: DynamoLLM preprint is out!
- July 2024: We received two paper acceptances at MICRO 2024!
- July 2024: Served as Program co-chair for HotCarbon 2024. Stay tuned for a report and the proceedings.
- June 2024: Co-authored 4 papers presented at ISCA, with Splitwise being nominated for Best Paper!
- April 2024: Co-authored a paper on GenAI inference power provisioning at ASPLOS 2024.
- April 2024: Gave invited talks at the EMC2 workshop at ASPLOS and at UCSD on the topic: “Rapid growth in GPU deployments in datacenters: With great power comes great responsibility”.
Some of the amazing students I am working with or have worked with:
- Amey Agrawal, Georgia Tech
- Jovan Stojkovic, UIUC
- Yuhan Liu, University of Chicago
- Yueying Li, Cornell
- Theo Gregersen, CMU
- Pratyush Patel, UW Seattle
- Muhammad Laghari, Virginia Tech
- Gagandeep Panwar, Virginia Tech
- Jaylen Wang, CMU
- Josh Fried, MIT
- Gauhar Irfan Chaudhry, MIT
- Edwin Lim, CMU
- Kunal Jain, IIIT Hyderabad
- Marcin Copik, ETH Zurich
I received my PhD in 2019 from the University of Texas at Austin, with a thesis on main memory compression for higher effective capacity and bandwidth. I am generally interested in hardware-software co-design for systems challenges.