POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference
Aditya K Kamath, Ramya Prabhu, Jayashree Mohan, Simon Peter, Ramachandran Ramjee, Ashish Panwar
Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2025 | March 2025