Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang
ISCA 2024 | July 2024
Microsoft Research Focus: http://approjects.co.za/?big=en-us/research/blog/research-focus-week-of-july-15-2024/