TurboAttention: Efficient Attention Approximation For High Throughputs LLMs
Hao Kang, Srikant Bharadwaj, James Hensman, Tushar Krishna, Victor Ruehle, Saravan Rajmohan
ArXiv | December 2024, Vol abs/2412.08585
Xi Wang, Liana Mikaelyan, Taketomo Isazawa, James Hensman
October 2024
Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman, Pashmina Cameron
2024 Neural Information Processing Systems | March 2024
Preprint