Projects
Established:
Million-Tokens Prompt Inference for Long-context LLMs. MInference 1.0 leverages the dynamic sparse nature of LLMs' attention, which also exhibits some static patterns, to speed up pre-filling for long-context LLMs. It first determines offline which sparse pattern each attention head belongs to, then…
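The blurb above describes exploiting attention sparsity to cut pre-filling cost. As a toy illustration of the general idea (not MInference's actual head-specific patterns, which are determined offline), here is a minimal NumPy sketch of per-query top-k sparse attention compared against dense attention:

```python
import numpy as np

def dense_attention(q, k, v):
    """Standard softmax attention over all keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

def topk_sparse_attention(q, k, v, keep=8):
    """Toy dynamic sparsity: each query attends only to its `keep`
    highest-scoring keys; all other scores are masked to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    thresh = np.sort(scores, -1)[:, -keep][:, None]  # keep-th largest per row
    masked = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(masked - masked.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v
```

When `keep` equals the sequence length the sparse variant reduces exactly to dense attention; real systems gain speed by skipping the masked score computations entirely rather than masking after the fact.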
Established:
Effectively Deliver Information to LLMs via Prompt Compression (LLMLingua, LongLLMLingua, LLMLingua-2). Large language models (LLMs) have demonstrated remarkable capabilities and have been applied…
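The LLMLingua family compresses prompts so that less text carries the same information to the model. As a heavily simplified sketch of the idea (LLMLingua itself scores tokens with a small language model's perplexity; the frequency heuristic and `compress_prompt` helper below are illustrative assumptions only):

```python
import re
from collections import Counter

def compress_prompt(prompt, ratio=0.5):
    """Toy prompt compression: keep the rarest (most informative) tokens
    until only `ratio` of the original tokens remain, preserving order.
    Frequency stands in here for a real importance score."""
    tokens = re.findall(r"\S+", prompt)
    freq = Counter(t.lower() for t in tokens)
    budget = max(1, int(len(tokens) * ratio))
    # Rank token positions by frequency: rare tokens come first (stable sort
    # keeps the original left-to-right order among ties).
    ranked = sorted(range(len(tokens)), key=lambda i: freq[tokens[i].lower()])
    keep = set(ranked[:budget])
    return " ".join(t for i, t in enumerate(tokens) if i in keep)
```

Dropping half the tokens of a redundant prompt this way retains the distinctive words while discarding repeated filler, which is the effect prompt compression aims for at scale.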