Downloads
Sarathi-Serve
November 2023
Sarathi-Serve (a research prototype) is a high-throughput, low-latency LLM serving framework. This repository contains a benchmark suite for evaluating LLM performance from a systems point of view. It contains various workloads and scheduling policies that together can be…
VIDUR: LLM Simulator
November 2023
Vidur is a high-fidelity and extensible LLM inference simulator. It can help you plan capacity and find the best configuration for your LLM deployments, test new research ideas such as new scheduling algorithms and optimizations like speculative decoding, and…