{"id":1032207,"date":"2024-05-08T01:01:42","date_gmt":"2024-05-08T08:01:42","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=1032207"},"modified":"2024-05-08T01:17:18","modified_gmt":"2024-05-08T08:17:18","slug":"vidur-a-large-scale-simulation-framework-for-llm-inference","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/vidur-a-large-scale-simulation-framework-for-llm-inference\/","title":{"rendered":"VIDUR: A Large-Scale Simulation Framework for LLM Inference"},"content":{"rendered":"

Optimizing the deployment of Large Language Models (LLMs) is expensive today since it requires experimentally
\nrunning an application workload against an LLM implementation while exploring large configuration space formed
\nby system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address
\nthis challenge, we present Vidur \u2013 a large-scale, high-fidelity, easily-extensible simulation framework for LLM
\ninference performance. Vidur models the performance of LLM operators using a combination of experimental
\nprofiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads
\nby estimating several metrics of interest such as latency and throughput. We validate the fidelity of Vidur on
\nseveral LLMs and show that it estimates inference latency with less than 9% error across the range. Further, we
\npresent Vidur-Search, a configuration search tool that helps optimize LLM deployment. Vidur-Search uses Vidur
\nto automatically identify the most cost-effective deployment configuration that meets application performance
\nconstraints. For example, Vidur-Search finds the best deployment configuration for LLaMA2-70B in one hour on
\na CPU machine, in contrast to a deployment-based exploration which would require 42K GPU hours \u2013 costing
\n218K dollars. Source code for Vidur is available at https:\/\/github.com\/microsoft\/vidur.<\/p>\n","protected":false},"excerpt":{"rendered":"

Optimizing the deployment of Large Language Models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur \u2013 a large-scale, high-fidelity, easily-extensible simulation framework […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13547],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-field-of-study":[246691],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1032207","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-systems-and-networking","msr-locale-en_us","msr-field-of-study-computer-science"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-5-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/05\/vidur_mlsys24.pdf","id":"1032210","title":"vidur_mlsys24","label_id":"243132","label":0}],"msr_related_uploader":"","msr_attachments":[{"id":1032210,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/05\/vidur_mlsys24.pdf"}],"msr-author-ordering":[{"type":"text","value":"Amey Agrawal","user_id":0,"rest_url":false},{"type":"text","value":"Nitin Kedia","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Jayashree Mohan","user_id":40927,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jayashree Mohan"},{"type":"user_nicename","value":"Ashish Panwar","user_id":42153,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ashish Panwar"},{"type":"user_nicename","value":"Nipun Kwatra","user_id":37634,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Nipun Kwatra"},{"type":"text","value":"Bhargav S. Gulavani","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ramachandran Ramjee","user_id":33337,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ramachandran Ramjee"},{"type":"text","value":"Alexey Tumanov","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199562],"msr_event":[],"msr_group":[144939],"msr_project":[968667],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1032207"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1032207\/revisions"}],"predecessor-version":[{"id":1032219,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1032207\/revisions\/1032219"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1032207"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=1032207"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1032207"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1032207"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1032207"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=1032207"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1032207"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=1032207"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=1032207"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1032207"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1032207"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1032207"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1032207"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1032207"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1032207"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}