{"id":1027098,"date":"2024-05-07T09:00:00","date_gmt":"2024-05-07T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/loftq-reimagining-llm-fine-tuning-with-smarter-initialization\/"},"modified":"2024-05-01T07:52:24","modified_gmt":"2024-05-01T14:52:24","slug":"loftq-reimagining-llm-fine-tuning-with-smarter-initialization","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/loftq-reimagining-llm-fine-tuning-with-smarter-initialization\/","title":{"rendered":"LoftQ: Reimagining LLM fine-tuning with smarter initialization"},"content":{"rendered":"\n

This research paper was presented at the <\/em><\/strong>12th<\/sup> International Conference on Learning Representations<\/em><\/strong><\/span><\/a> (ICLR 2024), the premier conference dedicated to the advancement of deep learning.<\/em><\/strong><\/p>\n\n\n\n

\"Teal<\/figure>\n\n\n\n
\n\t
\n\t\t
\n\t\t\t\t\t\tPublication<\/span>\n\t\t\tLoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models<\/span> <\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n

Large language models (LLMs) use extensive datasets and advanced algorithms to generate nuanced, context-sensitive content. However, their development requires substantial computational resources. To address this, we developed LoftQ, a technique that streamlines fine-tuning, the process used to adapt pretrained language models to specialized applications such as analyzing medical documents. During fine-tuning, the model undergoes additional training on a smaller, task-specific dataset. This improves performance in the specialized area: predictions become more accurate, domain-specific language is better understood, and responses are more relevant.<\/p>\n\n\n\n

LoftQ\u2019s strength lies in its ability to combine quantization and adaptive initialization during fine-tuning. Quantization reduces the precision of model parameters, lowering memory and computation needs. This not only accelerates processing but also reduces power consumption. Adaptive initialization keeps the model\u2019s starting parameters close to their pretrained values, preserving its capabilities while minimizing resource use. Our paper, \u201cLoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models<\/a>,\u201d presented at ICLR 2024, details how this method can help make AI technologies more efficient and sustainable.<\/p>\n\n\n\n

How LoftQ works <\/h2>\n\n\n\n

LoftQ builds on the principles of LoRA<\/a> and QLoRA<\/span><\/a>. LoRA is a method that greatly reduces the number of trainable parameters, decreasing the memory requirements for fine-tuning. QLoRA is a fine-tuning approach that uses 4-bit quantized, frozen weights and low-rank adapters, significantly reducing memory requirements while maintaining high performance. This is illustrated in Table 1, which compares the memory needed to fully fine-tune an LLM with 7 billion parameters against the memory needed by LoRA and QLoRA. LoRA achieves a fourfold reduction in memory usage, and QLoRA halves it again.<\/p>\n\n\n\n

Table 1: GPU memory usage for a 7-billion-parameter LLM under three configurations: full fine-tuning on the left, LoRA in the middle, and QLoRA on the right.<\/figcaption><\/figure>\n\n\n\n
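To give a feel for why 4-bit storage matters, the following is a minimal sketch of the weights-only arithmetic for a 7-billion-parameter model. It assumes 16-bit storage for the unquantized weights (an assumption, not a figure from this post), and it deliberately ignores gradients, optimizer state, activations, and adapter weights, so it will not reproduce the totals in Table 1. The helper name weight_memory_gb is hypothetical.

```python
# Weights-only memory estimate; everything except the frozen weights is ignored.

def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    '''Return storage for the weights alone, in gigabytes (1e9 bytes).'''
    return num_params * bits_per_weight / 8 / 1e9

NUM_PARAMS = 7_000_000_000  # 7 billion parameters, as in Table 1

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # assumed 16-bit baseline
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f'16-bit weights: {fp16_gb:.1f} GB')
print(f' 4-bit weights: {int4_gb:.1f} GB')
```

Under these assumptions the frozen weights shrink by a factor of four; the overall savings during fine-tuning depend on the other memory consumers that this sketch leaves out.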

Unlike LoRA, QLoRA comes with a tradeoff: quantizing the weights sacrifices some of the pretrained model\u2019s quality. LoftQ recognizes this and jointly optimizes the initialization of the quantized weights and the low-rank adapter matrices. That is, LoftQ seeks a quantized matrix and a low-rank matrix whose sum closely approximates the original pretrained weight. This is done for every matrix that would be adapted in the model.<\/p>\n\n\n\n

The LoftQ algorithm alternates between two primary steps. First, it quantizes (simplifies) the part of the pretrained weight that the current low-rank term does not already capture. Then it finds the best low-rank factors to approximate the residual between the pretrained weight and the quantized weight. The process repeats for a few steps. This gives fine-tuning a more effective starting point, which preserves accuracy while using less computational power and more aggressively quantized (simplified) weights.<\/p>\n\n\n\n
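The alternating procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the released implementation: uniform_quantize is a stand-in simulated quantizer chosen for simplicity, the function names and the default of five steps are hypothetical, and the low-rank step uses a truncated SVD of the residual.

```python
import numpy as np

def uniform_quantize(w, num_bits):
    '''Simulated uniform quantization onto 2**num_bits levels.

    Stand-in quantizer for illustration only (assumption).
    '''
    levels = 2 ** num_bits - 1
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo

def loftq_init(w, rank, num_bits, num_steps=5):
    '''Alternate between quantization and a rank-limited SVD of the residual.'''
    a = np.zeros((w.shape[0], rank))
    b = np.zeros((w.shape[1], rank))
    for _ in range(num_steps):
        # Step 1: quantize what the low-rank term does not already explain.
        q = uniform_quantize(w - a @ b.T, num_bits)
        # Step 2: best rank-limited approximation of the residual w - q.
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * np.sqrt(s[:rank])
        b = vt[:rank, :].T * np.sqrt(s[:rank])
    return q, a, b

# Toy example on a random matrix standing in for a pretrained weight.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 48))
q, a, b = loftq_init(w, rank=8, num_bits=4)

err_quant_only = np.linalg.norm(w - uniform_quantize(w, 4))
err_with_low_rank = np.linalg.norm(w - q - a @ b.T)
print('quantization only :', err_quant_only)
print('quantized+low-rank:', err_with_low_rank)
```

After a single step, the combined error can be no larger than the quantization-only error, because the truncated SVD is the best rank-limited fit to the residual. Later steps are not guaranteed to shrink the error monotonically, which is one reason to run only a few of them.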

LoftQ requires a one-time setup to simplify and prepare these weights, allowing a fixed portion of the model\u2019s parameters (e.g., 5 percent) to be adjusted. Once established, this configuration can be repeatedly applied as the model transitions between various tasks and settings. <\/p>\n\n\n\n

Evaluating LoftQ <\/h2>\n\n\n\n

Tests using various types of LLMs, including those with different combinations of encoding and decoding capabilities, such as Llama-2, show that models initialized with LoftQ consistently achieve strong performance, often matching or surpassing those configured with QLoRA.<\/p>\n\n\n\n

In practical terms, Table 2 compares LoftQ and QLoRA on two tasks using the Llama-2 model family. We report results on the WikiText-2 dataset, which measures the model\u2019s perplexity (lower is better), and the GSM8K dataset, which tests the model\u2019s ability to solve basic math problems (higher is better), at varying degrees of weight simplification: an average of 3, 2.5, and 2.25 bits per weight. Our paper<\/a> discusses the results in more detail.<\/p>\n\n\n\n

Table 2: A comparison of LoftQ and QLoRA during the fine-tuning of two Llama-2 models on the WikiText-2 and GSM8K datasets.<\/figcaption><\/figure>\n\n\n\n\t
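Fractional averages such as 2.5 or 2.25 bits per weight can arise when different layers are stored at different precisions. The sketch below shows the arithmetic for one such scheme. The 32-layer model, the 4-bit and 2-bit choices, the layer counts, and the equal-sized-layer simplification are all illustrative assumptions, and average_bits is a hypothetical helper; the paper describes the configurations actually evaluated.

```python
def average_bits(num_layers, num_high, high_bits=4, low_bits=2):
    '''Mean bits per weight when num_high layers use high_bits and the
    remaining layers use low_bits, assuming equal-sized layers.'''
    num_low = num_layers - num_high
    return (num_high * high_bits + num_low * low_bits) / num_layers

# Hypothetical 32-layer model; the layer splits below are assumptions.
for num_high in (16, 8, 4):
    print(num_high, 'layers at 4 bits ->', average_bits(32, num_high), 'bits on average')
```

With these assumed splits, the averages come out to 3, 2.5, and 2.25 bits per weight.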

Implications and looking forward <\/h2>\n\n\n\n

LoftQ promises to advance the field of AI by accelerating research and facilitating the creation of cutting-edge tools while supporting sustainable development. While initially focused on LLMs, LoftQ\u2019s flexible design also supports fine-tuning in other types of models, such as those for vision and speech technologies. As our research progresses, we expect to make further enhancements that will boost performance on downstream tasks. We hope these improvements will lead to broader adoption across various AI applications. We\u2019re excited about the breadth of this technology\u2019s applicability and encourage the AI community to explore its benefits. LoftQ is available as open source through the Hugging Face PEFT library<\/span><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"

LoftQ boosts LLM efficiency by streamlining the fine-tuning process, reducing computational demands while preserving high performance. Innovations like this can help make AI technology more energy-efficient.<\/p>\n","protected":false},"author":42735,"featured_media":1027119,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1027098","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[1014303],"related-researchers":[{"type":"user_nicename","value":"Nikos Karampatziakis","user_id":33104,"display_name":"Nikos Karampatziakis","author_link":"Nikos Karampatziakis<\/a>","is_active":false,"last_first":"Karampatziakis, Nikos","people_section":0,"alias":"nikosk"},{"type":"user_nicename","value":"Chen Liang","user_id":43239,"display_name":"Chen Liang","author_link":"Chen Liang<\/a>","is_active":false,"last_first":"Liang, Chen","people_section":0,"alias":"chenliang1"},{"type":"user_nicename","value":"Weizhu Chen","user_id":34863,"display_name":"Weizhu Chen","author_link":"Weizhu Chen<\/a>","is_active":false,"last_first":"Chen, 
Weizhu","people_section":0,"alias":"wzchen"},{"type":"guest","value":"yixiao-li","user_id":"1027107","display_name":"Yixiao Li","author_link":"Yixiao Li<\/a>","is_active":true,"last_first":"Li, Yixiao","people_section":0,"alias":"yixiao-li"},{"type":"guest","value":"yifan-yu-2","user_id":"1027137","display_name":"Yifan Yu","author_link":"Yifan Yu<\/a>","is_active":true,"last_first":"Yu, Yifan","people_section":0,"alias":"yifan-yu-2"},{"type":"guest","value":"tuo-zhao","user_id":"782632","display_name":"Tuo Zhao","author_link":"Tuo Zhao<\/a>","is_active":true,"last_first":"Zhao, Tuo","people_section":0,"alias":"tuo-zhao"}],"msr_type":"Post","featured_image_thumbnail":"\"LoftQ","byline":"","formattedDate":"May 7, 2024","formattedExcerpt":"LoftQ boosts LLM efficiency by streamlining the fine-tuning process, reducing computational demands while preserving high performance. Innovations like this can help make AI technology more energy-efficient.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1027098","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42735"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1027098"}],"version-history":[{"count":38,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1027098\/revisions"}],"predecessor-version":[{"id":1029315,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1027098\/revisions\/1029315"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/10
27119"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1027098"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1027098"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1027098"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1027098"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1027098"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1027098"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1027098"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1027098"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1027098"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1027098"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1027098"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}