{"id":1010406,"date":"2024-03-05T06:00:00","date_gmt":"2024-03-05T14:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1010406"},"modified":"2024-03-06T08:41:49","modified_gmt":"2024-03-06T16:41:49","slug":"orca-math-demonstrating-the-potential-of-slms-with-model-specialization","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/orca-math-demonstrating-the-potential-of-slms-with-model-specialization\/","title":{"rendered":"Orca-Math: Demonstrating the potential of SLMs with model specialization"},"content":{"rendered":"\n
\"abstract<\/figure>\n\n\n\n

Our work on Orca<\/a> and Orca 2<\/a> demonstrated how improved training signals and methods can enhance the reasoning abilities of smaller language models, bringing them closer to the levels found in much larger language models. Orca-Math is another step in this direction, exploring the capabilities of small language models (SLMs) when specialized in a specific area, in this case solving grade-school math problems, which has long been recognized as a complex task for SLMs.<\/p>\n\n\n\n

Orca-Math is a 7-billion-parameter model created by fine-tuning the Mistral 7B model. Orca-Math achieves 86.81% on GSM8K pass@1, exceeding the performance of much larger models, including general models (e.g., LLAMA-2-70B, Gemini Pro, and GPT-3.5) and math-specific models (e.g., MetaMath-70B and WizardMath-70B). Note that the base model (Mistral-7B) achieves 37.83% on GSM8K.<\/p>\n\n\n\n

\"Alt
 Bar graph comparing GSM8K score of different models with an upward trend in quality. The models are LLAMA-2-70, GPT-3.5, Gemini Pro,  WizardMath-70B, MetaMath-70B and Orca-Math-7B. The graph shows that the Orca-Math-7B model outperforms other bigger models on GSM8K.<\/figcaption><\/figure>\n\n\n\n

The state-of-the-art (SOTA) performance of the Orca-Math model can be attributed to two key insights:<\/p>\n\n\n\n