Phi-2<\/td> 2.7B<\/td> 59.2<\/td> 68.8<\/td> 62.0<\/td> 61.1<\/td> 53.7<\/td><\/tr><\/tbody><\/table>Table 1.<\/strong> Averaged performance on grouped benchmarks compared to popular open-source SLMs.<\/center><\/figcaption><\/figure>\n\n\n\nModel<\/th> Size<\/th> BBH<\/th> BoolQ<\/th> MBPP<\/th> MMLU<\/th><\/tr><\/thead> Gemini Nano 2<\/td> 3.2B<\/td> 42.4<\/td> 79.3<\/td> 27.2<\/td> 55.8<\/td><\/tr> Phi-2<\/td> 2.7B<\/td> 59.3<\/td> 83.3<\/td> 59.1<\/td> 56.7<\/td><\/tr><\/tbody><\/table>Table 2.<\/strong> Comparison between Phi-2 and Gemini Nano 2 on Gemini\u2019s reported benchmarks.<\/center><\/figcaption><\/figure>\n\n\n\nIn addition to these benchmarks, we also performed extensive testing on commonly used prompts from the research community. The behavior we observed was consistent with what the benchmark results led us to expect. For example, we tested a prompt used to probe a model\u2019s ability to solve physics problems, most recently used to evaluate the capabilities of the Gemini Ultra model, and obtained the following result:<\/p>\n\n\n\nFigure 4. <\/strong>Phi-2\u2019s output on a simple physics problem, which includes an approximately correct square root calculation.<\/figcaption><\/figure>\n\n\n\nFigure 5. <\/strong>As in Gemini\u2019s test, we also gave Phi-2 a student\u2019s wrong answer to see whether it could identify the mistake (it did, even though Phi-2 has not been fine-tuned for chat or instruction-following). Note, however, that this is not a fully apples-to-apples comparison with the Gemini Ultra output described in the Gemini report: there, the student\u2019s answer was given as an image of handwritten text, whereas we supplied raw text.<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"Phi-2 is now accessible on the Azure model catalog. 
Its compact size and new innovations in model scaling and training data curation make it ideal for exploration around mechanistic interpretability, safety improvements, and fine-tuning experimentation on a variety of tasks.<\/p>\n","protected":false},"author":42183,"featured_media":991311,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13556],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-991293","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[968280],"related-researchers":[{"type":"user_nicename","value":"Mojan Javaheripi","user_id":42777,"display_name":"Mojan Javaheripi","author_link":"Mojan Javaheripi<\/a>","is_active":false,"last_first":"Javaheripi, Mojan","people_section":0,"alias":"mojavaheripi"}],"msr_type":"Post","featured_image_thumbnail":" ","byline":" Mojan Javaheripi<\/a> and S\u00e9bastien Bubeck","formattedDate":"December 12, 2023","formattedExcerpt":"Phi-2 is now accessible on the Azure model catalog. 
Its compact size and new innovations in model scaling and training data curation make it ideal for exploration around mechanistic interpretability, safety improvements, and fine-tuning experimentation on a variety of tasks.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/991293"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=991293"}],"version-history":[{"count":26,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/991293\/revisions"}],"predecessor-version":[{"id":993270,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/991293\/revisions\/993270"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/991311"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=991293"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=991293"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=991293"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=991293"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=991293"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com
\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=991293"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=991293"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=991293"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=991293"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=991293"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=991293"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}