{"id":945906,"date":"2023-06-05T19:34:45","date_gmt":"2023-06-06T02:34:45","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=945906"},"modified":"2024-06-10T09:36:18","modified_gmt":"2024-06-10T16:36:18","slug":"orca-progressive-learning-from-complex-explanation-traces-of-gpt-4","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4\/","title":{"rendered":"Orca: Progressive Learning from Complex Explanation Traces of GPT-4"},"content":{"rendered":"

Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs and small-scale, homogeneous training data to, most notably, a lack of rigorous evaluation that overestimates the small model's capability, since such models tend to imitate the style, but not the reasoning process, of LFMs. To address these challenges, we develop Orca, a 13-billion-parameter model that learns to imitate the reasoning process of LFMs. Orca learns from rich signals from GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (a 4-point gap with an optimized system message) on professional and academic examinations such as the SAT, LSAT, GRE, and GMAT, in zero-shot settings without CoT, while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether generated by humans or by more advanced AI models, is a promising direction for improving model capabilities and skills.
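
The core recipe the abstract describes is explanation tuning: rather than imitating a teacher's bare answers, the student is fine-tuned on (system message, user query, teacher explanation) triples so that it learns the step-by-step reasoning itself. The sketch below illustrates one such training step. It is a minimal illustration assuming a Hugging Face-style causal LM; the model choice, prompt template, and `build_example` helper are assumptions for exposition, not the paper's exact pipeline (which draws queries from the FLAN-v2 collection and uses ChatGPT and GPT-4 as teachers).

```python
# Minimal sketch of explanation tuning (assumptions noted above): fine-tune
# a student causal LM on (system message, user query, teacher explanation)
# triples, with the next-token loss computed only on the explanation tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in for the 13B student model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def build_example(system_message: str, user_query: str, teacher_explanation: str):
    """Tokenize one (system, query, explanation) triple, masking the prompt from the loss."""
    prompt = f"### System:\n{system_message}\n### User:\n{user_query}\n### Response:\n"
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    response_ids = tokenizer(teacher_explanation + tokenizer.eos_token,
                             add_special_tokens=False).input_ids
    input_ids = torch.tensor(prompt_ids + response_ids)
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = -100  # loss is computed only on the explanation
    return input_ids, labels

# One imitation-data triple (contents are illustrative).
input_ids, labels = build_example(
    system_message="You are a helpful assistant. Think step by step and justify your answer.",
    user_query="A train travels 60 miles in 1.5 hours. What is its average speed?",
    teacher_explanation="Average speed is distance divided by time: 60 / 1.5 = 40 mph.",
)

# Standard next-token prediction loss on the teacher's explanation.
loss = model(input_ids=input_ids.unsqueeze(0), labels=labels.unsqueeze(0)).loss
loss.backward()
```

In the paper, the "progressive" aspect refers to training first on ChatGPT-generated responses and then on GPT-4's richer explanation traces; the sketch shows only the per-example loss common to both stages.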

Downloads:
  PDF: https://arxiv.org/pdf/2306.02707.pdf
  arXiv: https://arxiv.org/abs/2306.02707
Related project: Orca (https://www.microsoft.com/en-us/research/project/orca/)
Using AI to improve AI: Orca is a research team in Microsoft Research. Orca focuses on creating automated pipelines for generating high-quality synthetic data at scale, and on training models for specialization and self-improvement. Its research areas include self-improvement strategies, feedback-driven teaching methods between large and small models to create high-quality synthetic data, and the use of domain-specific data to specialize LMs. In Orca, we recently released AgentInstruct, an extensible agentic…