{"id":1080684,"date":"2024-09-03T10:30:00","date_gmt":"2024-09-03T17:30:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-video&p=1080684"},"modified":"2024-09-03T12:38:30","modified_gmt":"2024-09-03T19:38:30","slug":"direct-nash-optimization-teaching-language-models-to-self-improve-with-general-preferences","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/direct-nash-optimization-teaching-language-models-to-self-improve-with-general-preferences\/","title":{"rendered":"Direct Nash Optimization: Teaching language models to self-improve with general preferences"},"content":{"rendered":"\n

Corby Rosset, Senior Researcher, Microsoft Research AI Frontiers, discusses teaching language models to self-improve using a preference oracle like GPT-4, framing it as a two-player game to find an optimal policy at a Nash equilibrium, and achieving state-of-the-art win rates against GPT-4 Turbo on benchmarks such as Alpaca-Eval and MT-Bench.<\/p>\n","protected":false},"excerpt":{"rendered":"

Corby Rosset, Senior Researcher, Microsoft Research AI Frontiers, discusses teaching language models to self-improve using a preference oracle like GPT-4, framing it as a two-player game to find an optimal policy at a Nash equilibrium, and achieving state-of-the-art win rates against GPT-4 Turbo on benchmarks such as Alpaca-Eval and MT-Bench.<\/p>\n","protected":false},"featured_media":1080687,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"research-area":[13556],"msr-video-type":[268311],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1080684","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-video-type-microsoft-research-forum","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/V04Q7YhEUzw","msr_secondary_video_url":"","msr_video_file":"","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/1080684"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/1080684\/revisions"}],"predecessor-version":[{"id":1082283,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/1080684\/revisions\/1082283"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1080687"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1080684"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1080684"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=1080684"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1080684"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1080684"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1080684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}