{"id":1172603,"date":"2026-03-17T10:15:42","date_gmt":"2026-03-17T17:15:42","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-video&#038;p=1172603"},"modified":"2026-05-20T10:47:36","modified_gmt":"2026-05-20T17:47:36","slug":"q-learning-with-flow-matching-policies","status":"publish","type":"msr-video","link":"https:\/\/www.microsoft.com\/en-us\/research\/video\/q-learning-with-flow-matching-policies\/","title":{"rendered":"Q-learning with Flow-Matching Policies"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Expressive policies such as diffusion and flow-matching policies have recently driven progress in robotic manipulation because they can model complex action distributions and generalize from just a handful of demonstrations. But most are still trained purely with supervised imitation learning. Optimizing them with off-policy reinforcement learning remains challenging, which limits real-world applicability for tasks that require online self-improvement and adaptations. In this talk, I will discuss approaches for making off-policy RL work with flow-matching policies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading h5\" id=\"speaker-bio\">Speaker bio<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Qiyang (Colin) Li is a PhD student at UC Berkeley advised by Prof. Sergey Levine. His research interests include reinforcement learning and robot learning, with a focus on leveraging offline prior experience for online exploration. Before that, he was an undergraduate student at the University of Toronto advised by Prof. Roger Grosse.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Expressive policies such as diffusion and flow-matching policies have recently driven progress in robotic manipulation because they can model complex action distributions and generalize from just a handful of demonstrations. But most are still trained purely with supervised imitation learning. Optimizing them with off-policy reinforcement learning remains challenging, which limits real-world applicability for tasks that [&hellip;]<\/p>\n","protected":false},"featured_media":1172604,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_hide_image_in_river":0,"footnotes":""},"research-area":[13556],"msr-video-type":[270340],"msr-locale":[268875],"msr-post-option":[],"msr-session-type":[],"msr-impact-theme":[],"msr-pillar":[],"msr-episode":[],"msr-research-theme":[],"class_list":["post-1172603","msr-video","type-msr-video","status-publish","has-post-thumbnail","hentry","msr-research-area-artificial-intelligence","msr-video-type-msr-new-england-generative-modeling-sampling-seminar","msr-locale-en_us"],"msr_download_urls":"","msr_external_url":"https:\/\/youtu.be\/jwRIibXzvbM","msr_secondary_video_url":"","msr_video_file":"http:\/\/0","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/1172603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-video"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/1172603\/revisions"}],"predecessor-version":[{"id":1172743,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video\/1172603\/revisions\/1172743"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1172604"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1172603"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1172603"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=1172603"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1172603"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1172603"},{"taxonomy":"msr-session-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-session-type?post=1172603"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1172603"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1172603"},{"taxonomy":"msr-episode","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-episode?post=1172603"},{"taxonomy":"msr-research-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-theme?post=1172603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}