fine-grained spatiotemporal reasoning, causal inference, long-horizon planning & memory, and robust tool-use<\/em>, and it convenes both academia and industry to discuss approaches, datasets, and benchmarks for robust agents that complete complex tasks \u201cin the wild.\u201d<\/p>\n\n\n\nThis year\u2019s edition emphasizes the intersection of LMMs and VLA models<\/em> and the full loop from representation to inference to decision-making, including structured reasoning strategies (e.g., chain-\/tree-of-thought, program-aided reasoning), long-horizon planning\/memory, and evaluation protocols that diagnose reasoning (not just recognition).<\/p>\n\n\n\nChallenges<\/h2>\n\n\n\n To measure progress with fine-grained evaluations and public leaderboards<\/em>, the workshop proposes two challenges:<\/p>\n\n\n\n\nMindCube (Spatial Mental Models under Partial Observability)<\/strong>\n\nEvaluates whether VLMs can form robust spatial mental models, by capturing positions<\/em>, orientations<\/em>, and counterfactual \u201cwhat-if\u201d dynamics<\/em>, from limited viewpoints.<\/li>\n<\/ul>\n<\/li>\n\n\n\nSITE (Standardized, Cross-modal Spatial Intelligence Thorough Evaluation)<\/strong>\n\nEvaluates spatial intelligence across single-image<\/em>, multi-image<\/em>, and video<\/em> modalities and across spatial factors (scale, visualization vs. orientation, intrinsic vs. extrinsic frames, static vs. 
dynamic).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\nCall for contributions<\/h2>\n\n\n\n\nSubmissions of published and unpublished works are welcome.<\/li>\n\n\n\n Accepted works will be presented as posters and spotlights at the workshop.<\/li>\n<\/ul>\n\n\n\nImportant dates<\/h3>\n\n\n\n\nWorkshop papers\n\nPaper submission deadline: April 21, 2026<\/strong><\/li>\n\n\n\nNotification: May 19, 2026<\/strong><\/li>\n\n\n\nCamera-ready deadline: June 2, 2026<\/strong><\/li>\n<\/ul>\n<\/li>\n\n\n\nChallenges\n\nStart date: January 21, 2026<\/li>\n\n\n\n End date: June 2, 2026<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<\/div>\n\n\n\n\n\n
Agenda (tentative)<\/h2>\n\n\n\nTime<\/th> Description<\/th><\/tr><\/thead> 13:00-13:30<\/td> Invited talk: Spatial Intelligence and Embodied AI<\/strong> Manling Li, Northwestern University<\/td><\/tr>13:30-14:00<\/td> Invited talk: Robotic perception, planning, and reasoning<\/strong> Chelsea Finn, Stanford & PI<\/td><\/tr>14:00-14:30<\/td> Workshop paper presentations<\/strong><\/td><\/tr>14:30-15:00<\/td> Afternoon break + poster session<\/strong><\/td><\/tr>15:00-15:30<\/td> Invited talk: Test-time scaling and reinforcement learning<\/strong> Xiaolong Wang, UCSD & Nvidia<\/td><\/tr>15:30-16:00<\/td> Invited talk: Multimodal reasoning and reward-driven video understanding<\/strong> Mohit Bansal, University of North Carolina at Chapel Hill<\/td><\/tr>16:00-16:30<\/td> Invited talk: SAM 3 promptable concept segmentation<\/strong> Kate Saenko, Meta AGI Foundations<\/td><\/tr>16:30-17:30<\/td> Panel discussion and closing remarks<\/strong> Moderators: Zhengyuan Yang, Jianfeng Gao<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<\/div>\n\n\n","tab-content":[],"msr_startdate":"2026-06-03","msr_enddate":"2026-06-03","msr_event_time":"Mountain Standard Time (UTC -7)","msr_location":"Denver, Colorado, USA","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"Upcoming: June 3, 2026","msr_register_text":"Register now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":null,"event_excerpt":"Full workshop title: The 5th Workshop on Computer Vision in the Wild (CVinW): Towards Unified Multimodal Agents for Reasoning in the Wild Host conference: The Conference on Computer Vision and Pattern Recognition (CVPR) (opens in new tab) | June 3-4, 2026 Workshop organizers:\u00a0Reuben Tan, Zhengyuan Yang Workshop scientific advisor: Jianfeng Gao Speakers: The 5th CVinW workshop brings together researchers building multimodal AI agents that can perceive, reason, and act in digital and physical environments. 
The…","msr_research_lab":[],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-opportunities":[],"related-publications":[],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/1160075","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":18,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/1160075\/revisions"}],"predecessor-version":[{"id":1160608,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/1160075\/revisions\/1160608"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1160075"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1160075"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1160075"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1160075"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=1160075"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1160075"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=1160075"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/ms
r-post-option?post=1160075"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1160075"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}