{"id":972234,"date":"2023-10-05T09:00:00","date_gmt":"2023-10-05T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=972234"},"modified":"2024-06-10T09:47:36","modified_gmt":"2024-06-10T16:47:36","slug":"holoassist-a-multimodal-dataset-for-next-gen-ai-copilots-for-the-physical-world","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/holoassist-a-multimodal-dataset-for-next-gen-ai-copilots-for-the-physical-world\/","title":{"rendered":"HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world"},"content":{"rendered":"\n

This research paper was presented at the <\/em><\/strong>2023 IEEE\/CVF International Conference on Computer Vision<\/em><\/strong> (opens in new tab)<\/span><\/a> (ICCV), a premier academic conference for computer vision.<\/em><\/strong><\/p>\n\n\n\n

\"\"ICCV23<\/figure>\n\n\n\n

When was the last time you were faced with a task you had no clue how to tackle? Maybe it was fixing a broken bike, replacing a printer toner, or making a cup of espresso? In such circumstances, your usual options might include reaching out to a knowledgeable friend or relative for assistance. Alternatively, you might resort to scouring the internet, conducting a web search, posing questions on online forums, or seeking out relevant instructional videos. But what if there were another option? What if you could turn to an AI assistant, or copilot<\/em>, for help?<\/p>\n\n\n\n

AI in the real world<\/h2>\n\n\n\n

Our daily lives are filled with a wide range of tasks, both for work and leisure, spanning the digital and physical realms. We often find ourselves in need of guidance to learn and carry out these tasks effectively. Recent advances in AI, particularly in the areas of large language and multimodal models, have given rise to intelligent digital agents. However, when it comes to the physical world, where we perform a significant number of our tasks, AI systems have historically faced greater challenges. <\/p>\n\n\n\n

A longstanding aspiration within the AI community has been to develop an interactive AI assistant capable of perceiving, reasoning, and collaborating with people in the real world. Whether it\u2019s scenarios like autonomous driving, robot navigation and manipulation, hazard detection in industrial settings, or support and guidance for mixed-reality tasks, progress in physical activities has been slower and more incremental compared with their fully digital counterparts.<\/p>\n\n\n\n


The promise and challenge of interactive AI copilots<\/h2>\n\n\n\n

There is great potential for developing interactive AI copilots to assist people with real-world tasks, but there are also obstacles. The key challenge is that current state-of-the-art AI assistants lack firsthand experience in the physical world. Consequently, they cannot perceive the state of the real world and actively intervene when necessary. This limitation stems from a lack of training on the specific data required for perception, reasoning, and modeling in such scenarios. In AI development, there\u2019s a saying that \u201cdata is king,\u201d and this challenge is no exception. To advance interactive AI agents for physical tasks, we must thoroughly understand the problem domain and establish a gold standard for copilots\u2019 capabilities.<\/p>\n\n\n\n

A new multimodal interactive dataset<\/h2>\n\n\n\n

As a first step in this direction, we are excited to share our paper, \u201cHoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World (opens in new tab)<\/span><\/a>,\u201d presented at ICCV 2023 (opens in new tab)<\/span><\/a>. HoloAssist is a large-scale egocentric, or first-person, human interaction dataset, where two people collaboratively execute physical manipulation tasks. A task performer executes a task while wearing a mixed-reality headset that captures seven synchronized data streams, as shown in Figure 1. Simultaneously, a task instructor observes the performer\u2019s first-person video feed in real time and offers verbal instruction. <\/p>\n\n\n\n

\"An
Figure 1: HoloAssist features a two-person interactive assistive task-completion setting.<\/figcaption><\/figure>\n\n\n\n

HoloAssist comprises 166 hours of recordings involving 222 diverse participants, who form 350 distinct instructor-performer pairs carrying out 20 object-centric manipulation tasks. Video 1 shows how tasks are recorded, while Figure 2 provides a task breakdown. The objects range from common electronic devices to rarer items found in factories and specialized labs. The tasks are generally quite demanding, often requiring instructor assistance for successful completion. To provide comprehensive insights, we\u2019ve captured seven raw sensor modalities: RGB, depth, head pose, 3D hand pose, eye gaze, audio, and IMU. These modalities help in understanding human intentions, estimating world states, predicting future actions, and more. Finally, an eighth modality augments the recordings with third-person manual annotations, consisting of a text summary, intervention types, mistake annotations, and action segments, as illustrated in Figure 3.<\/p>\n\n\n\n
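To make these modalities concrete, the sketch below shows one way a single synchronized recording could be represented in code. The class and field names are illustrative assumptions for this post, not the official HoloAssist data-loading API.<\/p>\n\n\n\n

<pre><code># A minimal sketch of a HoloAssist-style recording session. All names here are
# assumptions for illustration; consult the released dataset and code for the
# actual formats.
from dataclasses import dataclass, field
from typing import List

@dataclass
class FramePacket:
    timestamp: float          # seconds since the session started
    rgb_path: str             # path to the RGB frame
    depth_path: str           # path to the aligned depth frame
    head_pose: List[float]    # 6-DoF head pose (position and orientation)
    hand_pose: List[float]    # 3D hand joint positions for both hands
    eye_gaze: List[float]     # gaze origin and direction
    imu: List[float]          # accelerometer and gyroscope sample

@dataclass
class Session:
    session_id: str
    task_name: str                  # one of the 20 object-centric tasks
    audio_path: str                 # audio track for the full session
    frames: List[FramePacket] = field(default_factory=list)
<\/code><\/pre>\n\n\n\n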

<figure><figcaption>Video 1: A sampling of task recordings showcasing color and depth, two of the eight modalities.<\/figcaption><\/figure>\n\n\n\n
\"Data
Figure 2: Data distribution captured in HoloAssist. On the left, the number of sessions per activity. On the right, the total session length in minutes.<\/figcaption><\/figure>\n\n\n\n
\"HoloAssist
Figure 3: HoloAssist includes action and conversational annotations, and it also provides summaries of videos indicating mistakes and interventions during tasks. Each action is tagged with a \u201cmistake\u201d or \u201ccorrect\u201d attribute, while spoken statements are labeled with intervention types.<\/figcaption><\/figure>\n\n\n\n
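Concretely, a single annotated action segment and an accompanying instructor utterance might look like the records below. The field names and label strings are assumptions made for illustration rather than the exact HoloAssist annotation schema.<\/p>\n\n\n\n

<pre><code># Illustrative annotation records; field names and label values are assumed,
# not taken from the released annotation files.
action_segment = {
    'start': 12.4,              # segment start time in seconds
    'end': 18.9,                # segment end time in seconds
    'verb': 'attach',           # coarse description of the action
    'noun': 'toner cartridge',
    'attribute': 'mistake',     # each action is tagged mistake or correct
}

utterance = {
    'start': 19.1,
    'end': 22.0,
    'speaker': 'instructor',
    'text': 'Rotate the cartridge before pushing it in.',
    'intervention_type': 'correct the mistake',   # assumed label wording
}
<\/code><\/pre>\n\n\n\n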

Towards proactive AI assistants<\/h2>\n\n\n\n

Our work builds on previous advancements in egocentric vision and embodied AI. Unlike earlier datasets, such as those listed in Table 1, HoloAssist stands out due to its multi-person, interactive task-execution setting. Human interaction during task execution provides a valuable resource for designing AI assistants that are anticipatory and proactive, capable of delivering precisely timed instructions grounded in the environment, in contrast with current \u201cchat-based\u201d AI assistants that wait for you to ask a question. This unique scenario is ideal for developing assistive AI agents and complements existing datasets, which contribute rich knowledge and representations.<\/p>\n\n\n\n

\"The
Table 1: Comparison of related datasets and simulation platforms. HoloAssist features a multi-person assistive setting, which is a unique addition to existing egocentric (first-person) datasets.<\/figcaption><\/figure>\n\n\n\n

Finally, we benchmarked action classification and action anticipation on the dataset, providing empirical results that shed light on the role of different modalities across tasks. With this dataset, we also introduce new tasks and benchmarks focused on mistake detection, intervention type prediction, and 3D hand pose forecasting, all crucial elements for developing intelligent assistants.<\/p>\n\n\n\n
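As a rough illustration of how a benchmark like mistake detection can be scored, the sketch below compares one predicted label per annotated action segment against its ground-truth mistake or correct attribute. This is a simplified stand-in and may differ from the exact evaluation protocol used in the paper.<\/p>\n\n\n\n

<pre><code># Simplified scoring sketch for a mistake-detection baseline; the official
# benchmark code may use different metrics and protocols.
from statistics import mean
from typing import Dict

def mistake_detection_accuracy(predictions: Dict[str, str],
                               ground_truth: Dict[str, str]) -> float:
    # Both dictionaries map a segment id to a label: mistake or correct.
    hits = [int(predictions.get(seg_id) == label)
            for seg_id, label in ground_truth.items()]
    return float(mean(hits)) if hits else 0.0
<\/code><\/pre>\n\n\n\n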

Looking forward<\/h2>\n\n\n\n

This work represents an initial step in broader research that explores how intelligent agents can collaborate with humans in real-world tasks. We're excited to share this work and our dataset with the community and anticipate numerous future directions, including annotating object poses, investigating object-centric models of affordance and manipulation for AI assistance, and AI-assisted planning and state tracking. We believe HoloAssist, along with its associated benchmarks and tools, will benefit future research endeavors focused on building powerful AI assistants for real-world everyday tasks. You can access the HoloAssist dataset and code on GitHub (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n

Contributors<\/h3>\n\n\n\n

Taein Kwon, Mahdi Rad<\/a>, Bowen Pan, Ishani Chakraborty<\/a>, Sean Andrist<\/a>, Dan Bohus<\/a>, Ashley Feniello<\/a>, Bugra Tekin, Felipe Vieira Frujeri<\/a>, Marc Pollefeys<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks.<\/p>\n","protected":false},"author":42183,"featured_media":972549,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13562],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-972234","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565,602418,992148],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Xin Wang","user_id":41146,"display_name":"Xin Wang","author_link":"Xin Wang<\/a>","is_active":false,"last_first":"Wang, Xin","people_section":0,"alias":"wanxin"},{"type":"user_nicename","value":"Neel Joshi","user_id":33073,"display_name":"Neel Joshi","author_link":"Neel Joshi<\/a>","is_active":false,"last_first":"Joshi, Neel","people_section":0,"alias":"neel"}],"msr_type":"Post","featured_image_thumbnail":"\""ICCV23","byline":"Xin Wang<\/a> and Neel Joshi<\/a>","formattedDate":"October 5, 2023","formattedExcerpt":"HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. 
Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/972234"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42183"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=972234"}],"version-history":[{"count":40,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/972234\/revisions"}],"predecessor-version":[{"id":1045071,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/972234\/revisions\/1045071"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/972549"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=972234"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=972234"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=972234"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=972234"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=972234"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=972234"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=972234"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=972234"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=972234"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=972234"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=972234"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}