MindAgent: Emergent Gaming Interaction

We collaborated with the Xbox and Mesh teams to explore a new gaming infrastructure and to design a dynamic real-time system for human players and NPCs, powered by GPT-X, on a multi-agent platform.

GitHub: MindAgent

arXiv: https://arxiv.org/abs/2309.09971

Demo: MindAgent.mp4

People

Hoi Vo

Technical Fellow

Xbox Emerging Technologies

Steven Gong

Intern

UCLA, MSR

Zane Durante

Intern

Stanford, MSR

Yusuke Noda

Principal Software Engineer

Microsoft Gaming-Xbox Team

Song-chun Zhu

Professor

UCLA

Demetri Terzopoulos

Chancellor's Professor

UCLA

Fei-Fei Li

Professor

Stanford University

Jianfeng Gao

Distinguished Scientist & Vice President

We are very excited to share the good news: our project “MindAgent: Emergent Gaming Interaction” was recently made public. We seek to develop a unified interaction infrastructure and architecture that can jointly understand large language corpora and visual (image and video) inputs and provide meaningful action-based outputs. We apply our model to a broad range of gaming video tasks and show the efficacy of the agent action stream across tasks including interactive agents and visual and natural language understanding.

In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming frameworks to i) require an understanding of the coordinator in a multi-agent system, ii) collaborate with human players through properly designed instructions, without fine-tuning, and iii) perform in-context learning from few-shot prompts with feedback. Furthermore, we introduce CuisineWorld, a new gaming scenario and related benchmark that measures multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. We conduct comprehensive evaluations with a new automatic metric, CoS, for calculating collaboration efficiency.

Finally, our infrastructure can be deployed in real-world gaming scenarios through a customized VR version of CuisineWorld and adapted to the broader existing Minecraft gaming domain. By creating a powerful and general-purpose foundation model with visual, language, and action capabilities, we can have a great impact across many industries, both within Microsoft and externally.
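To make the idea of an automatic collaboration-efficiency metric more concrete, here is a minimal Python sketch. This page does not define how CoS is computed, so the formula below is an assumption for illustration only: it averages the task completion rate over several game runs, for example runs with different task-arrival intervals. The names `EpisodeResult`, `completion_rate`, and `collaboration_score` are hypothetical and do not come from the MindAgent code base.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of one game run (hypothetical structure, not from MindAgent)."""
    completed: int  # tasks the agents finished in time
    failed: int     # tasks that expired before being finished


def completion_rate(result: EpisodeResult) -> float:
    """Fraction of tasks completed in one run; 0.0 if no tasks were issued."""
    total = result.completed + result.failed
    return result.completed / total if total else 0.0


def collaboration_score(results: list[EpisodeResult]) -> float:
    """Assumed CoS-style score: mean completion rate across all runs."""
    if not results:
        raise ValueError("need at least one episode result")
    return sum(completion_rate(r) for r in results) / len(results)


if __name__ == "__main__":
    runs = [EpisodeResult(completed=3, failed=1), EpisodeResult(completed=1, failed=1)]
    print(collaboration_score(runs))
```

Averaging over runs of varying difficulty, rather than scoring a single run, is one way to keep the score from being dominated by an unusually easy or hard task schedule. Consult the arXiv paper for the authors' actual definition.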

Minecraft VR demo (YouTube)