{"id":1007064,"date":"2024-02-12T10:46:26","date_gmt":"2024-02-12T18:46:26","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=1007064"},"modified":"2024-03-08T11:53:07","modified_gmt":"2024-03-08T19:53:07","slug":"embodied-agent-ai","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/embodied-agent-ai\/","title":{"rendered":"Agent AI Towards a Holistic Intelligence"},"content":{"rendered":"
Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. At this pivotal moment, it is crucial to steer AI research away from its trend toward excessive reductionism and return to the AI principles inspired by the holistic philosophy of Aristotle. Specifically, we emphasize developing \u201cAgent AI\u201d, an embodied system that integrates large foundation models into agent actions. The emerging field of Agent AI spans a wide range of existing embodied and agent-based multimodal interactions, including robotics, gaming, and diagnostic systems. We emphasize the importance of integrating recent large foundation models to enhance intelligence and interaction capabilities. Furthermore, we discuss how agents exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. In this paper, we aim to broaden the research community\u2019s perspective on achieving holistic intelligence, while highlighting the need for an integrated approach that considers the agent\u2019s purpose, functionality, and interaction. Finally, we reflect on a deeper discussion of these Agent AI topics from a mainstream and interdisciplinary perspective. This discussion illustrates AI cognition and consciousness within the scope of scientific discourse, and may serve as a basis for future research directions and social influences.<\/p>\n","protected":false},"excerpt":{"rendered":" Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. At this pivotal moment, it is crucial to steer AI research away from its trend toward excessive reductionism and return to the AI principles inspired by the holistic philosophy of Aristotle.
Specifically, we emphasize developing \u201cAgent AI\u201d, an embodied system that […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13562,13554],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1007064","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-computer-interaction","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-2-12","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2403.00833","label_id":"243109","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/pdf\/2403.00833.pdf","label_id":"243109","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2403.
00833","label_id":"243118","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/pdf\/2403.00833.pdf","label_id":"243118","label":0}],"msr_attachments":[{"id":1010664,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/AgentAIposition.pdf"},{"id":1009638,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/AgentAI_p.pdf"},{"id":1008462,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Position_AgentAI.pdf"},{"id":1008459,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/AgentAI_position-65d4cb0de80b2.pdf"},{"id":1008420,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/AgentAI_position.pdf"},{"id":1008417,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Agent_AI_position-65d4b84493079.pdf"},{"id":1007076,"url":"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2024\/02\/Agent_AI_position.pdf"}],"msr-author-ordering":[{"type":"user_nicename","value":"Qiuyuan Huang","user_id":36356,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Qiuyuan Huang"},{"type":"user_nicename","value":"Naoki Wake","user_id":39916,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Naoki Wake"},{"type":"text","value":"Bidipta Sarkar","user_id":0,"rest_url":false},{"type":"text","value":"Zane Durante","user_id":0,"rest_url":false},{"type":"text","value":"Ran Gong","user_id":0,"rest_url":false},{"type":"text","value":"Rohan 
Taori","user_id":0,"rest_url":false},{"type":"guest","value":"yusuke-noda","user_id":969939,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=yusuke-noda"},{"type":"guest","value":"demetri-terzopoulos","user_id":969951,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=demetri-terzopoulos"},{"type":"text","value":"Noboru Kuno","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Ade Famoti","user_id":43005,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ade Famoti"},{"type":"user_nicename","value":"Ashley Llorens","user_id":39964,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ashley Llorens"},{"type":"user_nicename","value":"John Langford","user_id":32204,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=John Langford"},{"type":"guest","value":"hoi-vo","user_id":969933,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=hoi-vo"},{"type":"guest","value":"fei-fei-li","user_id":969957,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=fei-fei-li"},{"type":"user_nicename","value":"Katsushi Ikeuchi","user_id":32500,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Katsushi Ikeuchi"},{"type":"user_nicename","value":"Jianfeng Gao","user_id":32246,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Jianfeng 
Gao"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144931,668253],"msr_project":[788159],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":788159,"post_title":"Agent AI","post_name":"agent-ai","post_type":"msr-project","post_date":"2023-09-25 21:53:00","post_modified":"2024-02-28 07:03:22","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/agent-ai\/","post_excerpt":"Agent-based multimodal AI systems are becoming a ubiquitous presence in our everyday lives. A promising direction for making these systems more interactive is to embody them as agents within specific environments. The grounding of large foundation models to act as agents within specific environments can provide a way of incorporating visual and contextual information into an embodied system. For example, a system that can perceive user actions, human behavior, environment objects, audio expressions, and 
the…","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/788159"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1007064"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":7,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1007064\/revisions"}],"predecessor-version":[{"id":1009701,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/1007064\/revisions\/1009701"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1007064"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=1007064"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=1007064"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1007064"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=1007064"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=1007064"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=1007064"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=1007064"},{"taxonomy":"msr-download-source","embeddable":
true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=1007064"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1007064"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1007064"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=1007064"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=1007064"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=1007064"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1007064"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1007064"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}