{"id":995544,"date":"2024-01-05T08:03:11","date_gmt":"2024-01-05T16:03:11","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=995544"},"modified":"2024-10-23T10:34:02","modified_gmt":"2024-10-23T17:34:02","slug":"afmr-multimodal-and-crossmodal-learning","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/afmr-multimodal-and-crossmodal-learning\/","title":{"rendered":"AFMR: Multimodal and Crossmodal Learning"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\"white\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t<\/span>\n\t\t\t\t\t\t\t\t\tAccelerating Foundation Models Research\t\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

Multimodal and Crossmodal Learning<\/h1>\n\n\n\n

<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n

"Academic research plays such an important role in advancing science, technology, culture, and society. This grant program helps ensure this community has access to the latest and leading AI models."

Brad Smith, Vice Chair and President

AFMR Goal: Align AI with shared human goals, values, and preferences via research on models, which enhances safety, robustness, sustainability, responsibility, and transparency, while ensuring rapid progress can be measured via new evaluation methods.

The research projects focus on improving and applying multimodal foundation models in a variety of ways. Some address foundational questions, such as improving the efficiency of vision and language models, training audio-visual foundation models for tasks like segmentation and localization, curating multimodal video datasets, and aligning multimodal vision-language models to understand their capabilities and limitations. Others pursue applications, such as traffic monitoring, geospatial data interaction, human mobility prediction, video-based reasoning, and reducing demographic bias in image generation by balancing model bias. Together, these projects provide a broad and deep exploration of multimodal models and their potential applications.

NC A&T State University: Leila Hashemi-Beni (PI)

Effective traffic monitoring is critical for transportation agencies and city planners seeking to understand traffic patterns, congestion, and safety hazards. This project develops an advanced AI-based traffic monitoring system that uses Physics-Informed Neural Networks (PINNs) to model traffic state and Generative Pre-trained Transformer (GPT) models to interpret user input and PINN outputs. The PINN models will be trained on a high-resolution dataset collected by unmanned aerial vehicles (UAVs). The primary goal is a highly accurate and efficient traffic monitoring system capable of identifying traffic states and computing traffic state parameters.
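To make the PINN side of this design concrete, the sketch below shows one way a density network could be coupled to the Lighthill-Whitham-Richards (LWR) conservation law, rho_t + q(rho)_x = 0, with a Greenshields flux q(rho) = rho * v_f * (1 - rho / rho_max). This is a minimal illustration under assumed choices: the network architecture, constants, and training data are hypothetical placeholders (the project description does not specify them), and the GPT layer that interprets user queries and model outputs is omitted.

```python
# Minimal PINN sketch for traffic state estimation, assuming the LWR
# conservation law with a Greenshields flux. All constants, network sizes,
# and data below are illustrative placeholders, not details from the proposal.
import torch
import torch.nn as nn

RHO_MAX = 1.0   # assumed normalized jam density
V_FREE = 1.0    # assumed normalized free-flow speed

class TrafficPINN(nn.Module):
    """Maps a space-time point (x, t) to an estimated density rho(x, t)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # keeps rho in (0, RHO_MAX)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return RHO_MAX * self.net(torch.cat([x, t], dim=1))

def pde_residual(model: TrafficPINN, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Residual of rho_t + d/dx[ rho * V_FREE * (1 - rho / RHO_MAX) ] = 0."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    rho = model(x, t)
    flux = rho * V_FREE * (1.0 - rho / RHO_MAX)  # Greenshields flux q(rho)
    rho_t = torch.autograd.grad(rho, t, torch.ones_like(rho), create_graph=True)[0]
    flux_x = torch.autograd.grad(flux, x, torch.ones_like(flux), create_graph=True)[0]
    return rho_t + flux_x

model = TrafficPINN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-ins for UAV-derived observations and physics collocation points.
x_obs, t_obs = torch.rand(256, 1), torch.rand(256, 1)
rho_obs = 0.5 * torch.ones(256, 1)               # placeholder density measurements
x_col, t_col = torch.rand(1024, 1), torch.rand(1024, 1)

for step in range(2000):
    optimizer.zero_grad()
    data_loss = ((model(x_obs, t_obs) - rho_obs) ** 2).mean()
    physics_loss = (pde_residual(model, x_col, t_col) ** 2).mean()
    loss = data_loss + physics_loss  # relative weighting is a tuning choice
    loss.backward()
    optimizer.step()
```

The key idea is the two-term loss: the data term fits the network to UAV observations, while the physics term penalizes violations of the conservation law at collocation points, letting the model interpolate traffic state where no measurements exist.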

Related paper: