<!-- Microsoft Research project page: "AFMR: Benchmarks, Evaluation and Measurement" (published 2024-01-05, modified 2025-03-31) -->
<!-- https://www.microsoft.com/en-us/research/project/afmr-benchmarks-evaluation-and-measurement/ -->

<p>Accelerating Foundation Models Research</p>

<h1>Benchmarks, Evaluation and Measurement</h1>


<blockquote>
<p><strong><em>Academic research plays such an important role in advancing science, technology, culture, and society. This grant program helps ensure this community has access to the latest and leading AI models.</em></strong></p>
<cite>Brad Smith, Vice Chair and President</cite>
</blockquote>

<h2>AFMR Goal: Align AI with shared human goals, values, and preferences via research on models</h2>

<p>which enhance safety, robustness, sustainability, responsibility, and transparency, while ensuring rapid progress can be measured via new evaluation methods</p>

<p>Evaluating the functionality, efficiency, and reliability of language models is the main theme of this group of research projects. The projects cover a range of scenarios and applications: assessing how well models comprehend, process, and generate responses, with topics such as uncertainty quantification, abstraction and reasoning, knowledge distillation, structured pruning, and skills-based frameworks; and developing models&#8217; instruction-following ability, task-agnostic distillation, and sequential planning skills.</p>

<p><strong>University of Maryland, Baltimore</strong>: Tejas Gokhale (PI)</p>

<p>This proposal aims to develop a framework for quantifying how typical AI-generated images are relative to natural photographs. The framework will produce scores indicating an image&#8217;s typicality at several levels of visual understanding: signal-level, geometric, photometric, semantic, and contextual. The approach involves creating interpretable, explicable metrics that identify atypical images and explain why they are atypical. Expected outcomes include robust methods for out-of-distribution detection and deepfake detection, and greater trust in generative AI. The project will be open-sourced, providing tools and benchmarks for the broader research community.</p>
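The multi-level scoring described above lends itself to a simple aggregation step. The sketch below is purely illustrative: the five level names come from the proposal, but every function, threshold, and aggregation rule here is a hypothetical stand-in, not the project's actual method. It assumes each level yields a typicality score in [0, 1].

```python
from dataclasses import dataclass

# Level names from the proposal; the scores themselves are hypothetical.
LEVELS = ["signal", "geometric", "photometric", "semantic", "contextual"]

@dataclass
class TypicalityReport:
    scores: dict            # per-level typicality scores in [0, 1]
    overall: float          # aggregate typicality
    atypical_levels: list   # levels below threshold, used as the "explanation"

def aggregate_typicality(scores: dict, threshold: float = 0.5) -> TypicalityReport:
    """Combine per-level scores; the weakest level dominates, and any level
    below the threshold is reported as the reason the image looks atypical."""
    missing = set(LEVELS) - set(scores)
    if missing:
        raise ValueError(f"missing levels: {sorted(missing)}")
    overall = min(scores[level] for level in LEVELS)
    atypical = [level for level in LEVELS if scores[level] < threshold]
    return TypicalityReport(scores=scores, overall=overall, atypical_levels=atypical)

report = aggregate_typicality({
    "signal": 0.9, "geometric": 0.8, "photometric": 0.4,
    "semantic": 0.7, "contextual": 0.6,
})
print(report.overall)          # 0.4, limited by the photometric level
print(report.atypical_levels)  # ['photometric']
```

Taking the minimum rather than the mean reflects the proposal's interpretability goal: a single weak level both lowers the score and names the explanation.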

<p><strong>Stanford University</strong>: Christopher Ré (PI)</p>

<p>This proposal applies a skills-based framework to understand how foundation models acquire different capabilities from training data, and then uses that framework to select and order data so as to improve model performance. The central research question is how to characterize the properties of skills in terms of scaling, data, and model architecture in order to develop a skills-based training paradigm for foundation models.</p>
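One way to make "select and order data by skill" concrete is a greedy scheduler that always draws the next batch from the skill the model currently handles worst. This is a minimal sketch under assumed inputs (skill-tagged example pools and a per-skill loss estimate); it illustrates the general idea only and is not the authors' algorithm.

```python
def select_next_batch(pool_by_skill, skill_loss, batch_size=4):
    """Greedy skills-based selection: draw the next batch from the skill
    with the highest current loss, so training targets the weakest skill."""
    # Only consider skills that still have examples left.
    available = {skill: examples for skill, examples in pool_by_skill.items() if examples}
    if not available:
        raise ValueError("no training data left")
    target = max(available, key=lambda skill: skill_loss[skill])
    take = min(batch_size, len(available[target]))
    batch = [available[target].pop() for _ in range(take)]
    return target, batch

# Hypothetical skill-tagged pools and per-skill loss estimates.
pool = {
    "arithmetic": [f"arith-{i}" for i in range(8)],
    "translation": [f"trans-{i}" for i in range(8)],
}
loss = {"arithmetic": 2.1, "translation": 0.7}

skill, batch = select_next_batch(pool, loss)
print(skill)       # arithmetic (highest current loss)
print(len(batch))  # 4
```

In a full training loop the loss estimates would be refreshed after each batch, so the scheduler naturally shifts between skills as they are mastered.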

<p><strong>Related paper:</strong></p>