Microsoft Research Blog

  1. A summary of insights extracted by using the Eureka framework, shown via two radar charts for multimodal (left) and language (right) capabilities respectively. The radar charts show the best and worst performance observed for each capability.

    Eureka: Evaluating and understanding progress in AI 

    September 17, 2024

    How can we rigorously evaluate and understand state-of-the-art progress in AI? Eureka is an open-source framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. Learn more about the extended findings.

  2. Research Focus: Week of September 9, 2024

    September 12, 2024

    Investigating vulnerabilities in LLMs; A novel total-duration-aware (TDA) duration model for text-to-speech (TTS); Generative expert metric system through iterative prompt priming; Integrity protection in 5G fronthaul networks.

  3. Research Focus: Week of August 26, 2024

    August 28, 2024

    Learn what’s next for AI at Research Forum on Sept. 3; WizardArena simulates human-annotated chatbot games; MInference speeds pre-filling for long-context LLMs via dynamic sparse attention; Reef: Fast succinct non-interactive zero-knowledge regex proofs.

  4. Research Focus: Week of August 12, 2024

    August 14, 2024

    In this issue: Research Forum Ep. 4 explores multimodal AI. Registration is now open; Surveying developers’ AI needs; SuperBench improves cloud AI infrastructure reliability; Virtual Voices: Exploring factors influencing participation in virtual meetings.

  5. Research Focus: Week of July 29, 2024

    July 31, 2024

    In this issue: Skeleton Posterior-guided OpTimization (SPOT) exhibits potential in various causal discovery tasks; Using visual imagery for an EEG-based brain–computer interface; Developing human-centered AI systems to assist creative professionals.

Explore More

Events & conferences

Meet our community of researchers, learn about exciting research topics, and grow your network

Podcasts

Ongoing conversations at the cutting edge of research

Microsoft Research Forum

Join us for a continuous exchange of ideas about research in the era of general AI