Research Focus: Week of June 10, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


RELEVANCE: Automatic evaluation framework for LLM responses

Relevance in AI refers to the usefulness of information or actions to a specific task or query. It helps determine the accuracy, effectiveness, efficiency, and user satisfaction of content from search engines, chatbots, and other AI systems.

RELEVANCE (Relevance and Entropy-based Evaluation with Longitudinal Inversion Metrics) is a generative AI evaluation framework designed by researchers at Microsoft to automatically evaluate creative responses from large language models (LLMs). RELEVANCE combines custom-tailored relevance assessments with mathematical metrics to ensure that AI-generated content aligns with human standards and remains consistent. Monitoring these metrics over time makes it possible to detect automatically when an LLM’s relevance evaluations start to slip or hallucinate.

Custom relevance evaluation on its own scores responses against predefined criteria. While these scores provide a direct assessment, they might not capture the full complexity and dynamics of response patterns across multiple evaluations or different datasets (e.g., model hallucination and model slip). To address this, RELEVANCE integrates mathematical techniques with custom evaluations, keeping assessments of LLM responses accurate over time and adaptable to evolving LLM behavior without manual review.
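The exact RELEVANCE metrics are defined in the paper and are not reproduced here. As a minimal sketch of the general idea only, assume (1) a judge LLM scores the same response several times, and (2) a small set of reference responses has been ranked by humans from worst to best. Two simple longitudinal signals can then be tracked per evaluation round: the Shannon entropy of the repeated scores (inconsistency on identical input) and the number of inversions between the judge's scores and the human ranking (ranking slip). The function names and thresholds below are hypothetical.

```python
import math
from collections import Counter


def score_entropy(scores):
    """Shannon entropy (bits) of repeated judge scores for ONE response.

    0.0 means the judge always gives the same score; higher values mean
    the judge is inconsistent on identical input.
    """
    n = len(scores)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(scores).values())


def count_inversions(seq):
    """Count pairs i < j with seq[i] > seq[j] using merge sort, O(n log n)."""
    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left
                merged.append(right[j])
                inv += len(left) - i
                j += 1
        merged += left[i:] + right[j:]
        return merged, inv
    return sort_count(list(seq))[1]


def check_round(repeat_scores, reference_scores, max_entropy=1.0, max_inversion_ratio=0.1):
    """Flag one evaluation round (hypothetical thresholds).

    repeat_scores: judge scores for the same response, scored several times.
    reference_scores: judge scores for reference responses ranked by humans
    from worst to best, so a faithful judge yields a non-decreasing sequence.
    """
    n = len(reference_scores)
    pairs = n * (n - 1) // 2
    ratio = count_inversions(reference_scores) / pairs if pairs else 0.0
    entropy = score_entropy(repeat_scores)
    return {
        "entropy": entropy,
        "inversion_ratio": ratio,
        "alert": entropy > max_entropy or ratio > max_inversion_ratio,
    }


print(check_round([4, 4, 4, 4], [1, 2, 3, 4, 5]))
# {'entropy': 0.0, 'inversion_ratio': 0.0, 'alert': False}
```

Running `check_round` after every evaluation batch and plotting both values over time is one way to notice a judge that is drifting before its scores are trusted downstream.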


Recyclable vitrimer-based printed circuit boards for sustainable electronics

Printed circuit boards (PCBs) are ubiquitous in electronics and make up a substantial fraction of environmentally hazardous electronic waste when devices reach end-of-life. They are difficult to recycle because they are manufactured with irreversibly cured thermoset epoxies. Researchers at Microsoft and the University of Washington aim to tackle this challenge and help drive sustainability transitions in the electronics industry. In a recent paper published in Nature Sustainability, Recyclable vitrimer-based printed circuit boards for sustainable electronics, they present a PCB formulation using transesterification vitrimers (vPCBs) and an end-to-end fabrication process compatible with standard manufacturing ecosystems. A cradle-to-cradle life cycle assessment shows that vPCBs substantially reduce environmental impact relative to conventional PCBs in 11 categories.

The team successfully manufactured functional prototypes of Internet of Things (IoT) devices that transmit 2.4 GHz radio signals on vPCBs, with electrical and mechanical properties meeting industry standards. Fractures and holes in vPCBs can be repaired, and the boards retain comparable performance over multiple repair cycles.

The researchers also demonstrate a non-destructive recycling process based on polymer swelling with small-molecule solvents. Unlike traditional solvolysis recycling, this swelling process does not degrade the materials. Dynamic mechanical analysis finds negligible catalyst loss, minimal changes in storage modulus, and equivalent polymer backbone composition across multiple recycling cycles. The process achieves 98% polymer recovery, 100% fiber recovery, and 91% solvent recovery, and the recovered materials are used to create new vPCBs without performance degradation, potentially paving the way to circularity in electronics.
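To put the recovery figures in perspective, here is a back-of-the-envelope calculation. It assumes, beyond what is stated above, that each recovery rate r applies to every recycling pass, stays constant across passes, and that no virgin material is added in between. The fraction f(n) of an original batch still in circulation after n passes is then:

```latex
f(n) = r^{n}, \qquad
f_{\text{polymer}}(5) = 0.98^{5} \approx 0.904, \qquad
f_{\text{fiber}}(5) = 1.00^{5} = 1, \qquad
f_{\text{solvent}}(5) = 0.91^{5} \approx 0.624
```

Under these assumptions, keeping board output constant would require topping up roughly 2% fresh polymer per pass, and solvent losses compound fastest of the three.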


LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. State-of-the-art models have reached billions of parameters, requiring large amounts of memory and incurring significant inference latency, even on cutting-edge AI accelerators such as graphics processing units (GPUs). Existing approaches to meeting the low-latency demands of applications built on such large models do not account for the computationally distinct nature of the different phases of inference, and thus fail to use the underlying hardware efficiently.

In a recent paper, Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers, researchers from Microsoft propose a scalable technique for computing self-attention during the token-generation phase (decode phase) of decoder-only transformer models. LeanAttention scales the attention mechanism to the challenging case of long context lengths by redesigning the execution flow of the decode phase. The researchers show that the associative property of online softmax allows it to be treated as a reduction operation, which lets them parallelize the attention computation over these long contexts. They extend the “stream-K” style reduction of tiled computation to self-attention to enable this parallelism, achieving near-100% GPU utilization, an average attention execution speedup of 2.6x over FlashAttention-2, and up to 8.33x speedup at 512k context lengths.
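The key observation, that online softmax can be treated as an associative reduction, can be illustrated in a few lines. This is a minimal, dependency-free sketch of the generic split-KV idea for a single query vector, not LeanAttention's GPU kernel or its stream-K work partitioning; the function names are hypothetical. Each chunk of keys and values produces a partial state (max score, softmax denominator, un-normalized output), and `combine` merges two states by rescaling them to a shared maximum.

```python
import functools
import math


def partial_attention(q, keys, values):
    """Attention state for one chunk of the KV cache (single query vector).

    Returns (m, l, o): the chunk's max score, the sum of exp(score - m),
    and the exp-weighted sum of value vectors, not yet divided by l.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    o = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return m, l, o


def combine(a, b):
    """Merge two partial states by rescaling both to a shared max.

    This merge is associative, so partial states can be reduced in any grouping.
    """
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    c_a, c_b = math.exp(m_a - m), math.exp(m_b - m)
    return m, c_a * l_a + c_b * l_b, [c_a * x + c_b * y for x, y in zip(o_a, o_b)]


def chunked_attention(q, keys, values, chunk):
    """softmax(q K^T / sqrt(d)) V computed chunk by chunk, then reduced.

    On a GPU each chunk could be assigned to a different thread block;
    here the chunks run sequentially to keep the sketch dependency-free.
    """
    parts = [partial_attention(q, keys[i:i + chunk], values[i:i + chunk])
             for i in range(0, len(keys), chunk)]
    _, l, o = functools.reduce(combine, parts)
    return [x / l for x in o]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.2, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
one_pass = chunked_attention(q, keys, values, chunk=len(keys))
split = chunked_attention(q, keys, values, chunk=2)
# The two results agree up to floating-point rounding.
print(one_pass, split)
```

Because the merge is associative, the partial states for a very long context can be produced by many workers and reduced in a tree, which is what makes the decode phase parallelizable over the context dimension even when there is only one query token.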


WaveCoder: Widespread and Versatile Enhanced Instruction Tuning with Refined Data Generation

Recent research demonstrates that an LLM fine-tuned on a high-quality instruction dataset can acquire impressive abilities on code-related tasks. However, existing methods for generating instruction data often produce duplicate data and offer too little control over data quality.

In a recent paper, WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation, researchers from Microsoft extend the generalization of instruction tuning by classifying instruction data into four code-related tasks, and they propose an LLM-based generator-discriminator data processing framework that produces diverse, high-quality instruction data from open-source code. They introduce CodeSeaXDataset, a dataset of 19,915 instruction instances across four universal code-related tasks. They also present WaveCoder, a code LLM fine-tuned with this widespread and versatile enhanced instruction tuning, designed specifically to improve instruction tuning of code LLMs. Their experiments show that, at the same fine-tuning scale, WaveCoder models outperform other open-source models in generalization across different code-related tasks. WaveCoder also shows high efficiency on established code generation tasks.
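The generator-discriminator idea can be sketched as a filtering loop. This is a minimal illustration under stated assumptions, not the paper's framework: the generator and discriminator are plain callables standing in for LLM prompts, the task label and the quality rule are hypothetical, and duplicates are detected by a simple normalized-text key rather than whatever method the authors use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Instance:
    task: str          # task label, e.g. "code summarization" (illustrative)
    instruction: str
    output: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies collide."""
    return " ".join(text.lower().split())


def build_dataset(
    snippets: Iterable[str],
    generate: Callable[[str], List[Instance]],   # stands in for a generator LLM
    discriminate: Callable[[Instance], bool],    # stands in for a discriminator LLM
) -> List[Instance]:
    """Generate candidates per code snippet, drop duplicates, keep what passes the discriminator."""
    seen = set()
    kept = []
    for snippet in snippets:
        for inst in generate(snippet):
            key = normalize(inst.instruction)
            if key in seen:
                continue  # duplicate: skip before spending a discriminator call
            seen.add(key)
            if discriminate(inst):
                kept.append(inst)
    return kept


# Usage with stand-in functions; a real pipeline would prompt LLMs here.
def fake_generate(snippet):
    return [
        Instance("code summarization", f"Summarize: {snippet}", "..."),
        Instance("code summarization", f"summarize:   {snippet}", "..."),  # near-duplicate
    ]


def fake_discriminate(inst):
    return "TODO" not in inst.instruction  # toy quality rule


data = build_dataset(["def add(a, b): return a + b", "# TODO"], fake_generate, fake_discriminate)
print(len(data))  # 1
```

Separating generation from judging keeps each step simple to control: the discriminator's criteria can be tightened without touching how candidates are produced, which is one way to address the duplication and quality-control problems described above.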


New course offers AutoGen training

DeepLearning.AI, in collaboration with Microsoft and Penn State University, is offering a short course, AI Agentic Design Patterns with AutoGen, centered on the multi-agent framework for next-generation AI applications. Taught by AutoGen creators Chi Wang, principal researcher at Microsoft Research AI Frontiers, and Qingyun Wu, assistant professor at Penn State, the course explores how to use AutoGen to build and customize multi-agent systems in which agents take on different roles and collaborate to accomplish complex tasks. You can learn more in this video.

AutoGen was designed to simplify the orchestration, optimization, and automation of LLM workflows, and it has been widely adopted as a generic programming framework for agentic AI. It offers customizable, conversable agents that leverage the strongest capabilities of the most advanced LLMs, such as GPT-4, while addressing their limitations by integrating humans and tools and by enabling conversations among multiple agents via automated chat.
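The conversable-agent pattern can be illustrated without any framework. The sketch below is plain Python and deliberately does not use AutoGen's API (AutoGen provides its own agent classes, such as AssistantAgent and UserProxyAgent). All names here are hypothetical: each agent is just a reply function over the shared chat history, one agent stands in for an LLM that answers a math question, and the other checks the answer with real arithmetic, acting as a tool that corrects the first.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (sender name, content)


@dataclass
class ToyAgent:
    name: str
    # Produces the next message from the shared history, or None to end the chat.
    # In a real system this could wrap an LLM call, a tool, or a human prompt.
    reply: Callable[[List[Message]], Optional[str]]


def run_chat(initiator: ToyAgent, responder: ToyAgent, opening: str,
             max_turns: int = 10) -> List[Message]:
    """Alternate replies between two agents until one returns None or max_turns is hit."""
    history: List[Message] = [(initiator.name, opening)]
    for turn in range(max_turns):
        speaker = responder if turn % 2 == 0 else initiator
        message = speaker.reply(history)
        if message is None:
            break
        history.append((speaker.name, message))
    return history


def solver_reply(history):
    # Stand-in for an LLM: the first answer is wrong and is corrected after feedback.
    if history[-1][1].startswith("Incorrect"):
        return "17 * 23 = 391"
    return "17 * 23 = 381"


def checker_reply(history):
    # Tool-backed agent: verifies the claimed product with real arithmetic.
    lhs, rhs = history[-1][1].split("=")
    a, b = (int(x) for x in lhs.split("*"))
    if a * b == int(rhs):
        return None  # answer verified; end the conversation
    return f"Incorrect, recompute {lhs.strip()}"


checker = ToyAgent("checker", checker_reply)
solver = ToyAgent("solver", solver_reply)
for sender, text in run_chat(checker, solver, "What is 17 * 23?"):
    print(f"{sender}: {text}")
```

The transcript ends with the solver's corrected answer, `17 * 23 = 391`, after one round of feedback. Keeping every agent behind the same reply interface is what lets an LLM, a tool, or a human be swapped in for any participant.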

Microsoft Research in the news

Superfast Microsoft AI is first to predict air pollution for the whole world 

Nature | June 4, 2024

An AI model developed by Microsoft can accurately forecast weather and air pollution for the whole world, and it does so in less than a minute. The model, called Aurora, also produces ten-day global weather forecasts.

Chatbot teamwork makes the AI dream work 

Wired | June 6, 2024

LLMs often stumble over math problems because they work by providing statistically plausible text rather than rigorous logical reasoning. Researchers from Microsoft show that having AI agents collaborate can mitigate that weakness.

1-bit LLMs Could Solve AI’s Energy Demands

IEEE Spectrum | May 30, 2024

“One-bit LLMs open new doors for designing custom hardware and systems specifically optimized for 1-bit LLMs,” — Furu Wei, Microsoft Research.
