{"id":1095585,"date":"2024-11-01T09:16:43","date_gmt":"2024-11-01T16:16:43","guid":{"rendered":""},"modified":"2024-11-01T09:16:47","modified_gmt":"2024-11-01T16:16:47","slug":"research-focus-week-of-october-28-2024","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/research-focus-week-of-october-28-2024\/","title":{"rendered":"Research Focus: Week of October 28, 2024"},"content":{"rendered":"\n

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

FLASH: A Workflow Automation Agent for Diagnosing Recurring Incidents

Cloud incidents such as unplanned interruptions or performance degradation can reduce customer satisfaction and revenue. Recurring incidents, typically raised by system monitors, allow for timely resolution, but also demand significant human effort for troubleshooting. Automating the diagnosis of recurring incidents would help minimize service downtime, reduce customer impact, and decrease manual labor.

In a recent paper: FLASH: A Workflow Automation Agent for Diagnosing Recurring Incidents, researchers from Microsoft present an approach that significantly improves diagnostic accuracy. LLM-based agents have proven effective at complex tasks requiring multiple logical steps, but they still suffer reliability issues because they lack specific diagnostic knowledge. FLASH incorporates status supervision, breaking complex instructions into manageable pieces aligned with the identified status. The researchers use LLMs to generate hindsight from past failure experiences, progressively improving diagnostic reliability on subsequent incidents. An extensive study of more than 250 production incidents at Microsoft, spanning five workflow automation scenarios, shows that FLASH outperforms state-of-the-art agent models by an average of 13.2% in accuracy, underscoring the viability of automating the diagnosis of recurring incidents.
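The mechanism can be sketched in a few lines of Python. The sketch below is an illustrative stand-in, not FLASH's implementation: the step logic and function names are hypothetical, with plain Python stubs in place of real LLM and tool calls, but it shows how status supervision gates each step and how hindsight from failures is banked for future recurring incidents.

```python
# Toy sketch of FLASH-style status supervision with hindsight memory.
# All names and logic are hypothetical stand-ins for LLM/tool calls.

hindsight_memory: list[str] = []  # lessons distilled from past failures


def run_step(step: str, context: dict, hints: list[str]) -> tuple[str, str]:
    # Stand-in for executing one diagnostic action and letting an LLM
    # judge the resulting status against the expected one.
    if "restart" in step and "restart fails" in context["incident"]:
        return "failed", f"{step}: service did not come back up"
    return "ok", f"{step}: completed"


def summarize_failure(step: str, context: dict) -> str:
    # Stand-in for LLM-generated hindsight from a failed trial.
    return f"When '{context['incident']}', step '{step}' is unreliable."


def diagnose(incident: str, workflow: list[str]) -> str:
    """Run the workflow step by step under status supervision."""
    context = {"incident": incident, "observations": []}
    for step in workflow:
        status, obs = run_step(step, context, hints=hindsight_memory)
        context["observations"].append(obs)
        if status == "failed":
            # Hindsight from this failure helps later recurring incidents.
            hindsight_memory.append(summarize_failure(step, context))
            return "escalated: " + obs
    return "resolved: " + "; ".join(context["observations"])


print(diagnose("db latency spike, restart fails",
               ["check monitors", "restart db service", "verify latency"]))
```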

Read the paper

NEW RESEARCH

METAREFLECTION: Learning Instructions for Language Agents using Past Reflections

Language agents are AI systems that can understand, reason, and respond in natural language to complete various tasks. While the latest LLMs are capable enough to power reasonably good language agents, their closed-API access model makes it hard to improve these agents when they perform sub-optimally. Recent studies have explored techniques such as self-reflection and prompt optimization to improve performance. Unfortunately, self-reflection can be used only during the agent's current run, while contemporary prompt optimization techniques are designed and tested only on simple single-step agents.

In a recent paper: METAREFLECTION: Learning Instructions for Language Agents using Past Reflections, researchers from Microsoft introduce a novel offline reinforcement learning technique that enhances the performance of language agents by augmenting a semantic memory with experiential learnings from past trials. They demonstrate the efficacy of METAREFLECTION across multiple domains and agent designs, including complex logical reasoning, biomedical semantic similarity, open-world question answering, and vulnerability threat detection in Infrastructure-as-Code. METAREFLECTION boosts language agents' performance by 4% to 16.82% over baseline agent implementations and performs on par with existing state-of-the-art prompt optimization techniques while requiring fewer LLM calls.
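As a rough illustration of the offline loop (run the agent on training tasks, reflect on failures, distill the reflections into reusable instructions), here is a self-contained Python sketch. The stub functions stand in for LLM calls and are not the paper's actual interface.

```python
# Toy sketch of METAREFLECTION-style instruction learning.
# Stubs below stand in for LLM calls; this is not the paper's code.


def run_agent(task: str, instructions: list[str]) -> tuple[bool, str]:
    # Stand-in for an agent run with learned instructions prepended to
    # its prompt; returns success plus the trajectory it produced.
    ok = "tricky" not in task or any("tricky" in i for i in instructions)
    return ok, f"trajectory for {task!r}"


def reflect(task: str, trajectory: str) -> str:
    # Stand-in for self-reflection: an LLM critiques the failed run.
    return f"On tasks like {task!r}, double-check edge cases ('tricky')."


def distill(reflections: list[str]) -> list[str]:
    # Stand-in for meta-reflection: compress per-trial reflections into
    # a small set of general instructions (the semantic memory).
    return sorted(set(reflections))


semantic_memory: list[str] = []
train_tasks = ["parse config", "tricky nested parse", "validate schema"]

for _ in range(2):  # a couple of offline learning epochs
    reflections = []
    for task in train_tasks:
        success, trajectory = run_agent(task, semantic_memory)
        if not success:
            reflections.append(reflect(task, trajectory))
    semantic_memory = distill(semantic_memory + reflections)

print(semantic_memory)  # learned instructions, reused at test time
```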

Read the paper
Spotlight: blog post

GraphRAG auto-tuning provides rapid adaptation to new domains

GraphRAG uses LLM-generated knowledge graphs to substantially improve complex Q&A over retrieval-augmented generation (RAG). Discover automatic tuning of GraphRAG for new datasets, making it more accurate and relevant.

Read more

NEW RESEARCH

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping

Generative AI applications rely on large foundation models, particularly LLMs. LLMs often have tens to hundreds of billions of parameters, making them too large for a single graphics processing unit (GPU) to handle in terms of both memory and computation. Training these models therefore requires distributing the workload across hundreds or even thousands of GPUs, which introduces significant communication overhead as data is shared between GPUs.

In a recent paper: Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping, researchers from Microsoft introduce a system designed to improve the efficiency of LLM training by reducing the time lost to communication between GPUs.

Domino breaks down the data dependencies in a single training batch into smaller, independent pieces. These pieces are processed in parallel, and communication between GPUs happens simultaneously with computation, minimizing delays.
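The scheduling idea can be illustrated without GPUs: once a batch is split into independent pieces, the communication for one piece can be hidden behind the computation of the next. The toy Python sketch below uses sleeps in place of GPU kernels and all-reduce operations; it is a schematic of the overlap principle, not Domino's tensor-slicing implementation.

```python
# Toy illustration of Domino's core idea: split a batch into independent
# pieces so communication for piece i overlaps with computation on piece
# i+1. Sleeps stand in for GPU compute and all-reduce; this is schematic.

import time
from concurrent.futures import ThreadPoolExecutor


def compute(piece: int) -> int:
    time.sleep(0.1)  # stand-in for forward/backward on one slice
    return piece


def communicate(piece: int) -> None:
    time.sleep(0.1)  # stand-in for an inter-GPU all-reduce


pieces = range(4)

# Serial baseline: compute, then communicate, piece by piece.
start = time.perf_counter()
for p in pieces:
    communicate(compute(p))
serial = time.perf_counter() - start

# Overlapped schedule: launch communication for the finished piece on a
# side "stream" while the next piece is being computed.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as comm_stream:
    pending = None
    for p in pieces:
        result = compute(p)
        if pending is not None:
            pending.result()  # drain the previous piece's communication
        pending = comm_stream.submit(communicate, result)
    pending.result()
overlapped = time.perf_counter() - start

print(f"serial: {serial:.2f}s, overlapped: {overlapped:.2f}s")
```

On this toy schedule, the overlapped version finishes in roughly the total compute time plus one trailing communication step, which is the effect Domino targets at GPU scale.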

Test results comparing Domino to Megatron-LM show that Domino speeds up the training process by up to 1.3x on NVIDIA DGX-H100 GPUs.

Read the paper

NEW RESEARCH

Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition

Data science involves large datasets, source code, domain expertise, and unwritten assumptions. Data scientists describe the need to "have a conversation" with their data to extract information from it. The natural language processing and code generation capabilities of large language models (LLMs) could help with data analysis, a challenging task that requires expertise in data processing, programming, and statistics. AI chat interfaces for data analysis have grown in popularity. However, in a recent paper: Improving Steering and Verification in AI-Assisted Data Analysis with Interactive Task Decomposition, researchers from Microsoft and the University of Toronto identify serious challenges in verifying AI-generated results and in steering AI systems to produce the desired output.

The researchers developed two contrasting approaches to address these challenges. The first, Stepwise, decomposes the problem into step-by-step subgoals, each pairing an editable assumption with editable code, until the task is complete. The second, Phasewise, decomposes the entire problem into three editable, logical phases: structured input/output assumptions, an execution plan, and code. A controlled, within-subjects experiment compared these systems against a conversational baseline. Users reported significantly greater control with the Stepwise and Phasewise systems and found intervention, correction, and verification easier than with the baseline. The results suggest design guidelines and trade-offs for AI-assisted data analysis tools.
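To make the contrast concrete, here is a schematic Python data model of the two designs. The class and field names are illustrative assumptions, not identifiers from the systems in the paper.

```python
# Schematic data model for the Stepwise and Phasewise designs.
# Field names and example values are illustrative, not from the paper.

from dataclasses import dataclass, field


@dataclass
class Step:
    # Stepwise: each subgoal pairs an editable assumption with the code
    # generated for it, so users can verify and intervene per step.
    subgoal: str
    assumption: str  # user-editable, e.g. "'income' column has no nulls"
    code: str        # user-editable code generated for this subgoal


@dataclass
class StepwiseAnalysis:
    goal: str
    steps: list[Step] = field(default_factory=list)


@dataclass
class PhasewiseAnalysis:
    # Phasewise: the whole task decomposes into three editable phases.
    goal: str
    io_assumptions: str  # structured input/output assumptions
    plan: str            # execution plan
    code: str            # generated implementation


analysis = StepwiseAnalysis(
    goal="mean income by region",
    steps=[Step("load data",
                "CSV has 'region' and 'income' columns",
                "df = pd.read_csv('data.csv')")],
)
print(analysis)
```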

Read the paper

NEW RESEARCH

OmniParser for pure vision-based GUI agent

Large vision-language models (VLMs) such as GPT-4V and GPT-4o show promise in driving intelligent agent systems that operate within user interfaces (UIs). However, VLMs' full potential remains underexplored in real-world applications, particularly when it comes to acting as general agents across diverse operating systems and applications with only vision input. One limiting factor is the absence of a robust screen-parsing technique that can 1) reliably identify interactable icons within the user interface, and 2) understand the semantics of the various elements in a screenshot and accurately associate the intended action with the corresponding region on the screen.

In a recent article: OmniParser for pure vision-based GUI agent, researchers from Microsoft present a compact screen-parsing module that converts UI screenshots into structured elements. OmniParser can be used with a variety of models to create agents capable of taking actions on UIs. When used with GPT-4V, it significantly improves the agent's ability to generate precisely grounded actions for interface regions.
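A pure vision-based agent pipeline built on such a module might look like the sketch below. The parse_screenshot stub and element schema are hypothetical placeholders rather than OmniParser's real API; the module's actual interface is in the repository linked below.

```python
# Illustrative pipeline for a pure-vision GUI agent: parse a screenshot
# into structured elements, then let a VLM pick a grounded action.
# The stub and schema here are hypothetical, not OmniParser's API.

from dataclasses import dataclass


@dataclass
class UIElement:
    label: str                       # semantic description of the region
    bbox: tuple[int, int, int, int]  # x1, y1, x2, y2 in screen pixels
    interactable: bool


def parse_screenshot(path: str) -> list[UIElement]:
    # Stand-in for the screen-parsing model: detect interactable icons
    # and attach a semantic label to each region.
    return [UIElement("Search box", (120, 40, 560, 80), True),
            UIElement("Submit button", (580, 40, 660, 80), True)]


def choose_action(task: str, elements: list[UIElement]) -> str:
    # Stand-in for the VLM (e.g., GPT-4V): given structured elements,
    # it can ground the intended action to a concrete screen region.
    target = next(e for e in elements if "Submit" in e.label)
    return f"click {target.bbox} ({target.label})"


print(choose_action("submit the form", parse_screenshot("screen.png")))
```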

An agent pairing OmniParser with GPT-4V achieved the best performance on the recently released WindowsAgentArena benchmark.

Read more

Download OmniParser