{"id":1052379,"date":"2024-07-17T09:00:00","date_gmt":"2024-07-17T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1052379"},"modified":"2024-07-11T08:37:16","modified_gmt":"2024-07-11T15:37:16","slug":"research-focus-week-of-july-15-2024","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/research-focus-week-of-july-15-2024\/","title":{"rendered":"Research Focus: Week of July 15, 2024"},"content":{"rendered":"\n

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code\/datasets, new hires and other milestones from across the research community at Microsoft.<\/p><\/blockquote><\/figure>\n\n\n\n

\"Research<\/figure>\n\n\n\n

NEW RESEARCH<\/h3>\n\n\n\n

MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model<\/h2>\n\n\n\n

Diffusion probabilistic models can generate high-fidelity samples for generative time series forecasting. However, their stochastic nature also makes them unstable. In a recent article: MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model<\/a>, researchers from Microsoft present MG-TSD, a novel approach aimed at tackling this challenge.<\/p>\n\n\n\n

The MG-TSD model uses multiple granularity levels within the data to guide the learning process of the diffusion model, yielding strong results without requiring additional data. For long-term forecasting, the researchers establish a new state of the art, with relative improvements ranging from 4.7% to 35.8% across six benchmarks.<\/p>\n\n\n\n
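The guidance signal comes from coarser-grained views of the same series. As a minimal, hypothetical sketch (not the authors' implementation), coarser granularities can be derived from the original data by non-overlapping average pooling:

```python
import numpy as np

def to_granularity(series, factor):
    # Downsample a 1-D series by non-overlapping average pooling.
    n = len(series) // factor * factor
    return series[:n].reshape(-1, factor).mean(axis=1)

hourly = np.arange(48, dtype=float)  # toy hourly series, two days long
granularities = {f: to_granularity(hourly, f) for f in (1, 4, 24)}
# coarser series act as intermediate guidance targets during diffusion training
```

Factor 1 recovers the original series; factor 24 gives one value per day, so the model is guided from coarse structure down to fine detail.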

The paper introducing this research: MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process<\/span><\/a>, was presented at ICLR 2024<\/span><\/a>.<\/p>\n\n\n\n

\n
Read the article<\/a><\/div>\n\n\n\n
Read the paper<\/a><\/div>\n<\/div>\n\n\n\n
\n\n\n\n

NEW RESEARCH<\/h3>\n\n\n\n

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference<\/h2>\n\n\n\n

Machine learning applications based on large language models (LLMs) are now widely deployed in consumer products, and growth in model size and training data has played an important role in this process. Because larger models tend to be more accurate, future models are likely to keep growing, which vastly increases the computational and memory requirements of LLMs.<\/p>\n\n\n\n

The Mixture-of-Experts (MoE) architecture, which can increase model size without proportionally increasing computational requirements, was designed to address this challenge. Unfortunately, MoE\u2019s high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE\u2019s memory-hungry expert parameters to central processing unit (CPU) memory fall short.<\/p>\n\n\n\n

In a recent paper: Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference<\/a>, researchers from Microsoft address these challenges using algorithm-system co-design. Pre-gated MoE alleviates the dynamic nature of sparse expert activation, addressing the large memory footprint of MoEs while also sustaining high performance. The researchers demonstrate that pre-gated MoE improves performance, reduces graphics processing unit (GPU) memory consumption, and maintains model quality.<\/p>\n\n\n\n
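The key idea is to decide which experts the next block will need before that block runs, so their parameters can be fetched from CPU memory ahead of time. A minimal, hypothetical sketch (the router shapes and names here are illustrative, not the paper's API):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2

# hypothetical pre-gate: a router for block i+1, evaluated during block i
pre_gate = rng.standard_normal((DIM, NUM_EXPERTS))

def pre_select_experts(hidden):
    # Score the next block's experts now, so the top-k expert weights
    # can be prefetched from CPU memory while block i is still computing.
    scores = hidden @ pre_gate
    return np.argsort(scores)[-TOP_K:]

h = rng.standard_normal(DIM)
prefetch_ids = pre_select_experts(h)  # start the async weight copy for these
```

Because the selection is known one block early, the expensive CPU-to-GPU transfer overlaps with compute instead of stalling it.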

\n
Read the paper<\/a><\/div>\n<\/div>\n\n\n\n\t
\n\t\t\n\n\t\t

\n\t\tSpotlight: blog post<\/span>\n\t<\/p>\n\t\n\t

\n\t\t\t\t\t\t
\n\t\t\t\t\n\t\t\t\t\t\"GraphRAG\n\t\t\t\t<\/a>\n\t\t\t<\/div>\n\t\t\t\n\t\t\t
\n\n\t\t\t\t\t\t\t\t\t

GraphRAG auto-tuning provides rapid adaptation to new domains<\/h2>\n\t\t\t\t\n\t\t\t\t\t\t\t\t

GraphRAG uses LLM-generated knowledge graphs to substantially improve complex Q&A over retrieval-augmented generation (RAG). Discover automatic tuning of GraphRAG for new datasets, making it more accurate and relevant.<\/p>\n\t\t\t\t\n\t\t\t\t\t\t\t\t

\n\t\t\t\t\t
\n\t\t\t\t\t\t\n\t\t\t\t\t\t\tRead more\t\t\t\t\t\t<\/a>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<\/div>\n\t<\/div>\n\t<\/div>\n\t\n\n\n

NEW RESEARCH<\/h3>\n\n\n\n

What Matters in a Measure? A Perspective from Large-Scale Search Evaluation<\/h2>\n\n\n\n

Evaluation is a crucial aspect of information retrieval (IR) and has been thoroughly studied by academic and professional researchers for decades. Much of the research literature discusses techniques to produce a single number, reflecting the system\u2019s performance: precision or cumulative gain, for example, or dozens of alternatives. Those techniques\u2014metrics\u2014are themselves evaluated, commonly by reference to sensitivity and validity.<\/p>\n\n\n\n

To measure search in industry settings, many other aspects must be considered: how much a metric costs; how robust it is to the happenstance of sampling; whether it is debuggable; and what behavior it incentivizes when taken as a goal. In a recent paper: What Matters in a Measure? A Perspective from Large-Scale Search Evaluation<\/a>, researchers from Microsoft discuss what makes a search metric successful in large-scale settings, including factors that are seldom canvassed in IR research but are important in \u201creal-world\u201d use. They illustrate the discussion with examples from industrial settings and elsewhere, and offer suggestions for designing metrics as part of a working system.<\/p>\n\n\n\n

\n
Read the paper<\/a><\/div>\n<\/div>\n\n\n\n
\n\n\n\n

NEW RESEARCH<\/h3>\n\n\n\n

LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data<\/h2>\n\n\n\n

Partial differential equations (PDEs) are ubiquitous in mathematically oriented scientific fields, such as physics and engineering. The ability to solve PDEs accurately and efficiently can enable a deeper understanding of the physical world. In many complex PDE systems, however, traditional solvers are too time-consuming. Deep learning-based methods, including neural operators, have recently provided faster PDE solvers by approximating or enhancing conventional ones, but they require a large amount of simulated data, which can be costly to collect. This cost can be avoided by learning physics directly from the physics-constrained loss, also known as the mean squared residual (MSR) loss, constructed from the discretized PDE.<\/p>\n\n\n\n
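As a concrete illustration of an MSR loss (a minimal sketch, not tied to the systems studied in the paper), consider the 1-D Poisson equation u'' = f discretized with central differences. The loss is the mean squared residual of the discrete equation, so no simulated solution data is required:

```python
import numpy as np

def msr_loss(u, f, dx):
    # Mean squared residual of the discretized 1-D Poisson equation u'' = f.
    # Built directly from the finite-difference stencil: no solution data needed.
    residual = (u[:-2] - 2 * u[1:-1] + u[2:]) / dx**2 - f[1:-1]
    return float(np.mean(residual**2))

x = np.linspace(0.0, 1.0, 101)
u_candidate = x**2 / 2              # exact solution of u'' = 1
loss = msr_loss(u_candidate, np.ones_like(x), x[1] - x[0])
# a network predicting u would be trained to minimize this loss
```

A network that outputs a field satisfying the discrete equation drives this loss to zero, which is what lets training proceed without precomputed solver outputs.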

In a recent paper: LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data<\/a>, researchers from Microsoft investigate the physical information in the MSR loss, which they term long-range entanglements. They identify the core challenge: the neural network must model the long-range entanglements in the spatial domain of the PDE, and the patterns of these entanglements vary across equations. To tackle this, they propose LordNet, a tunable and efficient neural network for modeling various entanglements. Their tests show that LordNet can be 40\u00d7 faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency while using the fewest parameters.<\/p>\n\n\n\n

\n
Read the paper<\/a><\/div>\n<\/div>\n\n\n\n
\n\n\n\n

NEW RESEARCH<\/h3>\n\n\n\n

FXAM: A unified and fast interpretable model for predictive analytics<\/h2>\n\n\n\n

The generalized additive model (GAM) is a standard for interpretability. However, due to the one-to-many and many-to-one phenomena that commonly appear in real-world scenarios, existing GAMs are limited in serving predictive analytics, in terms of both accuracy and training efficiency. In a recent paper: FXAM: A unified and fast interpretable model for predictive analytics<\/a>, researchers from Microsoft propose FXAM (Fast and eXplainable Additive Model). FXAM extends GAM\u2019s modeling capability with a unified additive model for numerical, categorical, and temporal features. FXAM is trained with a novel procedure called three-stage iteration (TSI), whose stages learn over numerical, categorical, and temporal features respectively; each stage learns a local optimum while the parameters of the other stages are held fixed. The researchers design joint learning over categorical features and partial learning over temporal features to achieve high accuracy and training efficiency, and they show that TSI is mathematically guaranteed to converge to the global optimum. They further propose a set of optimization techniques that speed up FXAM\u2019s training to meet the needs of interactive analysis.<\/p>\n\n\n\n
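In spirit, a stage-wise training loop of this kind is a backfitting-style coordinate descent: each stage refits one additive component against the residual left by the others. A minimal, hypothetical sketch with only numerical and categorical stages and deliberately simple stand-in fitters (not the paper's algorithm):

```python
import numpy as np

def fit_numerical(x, r):
    # Least-squares line as a stand-in for a smooth shape function.
    a, b = np.polyfit(x, r, 1)
    return a * x + b

def fit_categorical(c, r):
    # Per-category mean of the residual (stand-in for joint learning).
    out = np.zeros_like(r)
    for v in np.unique(c):
        out[c == v] = r[c == v].mean()
    return out

def tsi_fit(y, x_num, x_cat, iters=20):
    f_num = np.zeros_like(y)
    f_cat = np.zeros_like(y)
    for _ in range(iters):
        f_num = fit_numerical(x_num, y - f_cat)    # numerical stage, others fixed
        f_cat = fit_categorical(x_cat, y - f_num)  # categorical stage, others fixed
    return f_num, f_cat

x = np.linspace(0.0, 1.0, 100)
c = np.arange(100) % 2
y = 2.0 * x + np.where(c == 1, 1.0, -1.0)
f_num, f_cat = tsi_fit(y, x, c)
```

On this toy data the two components recover the linear trend and the per-category offsets; a temporal stage would slot into the same loop as a third refit against the remaining residual.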

\n
Read the paper<\/a><\/div>\n<\/div>\n\n\n\n
\n\t\n\t
\n\t\t
\n\t\t\t
\n\t
\n\n\t\t\n\t\t
\n\t\t\t\t\t\t\n\t\t\t\t\t<\/div>\n\t<\/div>\n<\/div>\t\t<\/div>\n\t<\/div>\n\n\t<\/div>\n","protected":false},"excerpt":{"rendered":"

Advancing time series analysis with multi-granularity guided diffusion model; An algorithm-system co-design for fast, scalable MoE inference; What makes a search metric successful in large-scale settings; learning to solve PDEs without simulated data.<\/p>\n","protected":false},"author":37583,"featured_media":1052424,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13563,13555,13547],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[243984],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1052379","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-research-area-search-information-retrieval","msr-research-area-systems-and-networking","msr-locale-en_us","msr-post-option-blog-homepage-featured"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199560,851467],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[267093,510017,879075],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Chang Xu","user_id":41107,"display_name":"Chang Xu","author_link":"Chang Xu<\/a>","is_active":false,"last_first":"Xu, Chang","people_section":0,"alias":"chanx"},{"type":"user_nicename","value":"Weiqing Liu","user_id":39300,"display_name":"Weiqing Liu","author_link":"Weiqing Liu<\/a>","is_active":false,"last_first":"Liu, Weiqing","people_section":0,"alias":"weiqiliu"},{"type":"user_nicename","value":"Jiang 
Bian","user_id":38481,"display_name":"Jiang Bian","author_link":"Jiang Bian<\/a>","is_active":false,"last_first":"Bian, Jiang","people_section":0,"alias":"jiabia"},{"type":"user_nicename","value":"Shijie Cao","user_id":40633,"display_name":"Shijie Cao","author_link":"Shijie Cao<\/a>","is_active":false,"last_first":"Cao, Shijie","people_section":0,"alias":"shijiecao"},{"type":"user_nicename","value":"Changho Hwang","user_id":41844,"display_name":"Changho Hwang","author_link":"Changho Hwang<\/a>","is_active":false,"last_first":"Hwang, Changho","people_section":0,"alias":"changhohwang"},{"type":"user_nicename","value":"Ting Cao","user_id":37446,"display_name":"Ting Cao","author_link":"Ting Cao<\/a>","is_active":false,"last_first":"Cao, Ting","people_section":0,"alias":"ticao"},{"type":"user_nicename","value":"Mao Yang","user_id":32798,"display_name":"Mao Yang","author_link":"Mao Yang<\/a>","is_active":false,"last_first":"Yang, Mao","people_section":0,"alias":"maoyang"},{"type":"user_nicename","value":"Paul Thomas","user_id":36042,"display_name":"Paul Thomas","author_link":"Paul Thomas<\/a>","is_active":false,"last_first":"Thomas, Paul","people_section":0,"alias":"pathom"},{"type":"user_nicename","value":"Nick Craswell","user_id":33088,"display_name":"Nick Craswell","author_link":"Nick Craswell<\/a>","is_active":false,"last_first":"Craswell, Nick","people_section":0,"alias":"nickcr"},{"type":"user_nicename","value":"Seth Spielman","user_id":43314,"display_name":"Seth Spielman","author_link":"Seth Spielman<\/a>","is_active":false,"last_first":"Spielman, Seth","people_section":0,"alias":"sethspielman"},{"type":"user_nicename","value":"Xiaotian Gao","user_id":39985,"display_name":"Xiaotian Gao","author_link":"Xiaotian Gao<\/a>","is_active":false,"last_first":"Gao, Xiaotian","people_section":0,"alias":"xiaog"},{"type":"user_nicename","value":"Xinran wei","user_id":41087,"display_name":"Xinran wei","author_link":"Xinran wei<\/a>","is_active":false,"last_first":"wei, 
Xinran","people_section":0,"alias":"weixinran"},{"type":"user_nicename","value":"Jia Zhang","user_id":41075,"display_name":"Jia Zhang","author_link":"Jia Zhang<\/a>","is_active":false,"last_first":"Zhang, Jia","people_section":0,"alias":"zhangjia"},{"type":"user_nicename","value":"Tie-Yan Liu","user_id":34431,"display_name":"Tie-Yan Liu","author_link":"Tie-Yan Liu<\/a>","is_active":false,"last_first":"Liu, Tie-Yan","people_section":0,"alias":"tyliu"},{"type":"user_nicename","value":"Justin Ding","user_id":32435,"display_name":"Justin Ding","author_link":"Justin Ding<\/a>","is_active":false,"last_first":"Ding, Justin","people_section":0,"alias":"juding"},{"type":"user_nicename","value":"Shi Han","user_id":33618,"display_name":"Shi Han","author_link":"Shi Han<\/a>","is_active":false,"last_first":"Han, Shi","people_section":0,"alias":"shihan"},{"type":"user_nicename","value":"Dongmei Zhang","user_id":31665,"display_name":"Dongmei Zhang","author_link":"Dongmei Zhang<\/a>","is_active":false,"last_first":"Zhang, Dongmei","people_section":0,"alias":"dongmeiz"}],"msr_type":"Post","featured_image_thumbnail":"\"Research","byline":"","formattedDate":"July 17, 2024","formattedExcerpt":"Advancing time series analysis with multi-granularity guided diffusion model; An algorithm-system co-design for fast, scalable MoE inference; What makes a search metric successful in large-scale settings; learning to solve PDEs without simulated 
data.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1052379"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/37583"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1052379"}],"version-history":[{"count":13,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1052379\/revisions"}],"predecessor-version":[{"id":1056402,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1052379\/revisions\/1056402"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1052424"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1052379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1052379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1052379"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1052379"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1052379"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=1052379"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1052379"},
{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1052379"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1052379"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1052379"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1052379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}