Microsoft Research Lab - India Articles
http://approjects.co.za/?big=en-us/research/

MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model
http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/mg-tsd-advancing-time-series-analysis-with-multi-granularity-guided-diffusion-model | Tue, 18 Jun 2024

Author: Chang Xu

Diffusion probabilistic models can generate high-fidelity samples for generative time series forecasting. However, their stochastic nature also makes them prone to instability. To tackle this challenge, researchers from Microsoft Research Asia introduce a novel approach called MG-TSD. The paper “MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process”, presented at ICLR 2024, capitalizes on the intrinsic granularity levels present in the data, utilizing them as predefined targets at various stages of the diffusion process. These targets are employed to guide the learning trajectory of the diffusion models, thereby ensuring a more stable and accurate forecast.

Notably, the MG-TSD method yields remarkable outcomes without requiring additional data. In long-term forecasting, the researchers establish a new state of the art, with relative improvements ranging from 4.7% to 35.8% across six benchmarks.

Guiding diffusion processes through intrinsic granularity features in time series data

It can be observed that the forward process of the diffusion model, which sequentially corrupts the data distribution to a standard normal distribution, intuitively aligns with the process of smoothing fine-grained data into a coarser-grained representation. Both of these processes result in a gradual loss of finer distribution features. This suggests that intrinsic features within data granularities may also serve as a source of guidance.

Figure 1: The process of smoothing data from finest-grained to coarsest-grained naturally aligns with the diffusion process
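
To make the analogy concrete, the minimal sketch below (illustrative only, not code from the paper) average-pools an hourly series into coarser granularities; each coarsening removes fine detail, much as each forward diffusion step does. The granularity sizes are chosen arbitrarily for illustration.

```python
import numpy as np

def to_coarser(series_1h: np.ndarray, hours: int) -> np.ndarray:
    """Average-pool an hourly series into non-overlapping blocks of `hours`."""
    usable = (len(series_1h) // hours) * hours
    return series_1h[:usable].reshape(-1, hours).mean(axis=1)

hourly = np.random.rand(168)            # one week of hourly observations
for hours in (4, 12, 24):
    coarse = to_coarser(hourly, hours)
    # Each coarsening step discards finer fluctuations, much as the forward
    # diffusion process gradually washes out fine-grained structure.
    print(f"{hours}h granularity: {len(coarse)} points, std = {coarse.std():.3f}")
```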

The MG-TSD model employs multiple granularity levels within the data to guide the learning process of diffusion models. Coarse-grained data at different granularity levels are used as targets to guide the learning of the denoising process. These targets serve as constraints on the intermediate latent states, ensuring a regularized sampling path that preserves the trends and patterns within the coarse-grained data. This inductive bias encourages the generation of coarser features during intermediate steps and the recovery of finer features in subsequent diffusion steps. Each granularity level can guide the diffusion process over a different range of steps. In the implementation, the coarse-grained data and the finest-grained data share different percentages of the variance schedule (a hyperparameter of the diffusion model), referred to as the “share ratio.” This design reduces variability and results in high-quality predictions.

Figure 2: Overview of the Multi-Granularity Time Series Diffusion (MG-TSD) model
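
A rough sketch of what such share-ratio-guided training could look like is given below. This is not the authors' implementation: the model signature, the noise schedule, and the way the share ratio maps each granularity to a range of diffusion steps are simplifying assumptions for illustration.

```python
import torch

def mg_guided_loss(model, x_fine, coarse_targets, share_ratios, betas):
    """Illustrative multi-granularity guided denoising loss.

    x_fine:         (batch, length) finest-grained series
    coarse_targets: {name: (batch, length)} coarse series upsampled to fine resolution
    share_ratios:   {name: float} fraction of the variance schedule that each
                    coarse granularity shares with the finest granularity
    betas:          (T,) variance schedule
    """
    T = betas.shape[0]
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, T, (x_fine.shape[0],))
    a = alpha_bar[t].unsqueeze(-1)
    noise = torch.randn_like(x_fine)

    # Standard denoising loss on the finest-grained data.
    loss = ((model(a.sqrt() * x_fine + (1 - a).sqrt() * noise, t) - noise) ** 2).mean()

    # Guidance: each coarse granularity supervises only the later (noisier)
    # portion of the schedule, as determined by its share ratio.
    for name, x_coarse in coarse_targets.items():
        start = int(T * (1.0 - share_ratios[name]))
        mask = (t >= start).float().unsqueeze(-1)
        pred = model(a.sqrt() * x_coarse + (1 - a).sqrt() * noise, t)
        loss = loss + (mask * (pred - noise) ** 2).mean()
    return loss
```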

MG-TSD achieves stable and outstanding prediction results

A comprehensive evaluation was conducted across six benchmarks and three performance metrics, in which nine baseline models were compared. The results demonstrate that the MG-TSD model achieves state-of-the-art (SOTA) status, with a substantial improvement ranging from 4.7% to 35.8% on the CRPS_sum metric across the six benchmarks. CRPS_sum indicates the similarity between two distributions; the smaller the value, the more similar they are.
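
For reference, CRPS_sum is commonly computed by summing each multivariate series across its dimensions and scoring the summed forecast samples against the summed ground truth with a sample-based CRPS estimator. The sketch below is a simplified, unnormalized version (real implementations typically also normalize by the summed absolute target values).

```python
import numpy as np

def crps_from_samples(samples: np.ndarray, y: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    term1 = np.abs(samples - y).mean()
    term2 = np.abs(samples[:, None] - samples[None, :]).mean()
    return term1 - 0.5 * term2

def crps_sum(forecast: np.ndarray, target: np.ndarray) -> float:
    """forecast: (num_samples, time, dims); target: (time, dims)."""
    f = forecast.sum(axis=-1)      # aggregate over dimensions
    y = target.sum(axis=-1)
    return float(np.mean([crps_from_samples(f[:, i], y[i]) for i in range(len(y))]))
```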

Table 1: Comparison of CRPS_sum of models on six real-world datasets

Diffusion process aligns with data smoothing

The four subplots, Figure 3(a) to Figure 3(d), illustrate a gradual smoothing transformation of the distribution toward increasingly coarse targets. The blue curve represents CRPS_sum values between coarse-grained targets and the intermediate samples of the single-granularity diffusion model (1h) at each denoising step. As granularity becomes coarser from the left to the right panels (4h→6h→12h→24h), the targets progressively align with the intermediate sample distributions at smaller denoising steps (approximately steps 80→60→40→40). This comparison underscores the similarity between the diffusion process and the smoothing process, both of which transition from the finest-grained data to coarse-grained data and entail a gradual loss of the finer characteristics of the finest-grained data through a smooth transformation.

Moreover, this observation is consistent with the selection of guiding steps for MG-TSD. The orange lines, which depict the performance of MG-TSD with share ratios of 0.2, 0.4, 0.6, 0.8, and 1.0, and the blue lines, which represent the similarity of distributions, show a consistent trend. In other words, setting the guiding steps at a coarse granularity to match the steps where the diffusion intermediate samples have the closest distribution often achieves the best performance, as indicated by the grey region in the figure.

Figure 3: Selection of share ratio for MG-TSD models

Coarse-grained samples demonstrate superior robustness in trend-capturing capabilities

Researchers visualize the ground truth and the predicted mean for both 1-hour and 4-hour granularity time series across four dimensions in the Solar dataset, as illustrated in Figure 4. In the MG-TSD model, the coarse-grained samples display a more robust capacity to capture the trends, subsequently guiding the generation of more precise fine-grained data.

Figure 4: MG-TSD and TimeGrad prediction intervals and test-set ground truth for Solar data, showing several illustrative dimensions (of 370) from the first rolling window.

For further information regarding MG-TSD, please refer to the project page at: https://github.com/Hundredl/MG-TSD

AutoGen Update: Complex Tasks and Agents
http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/autogen-update-complex-tasks-and-agents | Tue, 04 Jun 2024

Presented by Adam Fourney at Microsoft Research Forum, June 2024

Adam Fourney

“Agents are a very, very powerful abstraction over things like task decomposition, specialization, tool use, etc. Really, you think about which roles you need on your team, and you put together your team of agents, and you get them to talk to one another, and then you start making progress on your task.”

Adam Fourney, Principal Researcher, Microsoft Research AI Frontiers

Transcript: Lightning Talk

AutoGen Update: Complex Tasks and Agents

Adam Fourney, Principal Researcher, Microsoft Research AI Frontiers

Adam Fourney discusses the effectiveness of using multiple agents, working together, to complete complex multi-step tasks. He will showcase their capability to outperform previous single-agent solutions on benchmarks like GAIA, utilizing customizable arrangements of agents that collaborate, reason, and utilize tools to achieve complex outcomes.

Microsoft Research Forum, June 4, 2024

ADAM FOURNEY: Hello, my name is Adam Fourney, and today, I’ll be presenting our work on completing complex tasks with agents. And though I’m presenting, I’m sharing the contributions of many individuals as listed below. All right, so let’s just dive in.

So in this presentation, I’ll share our goal, which is to reliably accomplish long-running complex tasks using large foundational models. I’ll explain the bet that we’re taking on using multi-agent workflows as the platform or the vehicle to get us there, and I’ll share a little bit about our progress in using a four-agent workflow to achieve state-of-the-art performance on a recent benchmark.

So what exactly is a complex task? Well, if we take a look at the following example from the GAIA benchmark for General AI Assistants, it reads, “How many nonindigenous crocodiles were found in Florida from the years 2000 through 2020?” Well, to solve this task, we might begin by performing a search and discovering that the U.S. Geological Survey maintains an online database for nonindigenous aquatic species. If we access that resource, we can form an appropriate query, and we’ll get back results for two separate species. If we open the collection reports for each of those species, we’ll find that in one instance, five crocodiles were encountered, and in the other, just a single crocodile was encountered, giving a total of six separate encounters during those years. So this is an example of a complex task, and it has certain characteristics of tasks of this nature, which is that it benefits strongly from planning, acting, observing, and reflecting over multiple steps, where those steps are doing more than just generating tokens. Maybe they’re executing code. Maybe they’re using tools or interacting with the environment. And the observations they’re doing … they’re adding information that was previously unavailable. So these are the types of tasks that we’re interested in here. And as I mentioned before, we’re betting on using multi-agent workflows as the vehicle to get us there.

So why multi-agents? Well, first of all, the whole setup feels very agentic from, sort of, a first-principles point of view. The agents are reasoning, they’re acting, and then they’re observing the outcomes of their actions. So this is very natural. But more generally, agents are a very, very powerful abstraction over things like task decomposition, specialization, tool use, etc. Really, you think about which roles you need on your team, and you put together your team of agents, and you get them to talk to one another, and then you start making progress on your task. So to do all this, to build all this, we are producing a platform called AutoGen, which is open source and available on GitHub. And I encourage you to check this out at the link below.

All right, so now let’s talk about the progress we’ve been making using this approach. So if you recall that question about crocodiles from the beginning, that’s from the GAIA benchmark for General AI Assistants. And we put together four agents to work on these types of problems. It consists of a general assistant, a computer terminal that can run code or execute programs, a web surfer that can browse the internet, and an orchestrator to, sort of, organize and oversee their work. Now with that team of four agents, we were actually able to, in March, achieve the top results on the GAIA leaderboard for that benchmark by about 8 points. But what’s perhaps more exciting to us is that we are able to more than double the performance on the hardest set of questions, the Level 3 questions, which the authors of that work describe as questions for a perfect general assistant, requiring it to take arbitrarily long sequences of actions, use any number of tools, and to access the world in general. So this is all very exciting, and I want to share a little bit more about what those agents are actually doing.

So this is the loop or the plan that they are following. So it begins with the question or the prompt, and then we produce a ledger, which is like a working memory that consists of given or verified facts; facts that we need to look up, for example, on the internet; facts that we need to derive, perhaps through computation; and educated guesses. Now these educated guesses turn out to be really important because they give the language models space to speculate in a constrained environment without some of the downstream negative effects of hallucination. So once we have that ledger, we assign the tasks to the independent agents, and then we go into this inner loop, where we ask first, are we done? If not, well, are we still making progress? As long as we’re making progress, we’ll go ahead and we’ll delegate the next step to the next agent. But if we’re not making progress, we’ll note that down. We might still delegate one other step, but if that stall occurs for three rounds, then we will actually go back, update the ledger, come up with a new set of assignments for the agents, and then start over.
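
In pseudocode, the loop Fourney describes might look roughly like the sketch below. The helper functions (build_ledger, plan_assignments, and so on) are hypothetical stand-ins for LLM calls, not AutoGen's actual API.

```python
# Pseudocode sketch: every helper used here is a hypothetical placeholder.
def orchestrate(task, agents, max_stalls=3):
    ledger = build_ledger(task)                    # facts, lookups, derivations, guesses
    assignments = plan_assignments(ledger, agents)
    stalls = 0
    while True:
        if task_complete(ledger):
            return final_answer(ledger)
        if making_progress(ledger):
            stalls = 0
        else:
            stalls += 1
            if stalls >= max_stalls:
                # Stalled for several rounds: update the ledger and re-plan.
                ledger = update_ledger(ledger)
                assignments = plan_assignments(ledger, agents)
                stalls = 0
                continue
        agent, step = next_assignment(assignments, ledger)
        observation = agent.execute(step)          # run code, browse the web, etc.
        ledger = record(ledger, observation)
```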

All right, so this is the configuration that’s been working well for us, and it’s all I have time to share with you today. But I mentioned our goal, our bet, and our progress, and I want to conclude by sharing our plans for the future. So already we’re starting to tackle increasingly more complex benchmarks and real-world scenarios with this configuration. And we’re really excited about opportunities to introduce new agents that, for example, learn and self-improve with experience; that understand images and screenshots a little better for maybe more effective web surfing or use of interfaces; and that are maybe a bit more systematic about exploring that solution space. So rather than just updating that ledger and then restarting when they get stuck, they can be a bit more pragmatic about the strategies that they’re employing.
All right, well, thank you for your attention, and thank you for attending the Microsoft Research Forum, and we look forward to you joining us next time.

MatterGen: A Generative Model for Materials Design
http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/mattergen-a-generative-model-for-materials-design | Tue, 04 Jun 2024

Presented by Tian Xie at Microsoft Research Forum, June 2024

Tian Xie

“Materials design is the cornerstone of modern technology. Many of the challenges our society is facing today are bottlenecked by finding a good material. … If we can find a novel material that conducts lithium very well, it will be a key component for our next-generation battery technology. The same applies to many other domains.”

Tian Xie, Principal Research Manager, Microsoft Research AI for Science

Transcript: Lightning Talk

MatterGen: A Generative Model for Materials Design

Tian Xie, Principal Research Manager, Microsoft Research AI for Science

Tian Xie introduces MatterGen, a generative model that creates new inorganic materials based on a broad range of property conditions required by the application, aiming to shift the traditional paradigm of materials design with generative AI.

Microsoft Research Forum, June 4, 2024

TIAN XIE: Hello, everyone. My name is Tian, and I’m from Microsoft Research AI for Science. I’m excited to be here to share with you MatterGen, our latest model that brings generative AI to materials design.

Materials design is the cornerstone of modern technology. Many of the challenges our society is facing today are bottlenecked by finding a good material. For example, if we can find a novel material that conducts lithium very well, it will be a key component for our next-generation battery technology. The same applies to many other domains, like finding a novel material for solar cells, carbon capture, and quantum computers. Traditionally, materials design is conducted by search-based methods. We search through a list of candidates and gradually filter them using a list of design criteria for the application. Like for batteries, we need the materials to contain lithium, to be stable, to have a high lithium-ion conductivity, and each filtering step can be conducted using simulation-based methods or AI emulators. At the end, we get five to 10 candidates that we’re sending to the lab for experimental synthesis.

In MatterGen, we hope to rethink this process with generative AI. We’re aiming to directly generate materials given the design requirements for the target application, bypassing the process of searching through candidates. You can think of it as using text-to-image generative models like DALL-E to generate the images given a prompt rather than needing to search through the entire internet for images via a search engine. The core of MatterGen is a diffusion model specifically designed for materials. A material can be represented by its unit cell, the smallest repeating unit of the infinite periodic structure. It has three components: atom types, atom positions, and periodic lattice. We designed the forward process to corrupt all three components towards a random structure and then have a model to reverse this process to generate a novel material. Conceptually, it is similar to using a diffusion model for images, but we build a lot of inductive bias like equivariance and periodicity into the model because we’re operating on a sparse data region as in most scientific domains.
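
As a mental model of the representation described here (a toy sketch, not MatterGen's actual code), a unit cell can be stored as three arrays, and the forward process pushes each component toward a random structure:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class UnitCell:
    atom_types: np.ndarray    # (n_atoms,) integer element identifiers
    frac_coords: np.ndarray   # (n_atoms, 3) fractional atom positions
    lattice: np.ndarray       # (3, 3) periodic lattice vectors

def corrupt(cell: UnitCell, t: float) -> UnitCell:
    """Toy forward step: interpolate the cell toward a random structure.

    `t` runs from 0 (clean) to 1 (fully random). The real model corrupts
    each component with a process suited to its type (categorical atom
    types, periodic coordinates, continuous lattice).
    """
    n = len(cell.atom_types)
    random_types = np.random.randint(1, 119, size=n)
    types = np.where(np.random.rand(n) < t, random_types, cell.atom_types)
    coords = ((1 - t) * cell.frac_coords + t * np.random.rand(n, 3)) % 1.0
    lattice = (1 - t) * cell.lattice + t * np.random.randn(3, 3)
    return UnitCell(types, coords, lattice)
```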

Given this diffusion architecture, we train the base model of MatterGen using the structure of all known stable materials. Once trained, we can generate novel, stable materials by sampling from the base model unconditionally. To generate the material given desired conditions, we further fine-tune this base model by adding conditions to each layer of the network using a ControlNet-style parameter-efficient fine-tuning approach. The condition can be anything like a specific chemistry, symmetry, or any target property. Once fine-tuned, the model can directly generate the materials given desired conditions. Since we use fine-tuning, we only need a small labeled dataset to generate the materials given the corresponding condition, which is actually very useful for the users because it’s usually computationally expensive to generate a property-labeled dataset for materials.
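
The ControlNet-style idea of adding a condition signal to each layer while keeping the base model intact can be sketched as a small adapter like the one below; the layer shapes and zero-initialization are illustrative assumptions rather than the actual MatterGen architecture.

```python
import torch
import torch.nn as nn

class ConditionedLayer(nn.Module):
    """Wrap a frozen base layer and inject a property condition additively."""

    def __init__(self, base_layer: nn.Module, hidden_dim: int, cond_dim: int):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad_(False)              # base model stays frozen
        self.adapter = nn.Linear(cond_dim, hidden_dim)
        nn.init.zeros_(self.adapter.weight)      # start as a no-op on top of the base
        nn.init.zeros_(self.adapter.bias)

    def forward(self, h: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.base(h) + self.adapter(cond)
```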

Here’s an example of how MatterGen generates novel materials in the strontium-vanadium-oxygen chemical system. It generates candidates with lower energy than two other competing methods: random structure search and substitution. The resulting structure looks very reasonable and is proven to be stable using computational methods. MatterGen also generates materials given desired magnetic, electronic, and mechanical properties. The most impressive result here is that we can shift the distribution of generated materials towards extreme values compared with the training property distribution. This is very significant because most materials design problems involve finding materials with extreme properties, like superhard materials or magnets with high magnetism, which is difficult to do with traditional search-based methods and is the key advantage of generative models.

Our major next step is to bring these generative AI–designed materials into real life, making real-world impact in a variety of domains like battery design, solar cell design, and carbon capture. One limitation is that we have only validated these AI-generated materials using computation. We’re working with experimental partners to synthesize them in the wet lab. It is a nontrivial process, but we keep improving our model and getting feedback from the experimentalists, and we are looking forward to a future where generative AI–designed materials can make real-world impact in a broad range of domains. Here’s a link to our paper in case you want to learn more about the details. We look forward to any comments and feedback that you might have. Thank you very much.

Driving Industry Evolution: Exploring the Impact of Generative AI on Sector Transformation
http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/driving-industry-evolution-exploring-the-impact-of-generative-ai-on-sector-transformation | Tue, 04 Jun 2024

Presented by Jiang Bian at Microsoft Research Forum, June 2024

Jiang Bian

There is “a substantial demand for advanced generative AI tailored to enhance core business operations. However, in our dialogues with strategic partners, we have identified crucial gaps in current generative AI capabilities versus the specific needs of industry applications. … Our research is crucial in addressing these limitations and amplifying the underappreciated potential of generative AI.”

Jiang Bian, Senior Principal Research Manager, Microsoft Research Asia

Transcript: Lightning Talk

Driving Industry Evolution: Exploring the Impact of Generative AI on Sector Transformation

Jiang Bian, Senior Principal Research Manager, Microsoft Research Asia

Jiang Bian discusses how generative AI transforms industries by bridging gaps between AI capabilities and sector needs. He will showcase domain-specific foundation models and versatile AI agents, setting new industry standards.

Microsoft Research Forum, June 4, 2024

JIANG BIAN: Hello, everyone. My name is Jiang. Today, I’m excited to discuss the work we are undertaking at Microsoft Research Asia focusing on leveraging generative AI to drive transformation and evolution across various industries.

Our efforts are inspired by our unique co-innovation initiative with world-renowned partners from a few core sectors, including finance, manufacturing, energy, and so on. These collaborations have highlighted a substantial demand for advanced generative AI tailored to enhance core business operations. However, in our dialogues with strategic partners, we have identified crucial gaps in current generative AI capabilities versus the specific needs of industry applications. These include a too-narrow focus on human-like AI rather than critical industry applications, limitations in processing complex and noisy data, and concerns about reliability in complex decision-making scenarios. Our research is crucial in addressing these limitations and amplifying the underappreciated potential of generative AI in high-value sectors. We are focusing on two main approaches: developing domain-specific foundation models that enhance analytical and predictive capabilities or enable interactive and controllable simulations, and creating a versatile foundation-model-as-agent system for diverse industry decision-making tasks.

Our first project is transforming the way industrial data is analyzed and utilized. Facing diverse data formats like tabular, time series, and graph data from various sectors, we are employing Generative Data Learning to enhance the large language model with a strong ability to interpret and process diverse data formats by transforming them into a unified, instruction-oriented language. With training over this diverse sector data for [numerous] tasks, this approach enables more intuitive data analytics and predictions across various industries. Initial experiments on typical classification and regression tasks over tabular data have shown that even a relatively small-scale model enhanced by our Generative Data Learning approach can outperform both general large language models and traditional models like tree ensembles, particularly in few-shot scenarios. This suggests significant potential for a single-model solution that requires no extensive model training or fine-tuning to explore industrial data intelligence, perhaps with only few-shot examples.
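
One way to picture the "unified, instruction-oriented language" idea is to serialize a tabular record and its task into a prompt; the field names and template below are invented for illustration and are not from the project itself.

```python
def record_to_instruction(record: dict, task: str, target: str) -> str:
    """Serialize a tabular row into an instruction-style training example."""
    fields = "; ".join(f"{name} is {value}" for name, value in record.items())
    return (f"Task: {task}\n"
            f"Observation: {fields}.\n"
            f"Question: what is the expected {target}?")

print(record_to_instruction(
    {"machine_id": "A-17", "temperature_c": 84.2, "vibration_mm_s": 4.1},
    task="predict equipment failure risk",
    target="failure probability",
))
```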

Our second project explores building foundation models over domain-specific data, and we focus on financial markets, given that their fundamental data consists of orders. We have developed a dual-level foundation model called the Large Market Model, which uses transformers on both the order sequence, to model market dynamics, and the order-batch sequence, to align the market trend with control signals. The performance of financial market simulations based on this Large Market Model has been very promising. They have excelled in forecasting market trends, simulating extreme scenarios for stress tests, and detecting market manipulations efficiently.

Our third project focuses on creating a decision-making agent through knowledge-augmented generation and adaptive retrieval. This agent is essentially a trainable model that generates and extracts domain-specific knowledge, dynamically updating itself and retrieving the most appropriate knowledge to handle changing environments. This adaptive approach is particularly useful in many industry control applications, such as HVAC control with the goal of optimizing energy use while maintaining comfort. Deploying this agent in this scenario has shown it can outperform traditional reinforcement learning methods, saving significantly more energy, especially in unknown environments or when facing perturbations.

In summary, at MSR Asia, we are committed to advancing the development of generative AI to catalyze industry evolution through innovative research and partnership. We will soon be sharing more details about these projects through upcoming papers and open-source initiatives. We invite you, especially our industry partners, to stay tuned and join us in driving these transformative efforts forward. Thank you.

“Foundation models, also known as large language models, possess immense potential across a variety of industries. Yet, some companies and organizations limit their use of these expansive AI models to niche areas, including intelligent customer service, chatbots, or text and image generation. In reality, these foundation models demonstrate robust abilities in reasoning, content creation, and generalization, making them exceptionally fit for high-stakes business tasks. These tasks range from making accurate predictions and forecasts to optimizing industrial control and complex decision-making to conducting intelligent and interactive industrial simulations.”

— Jiang Bian, Senior Principal Research Manager, Microsoft Research Asia

Maximizing the business value of foundation models in industry applications

By Jiang Bian

As the development of large AI models, also known as foundation models, progresses, companies and organizations are becoming increasingly excited about their potential for enhancing productivity. However, a significant trend has been observed: many industry practitioners focus heavily on the human-like qualities of AI, such as conversational abilities, writing skills, creativity, and perceptual capabilities. In deploying these large AI models, there is a tendency to prioritize applications in intelligent customer service, chatbots, and other so-called ”human-like” functions. Unfortunately, this emphasis may restrict our comprehension and use of these potent models, hindering our ability to fully unleash their capabilities within various industries.

This limitation is not without reason. Incorporating foundation models into practical, production-oriented scenarios is still in its infancy, with few mature and widespread examples to follow. Viewing AI as a “production tool” is akin to possessing a tool before fully understanding its potential applications. Furthermore, humanity has rarely, if ever, encountered such a versatile yet uncertain tool that is not designed for specific tasks.

Additionally, the complexity and variety inherent in different industries require foundation models that move beyond traditional perceptions. This necessitates synchronized innovation in models at the industry level, enabling them to fully exploit the capabilities of foundation models across diverse industrial landscapes and to better align with AI applications. Instead of limiting AI to a “chat robot” role, we should broaden our perspective. Transforming industries in the AI era involves rethinking current business processes and frameworks, leading to collaborative models that can effortlessly integrate humans and foundation models.

Unlocking the boundless potential of foundation models in industry

Foundation models are endowed with broad capabilities in data representation, knowledge comprehension, and reasoning, allowing them to adjust seamlessly across various domains and scenarios, and swiftly adapt to new environments. Concurrently, digital platforms across industries have evolved, amassing substantial amounts of industry-specific data. This rich repository of knowledge and information positions foundation models to integrate effortlessly into industrial settings.

In practical terms, the advanced reasoning abilities of foundation models provide users with a deeper understanding of data. By extracting valuable insights from large datasets and identifying patterns and correlations, these models deliver more effective recommendations and deeper insights. This benefit is especially vital in industrial contexts, where prediction, decision-making, and simulation play crucial roles.

One of the standout features of foundation models is their exceptional ability to generalize. Before their advent, each industry scenario required specific data to train bespoke AI models, limiting scalability and hindering the full commercial exploitation of AI. Foundation models, with their access to a global pool of knowledge, markedly improve generalization. As a result, industries are freed from the necessity of developing unique models for every situation, overcoming a major limitation of traditional AI solutions.

Moreover, foundation models can work in tandem with generative AI to increase the accuracy, realism, and interactivity of industrial simulations and intelligent modeling, facilitating the creation of digital twins. These simulations and models aim to mimic and test real-world scenarios, which often involve complex roles and intricate environments. Traditional AI models may simplify real-world complexities or miss crucial extreme events, compromising the fidelity and authenticity of simulations. In contrast, generative large AI models, steeped in domain-specific knowledge, establish accurate mappings between specific data dimensions and real-world occurrences. This method allows for simulations that closely mirror reality, significantly aiding industrial forecasting and decision-making processes while maintaining adherence to industry standards.

In the industrial sector, tasks of paramount importance and commercial value include precise forecasting and control, efficient optimization of decisions, and complex duties associated with intelligent and interactive industrial simulations. These areas should be the primary focus for traditional industrial enterprises. Yet, when we assess existing foundation models like GPT against the actual needs within industrial domains, we uncover significant mismatches between the capabilities of these models and the real demands of industry. To bridge this gap and fully leverage their potential, several challenges must be addressed.

First, there is a notable absence of a universal framework capable of effectively extracting complex domain knowledge from diverse field data and using this knowledge to construct intelligent agents. Various domains contain rich and complex data, such as logistics companies dealing with customs information and cross-national policies, pharmaceutical industries with FDA drug review documents, and the legal industry with numerous regulations. Developing intelligent agents that are deeply rooted in domain knowledge calls for a more generalized framework. This framework should be proficient in extracting crucial domain knowledge, identifying hidden connections between data and knowledge, and managing this information efficiently.

Second, while foundation models are adept at generating textual content, their proficiency in processing and understanding structured data, like numerical or tabular information, is lacking. Industrial scenarios often involve structured data, such as health monitoring indicators, battery charge-discharge cycles, and financial transactions. Current large models are not specifically designed or optimized for processing such data, which complicates accurate prediction and classification tasks based on structured inputs.

Third, in practical applications, foundation models currently fall short in stability and reliability for decision-making. Critical industries like energy, logistics, finance, and healthcare require dependable decision-making for tasks such as optimizing logistics routes, controlling energy equipment, formulating investment strategies, and allocating medical resources. These tasks often involve numerous variables and constraints, especially under dynamic environmental changes. Foundation models have yet to fully adapt to these complex industrial tasks, making direct application challenging.

Lastly, there is a lack of insight into domain-specific foundational data, as well as methodologies and experience for developing domain-specific foundation models. Essential information in many specialized fields extends beyond mere text, incorporating unique data structures and semantic relationships. For example, transaction order information in the financial investment field or molecular structure details in the biopharmaceutical industry contain critical knowledge often embedded in such foundational data. A deeper, more nuanced analysis is required. Creating domain-specific foundation models grounded in this detailed understanding is crucial for effectively leveraging and unlocking the potential of data in these fields.

Constructing industry foundation models: harmonizing general knowledge and domain expertise

To expedite the adoption and application of foundation models in industry, we can concentrate on several pivotal areas.

First, we can harness rich and complex industrial domain data to construct a more versatile, efficient, and practical retrieval-augmented generation (RAG) framework. This framework is designed to adapt seamlessly to various vertical domains, extracting essential domain knowledge, uncovering hidden associations between data and knowledge, and effectively organizing and managing this wealth of information.

Figure 1. A more universal, efficient, and practical retrieval-augmented generation (RAG) framework based on foundation models.
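
Underneath the domain-specific knowledge extraction, such a framework still rests on the familiar embed-retrieve-generate skeleton sketched below; the `embed` and `generate` callables are placeholders for an embedding model and a foundation model, respectively, not part of the framework described in the article.

```python
import numpy as np

def build_index(docs, embed):
    """Embed each domain document once; `embed` maps text to a 1-D vector."""
    return [(doc, np.asarray(embed(doc), dtype=float)) for doc in docs]

def retrieve(query, index, embed, k=3):
    q = np.asarray(embed(query), dtype=float)
    def score(vec):
        return float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec) + 1e-9))
    ranked = sorted(index, key=lambda item: -score(item[1]))
    return [doc for doc, _ in ranked[:k]]

def answer(query, index, embed, generate):
    context = "\n".join(retrieve(query, index, embed))
    prompt = f"Use the following domain knowledge:\n{context}\n\nQuestion: {query}"
    return generate(prompt)   # placeholder for a foundation-model call
```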

Second, by carefully considering critical numerical data and the corresponding structured dependencies prevalent in industrial scenarios, we can design foundation models specifically optimized for industrial applications. These models effectively integrate general knowledge with domain-specific expertise derived from temporal or tabular data, thereby enabling more effective solutions for tasks such as prediction and classification within the industry.

Figure 2. From traditional industry AI solutions to industry foundation models integrating general and domain knowledge.

Another avenue we are actively exploring involves harnessing the potent generation, generalization, and transfer capabilities inherent in foundation models to elevate the quality and efficiency of industrial decision-making. We are pursuing two distinct paths: first, treating foundation models as intelligent agents, and, second, leveraging foundation models to assist reinforcement-learning agents.

Treating foundation models as intelligent agents: By leveraging the pre-existing knowledge encoded in foundation models and integrating offline reinforcement learning, we can continuously acquire new domain-specific insights and fine-tune the models. This evolutionary process enhances the optimization and decision-making capabilities of foundation models, enabling them to prioritize industry-specific tasks.

Foundation models optimized for specific tasks can play a pivotal role across various industrial contexts. In formula racing, for example, these foundation models can optimize tire-maintenance strategies. By considering tire wear and repair costs, they determine the optimal pit stop timing, thereby shortening race duration and improving car rankings. In chemical manufacturing, leveraging these foundation models can significantly enhance efficiency in product storage and pipeline coordination during production processes, ultimately boosting overall production-execution efficiency. Furthermore, due to their generalization capabilities and robustness, foundation models can be swiftly adapted to optimize air conditioning control, ensuring comfortable temperatures while minimizing energy consumption.

Figure 3. Foundation models and offline reinforcement learning are being synergized to construct decision-making agents.

Assisting reinforcement learning agents with foundation models: We can empower models to acquire universal representations that rapidly adapt to diverse environments and tasks, thereby enhancing their generalization capabilities. In this approach, we introduce a pre-trained world model that emulates human learning and decision-making processes, ultimately bolstering industrial decision-making. By harnessing a pre-trained world model with extensive knowledge and adopting a two-stage pre-training framework, developers can comprehensively and flexibly train foundation models for industrial decision-making, extending their applicability to any specific decision scenario.

We partnered with the Microsoft Xbox team to rigorously validate the effectiveness of our framework in game-testing scenarios. By harnessing this framework, we pre-trained a specialized world model tailored for game maps. This model directly tackles the challenge of long-term spatial reasoning and navigation, leveraging landmark observations within novel game environments. The results were remarkable: our pre-trained model significantly outperformed counterparts that lacked a world model or relied on traditional learning methods. As a result, game exploration efficiency was greatly enhanced.

Moreover, we can harness domain-specific foundational data and the precise semantic information it encapsulates to develop foundation models within the domain, thereby unlocking novel opportunities for intelligent, interactive decision-making, and simulation. For example, by analyzing transactional data from financial markets, we can construct robust investment models. These foundational datasets extend beyond mere textual characters; they embody intricate semantic structures and valuable information. Leveraging this financial foundation model, we can generate customized order flows for various market styles, simulate large-scale order transactions across diverse market environments, and conduct controlled experiments in the financial investment landscape. This approach empowers us to gain deeper insights into market fluctuations and devise strategies for extreme scenarios.

Figure 4. Leveraging financial foundation models to implement order flow generation for different market styles, thereby simulating diverse market environments.

Foundation models propel the next industrial digital transformation

Microsoft Research Asia has long recognized that the widespread adoption of AI in industry necessitates continuous technological exploration, experimentation, and breakthroughs. Through collaborative efforts with partners across various industries, we have developed open-source models, including the Qlib AI quantitative investment platform, the MARO multi-agent resource optimization platform, the FOST spatial-temporal prediction tool, and the BatteryML battery performance analysis and prediction platform. These industry-oriented AI platforms, tools, and models not only play a pivotal role in industry but also serve as critical data and foundational components for implementing cutting-edge foundation models.

Building upon successful experiences in industrializing AI, we have embarked on the exploration of domain-specific foundation models tailored for industry, drawing from the dimensions previously discussed. Our findings reveal that these foundation models possess significant potential to diverge from conventional large-scale model paradigms and profoundly impact industrial transformation.

Envision a future where foundation models empower knowledge management, extraction, and iterative processes across industries. Furthermore, we are actively investigating how foundation models can support companies in achieving automated research and development (R&D). This encompasses tasks such as automatically identifying R&D directions, generating algorithmic research proposals, automating R&D processes and scientific experiments, and iteratively refining research approaches. In essence, AI will autonomously propel data-centric industrial R&D, fundamentally revolutionizing industry operations.

Figure 5. R&D agent: Automatically evolve the R&D cycle centered on industrial data.

Foundation models are poised to become the driving force behind industrial digital transformation, mirroring the transformative impact of the internet and cloud computing. These models are set to unleash a new wave of industrial innovation. We eagerly anticipate collaborating with additional industry partners, immersing ourselves in real-world scenarios, and exploring diverse applications for foundation models within the industrial landscape, thereby fully unlocking their commercial potential.


Author

Dr. Jiang Bian currently serves as a senior principal research manager at Microsoft Research Asia. He leads the Machine Learning Group and the Industry Innovation Center at Microsoft Research Asia.

His team’s research spans deep learning, reinforcement learning, and privacy computing, with a focus on cutting-edge applications of AI in vertical domains such as finance, energy, logistics, manufacturing, healthcare, and sustainable development.

Dr. Jiang Bian has authored over a hundred research papers published in top-tier international conferences and journals. He also holds several U.S. patents. Dr. Bian actively contributes to the academic community by serving on program committees for various prestigious international conferences and acting as a reviewer for leading international journals. In recent years, his team has made significant strides in applying AI-based prediction and optimization techniques to critical scenarios across diverse fields, such as finance, logistics, and healthcare. Furthermore, they have generously shared relevant technologies and frameworks with the open-source community.

Dr. Jiang Bian completed his undergraduate studies at Peking University, earning a bachelor’s degree in computer science. He then pursued further studies at the Georgia Institute of Technology in the United States, where he obtained his Ph.D. in computer science.

Insights into the Challenges and Opportunities of Large Multi-Modal Models for Blind and Low Vision Users: A Case Study on CLIP
http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/insights-into-the-challenges-and-opportunities-of-large-multi-modal-models-for-blind-and-low-vision-users-a-case-study-on-clip | Tue, 04 Jun 2024

Presented by Daniela Massiceti at Microsoft Research Forum, June 2024

Daniela Massiceti

“Today’s AI models hold incredible potential for assisting the blind community—from text recognition to object identification to question answering. Apps like Seeing AI are already deploying some of these AI features. But there is potential for much more.”

Daniela Massiceti, Senior Researcher, Microsoft Research Cambridge

Transcript: Lightning Talk

Insights into the Challenges and Opportunities of Large Multi-Modal Models for Blind and Low Vision Users: A Case Study on CLIP

Daniela Massiceti, Senior Researcher, Microsoft Research Cambridge

Daniela Massiceti delves into the transformative potential of multimodal models such as CLIP for assistive technologies. Specifically focusing on the blind/low-vision community, the talk explores the current distance from realizing this potential and the advancements needed to bridge this gap.

Microsoft Research Forum, June 4, 2024

DANIELA MASSICETI: Hi there. My name is Daniela Massiceti, and I’m a senior researcher at Microsoft Research Cambridge. Today, I will be sharing our recent CVPR paper, which examines the challenges and opportunities of large multi-modal models for blind and low-vision users.

Today’s AI models hold incredible potential for assisting the blind community—from text recognition to object identification to question answering. Apps like Seeing AI are already deploying some of these AI features. But there is potential for much more. And I think this is hinted at by the recent partnership between OpenAI and Be My Eyes, with the promise that one day, human assistance could be replaced by AI agents that provide instantaneous assistance to blind users around the world. But despite their potential, no works have really looked at, well, how well do these models actually work on image and text data captured by blind users? And we know from the literature that this data is likely to be out of distribution or different in a number of ways. For example, blind users use a range of quite specialized assistive objects. They also are more likely to capture images with quality variation, things like camera blur and occlusion. And they’re also more likely to make use of non-visual vocabulary, for example, describing their objects by their physical rather than their visual properties.

Our work, therefore, set out to remedy this. Specifically, we systematically evaluated 25 variants of the CLIP model on data from blind and low-vision users. CLIP is one of today’s most widely used multi-modal models. It has over 15,000 citations and 75 million downloads. We used the ORBIT and the VizWiz-Classification datasets. Both of these are collected by blind users through real-world assistive applications. And we inspected CLIP’s performance on both a zero-shot image classification task directly as well as through examining the performance of models that use CLIP as a component, which is very widely done in the community. I unfortunately don’t have time to go into all the details of our work, but I will share our top three findings with you. First, we confirmed that CLIP does indeed underperform on data that is captured by blind and low-vision users. Second, these disparities trickle down to models that use CLIP as a component. And then third, these disparities stem from the fact that disability content is significantly underrepresented and sometimes missing completely from the datasets that are used to pretrain these large models. And I’ll dive into our three findings in a bit more detail.

So for the first finding, we found that CLIP underperforms on objects, image quality, and language that is typically used by blind users. On object type, CLIP recognizes disability objects like a Braille keyboard, for example, up to 28 percentage points less accurately than common objects like a TV remote. On image quality, CLIP is up to 23 percentage points more sensitive to images with things like camera blur and lighting compared to images that don’t have these quality issues. And on language, CLIP recognizes objects that are described by their material—so, for example, a leather boot—up to 12 percentage points less accurately than objects described by their color—for example, a brown boot. And we know that blind users rely heavily on this tactile rather than visual language.
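
For readers unfamiliar with the setup, zero-shot classification with CLIP scores an image against a text prompt for each candidate class. A minimal example using the public Hugging Face checkpoint is below; the image path and label set are illustrative, not the ORBIT or VizWiz data used in the paper.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("user_photo.jpg")   # e.g., a photo taken by a blind user
labels = ["a braille keyboard", "a tv remote", "a guide cane"]
prompts = [f"a photo of {label}" for label in labels]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)   # shape (1, num_labels)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.2%}")
```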

Towards our second finding, we examined three models that use CLIP under the hood—an object detection model, an image segmentation model, and an image generation model—and found that all three struggle with disability content. For example, DALL-E 2, which relies on a CLIP vision encoder, cannot generate common disability objects like guide canes and Braille keyboards. Instead, as you can see here, it gives us very strange-looking walking canes and lots and lots of randomly placed white dots. In comparison, DALL-E 2 generated really high-quality and realistic images for almost all of the non-disability objects that we tested.

And then towards our third and final finding, we really wanted to understand where these performance disparities were stemming from. And so we quantified just how prevalent disability content is in three popular datasets that are commonly used to pretrain these large models: LAION-[400]Million, LAION-2 Billion, and the DataComp-1B dataset, or 1 billion dataset. Specifically, we counted how many times objects are mentioned in these datasets’ captions and found that disability objects appear up to 16 to 17 times less frequently than non-disability objects across all three of the datasets.
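
The counting methodology itself is straightforward: scan each caption for object terms and compare frequencies. A sketch with made-up captions follows; the paper's actual term lists and matching rules may differ.

```python
import re
from collections import Counter

def count_mentions(captions, terms):
    """Count how many captions mention each term (case-insensitive, whole words)."""
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    counts = Counter({t: 0 for t in terms})
    for caption in captions:
        for term, pattern in patterns.items():
            if pattern.search(caption):
                counts[term] += 1
    return counts

captions = ["a man holding a tv remote", "white guide cane leaning on a table"]
print(count_mentions(captions, ["guide cane", "braille keyboard", "tv remote"]))
```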

So as you can see, our work has identified a clear gap in current models’ capabilities for blind users, and this could have very real consequences if these models are then integrated into assistive technologies for the blind and low-vision community. So what should we, as a research community, be doing about it? First, I think more work is needed to understand how models come to learn or adapt to long-tailed data. Some of our early results show that few-shot learning approaches hold some promise, but they don’t always work, especially in more challenging scenarios, for example, when objects appear in highly cluttered scenarios. And second, I think it’s important for us to really focus on including more disability content in these large-scale pretraining datasets. And our team [is] currently working on developing equitable and fair practices alongside disabled communities to source data that is truly representative of their needs. And so with that, I will wrap up.

Thank you to all the people behind this work and thank you for listening.

Panel Discussion: Generative AI for Global Impact: Challenges and Opportunities
http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/panel-discussion-generative-ai-for-global-impact-challenges-and-opportunities | Tue, 04 Jun 2024

Hosted by Jacki O’Neill, with Sunayana Sitaram, Daniela Massiceti, and Tanuja Ganu at Microsoft Research Forum, June 2024

Sunayana Sitaram

“One of the solutions that we’ve been using is to actually design with ‘human in the loop’ in mind because we know that these technologies are not perfect. And so, we really want to figure out ways in which humans and AI systems can work together in order to create the most effective outcome.”

Sunayana Sitaram, Principal Researcher, Microsoft Research India

Transcript: Panel Discussion

Generative AI for Global Impact: Challenges and Opportunities

Jacki O’Neill, Lab Director, Microsoft Research Africa, Nairobi (host)
Sunayana Sitaram, Principal Researcher, Microsoft Research India
Daniela Massiceti, Senior Researcher, Microsoft Research Cambridge
Tanuja Ganu, Principal Research SDE Manager, Microsoft Research India

Microsoft researchers discuss the challenges and opportunities of making AI more inclusive and impactful for everyone—from data that represents a broader range of communities and cultures to novel use cases for AI that are globally relevant.

Microsoft Research Forum, June 4, 2024

JACKI O’NEILL: I’m delighted to be hosting what promises to be a really engaging panel today with three fabulous panelists. In my talk, I talked about the importance of building globally equitable generative AI systems for diverse communities and application areas, and I hope that I’ve convinced you all of the importance of doing this if generative AI is not going to compound existing systemic inequalities. In this panel, we’re going to dive much deeper into the application areas, the user populations, the problems, and the solutions of doing this with our three expert panelists: Sunayana Sitaram, Tanuja Ganu, and Daniela Massiceti. So without further ado, I’d like to ask each of the panelists to introduce themselves.

TANUJA GANU: Thank you, Jacki, and hello, everyone. My name is Tanuja Ganu, and I’m principal research engineering manager at Microsoft Research in India. My background is in applied AI, and my work is focused on developing and validating technologies that would drive positive change in society. I have been leading an incubation center in MSR India called SCAI—Societal Impact through Cloud and AI—and over the last 1½ years, I have been spending a lot of time on how we can take the potential of generative AI to empower every individual across the globe and catalyze change in domains like education. Thank you.

SUNAYANA SITARAM: Hi, everyone. I’m Sunayana Sitaram. I’m principal researcher at Microsoft Research India, and my background is in natural language processing. My research involves trying to make sure that large language models, or generative AI as they’re also known, work well for all languages and cultures. And over the last couple of years, my research group has really looked into how to evaluate how well these large language models are doing for different languages across the world, including languages that have smaller amounts of data compared to English but are still spoken by millions of people worldwide. Thank you.

DANIELA MASSICETI: Hi, everyone. My name is Daniela Massiceti, and I’m a senior researcher at Microsoft Research based in Australia. My background is in machine learning, but nowadays, I work much more at the intersection of machine learning and human-computer interaction, particularly looking at multi-modal models. So these are models that work with both image and text input. And my main focus is, how do we ensure that these AI models or AI systems work well for the users who are in the tails of the user distribution? And in particular, the research that I’ve done along with my team, it particularly looks at people with disabilities, who will, of course, be major beneficiaries of these multi-modal models.

O’NEILL: Thank you so much. I’d like to start by asking you what you see as the core problems we face building equitable generative AI that works well for diverse communities and user groups. Tanuja, would you like to start us off?

GANU: Let me start off by saying that I feel that this is an exciting time to be in technology, and I’m really thrilled with the remarkable progress and the vast potential of generative AI. And we are already seeing successful deployments of generative AI in enterprise applications like GitHub Copilot for programmers or Office 365 Copilot for enterprise users, which is showing the improved efficiency and quality as well as giving ability to the users to focus more on their creative work. So the natural next question is, how can we take this power of generative AI and empower every individual, every individual across the globe—the people who are coming from different nationalities, different ethnicities, cultures, as well as with varied, kind of, technology access and financial, kind of, affordability, as well? So when we are looking at this technological evolution, I think it’s crucial that we, kind of, prioritize and focus and address the digital divide and we really, kind of, actively work to reduce this particular gap. So taking these points into account, there are [a] few sociotechnical challenges that we need to address when we want to make sure that generative AI technology truly, kind of, works for every individual. So firstly, I think the first important challenge is making sure that these technologies are able to provide seamless interaction across thousands of world languages. And it’s not only about language, but it’s also about incorporating and preserving cultural nuances in these different kind of communities and user groups. The second important challenge is about designing for existing infrastructural constraints. Infrastructural constraints are like the existing technologies need to have low-end mobile phones as primary interface in some of the cases or dealing with low or intermittent network connectivity and overall low affordability when we are especially looking at vast majority of populations from Global South. The third important problem that I consider is the varied access levels depending upon the literacy levels as well as the access needs depending upon disabilities. And the fourth important challenge is really overarching as in, how can we expand and how can we revisit the responsible AI and safe deployment principles taking into account these culturally and linguistically varied user groups and expanding to include the dimensions of equity, access, and inclusion? So I think these are really some of the important challenges.

O’NEILL: Thank you so much, Tanuja. I think you’ve really given us a great overview there. Daniela, I wonder if you could deep dive a bit on the accessibility questions that Tanuja raised.

MASSICETI: Yeah, sure thing, Jacki. So, yeah, I can definitely bring some perspectives here from the work that my team—me and my team—have done in the accessibility space. So we know, as I said earlier, that these multi-modal models really hold the potential to transform assistive technologies for communities with disabilities. But up until now, very few works have actually quantified, well, how well are these models going to work for these communities? And so a piece of work that we recently did, which was published in CVPR, basically aimed to do exactly this. Specifically, we looked at images and text captured by users who are blind and then evaluated how well CLIP, which is a very popular multi-modal model, actually works on their data. And I wanted to share, kind of, three insights that came from this work which speak to the core challenges I think that lie ahead of us realizing truly equitable AI systems in the future.

So the first is that the datasets typically used to train these AI models do not include data from communities with disabilities. In our work, we analyzed three large-scale datasets that are typically used to pretrain these large multi-modal models, and we found that disability content—things like guide canes, Braille displays—are significantly underrepresented or actually just not present at all in these datasets. And so this means that then any model that is trained on this dataset will perform poorly on any task that involves identifying, locating, or answering questions about any of these particular objects. And I don’t think that this problem of data inclusion is just the case for the blind and low-vision community but many, many marginalized communities who may not be included in these datasets. And the second core problem is that I think we’re moving toward this paradigm where we have a very small number of enormous models—these so-called foundation models—which are being widely used by many, many downstream models and applications. But if these foundation models don’t work well in the first instance for marginalized communities, then we have the potential to see this compounding essentially in any downstream application that uses these foundation models. And this is exactly what we saw in our CVPR work.

We identified that CLIP, as a base model, significantly underperforms on data from blind and low-vision users. But then when CLIP is embedded as a component in other models, these failures persist and in some cases are even amplified. So, for example, we looked at DALL-E 2, which uses a CLIP vision encoder under the hood, and we basically saw that it couldn’t generate any decent images of any of the disability objects we tested. You know, when we asked it for a guide cane, it gave us very funky-looking walking sticks. And when we asked it for Braille keyboards, it again gave us these random arrangements of white dots on a page.

And in the final core problem I’ll reflect on is that I think we don’t often embed ourselves deeply enough in marginalized communities to really understand the ways that AI models need to work for these communities. So, for example, one of the findings in our CVPR paper was that CLIP has trouble recognizing objects if users describe them by their material rather than their color. So, for example, a user might say find my leather bag rather than my brown bag. And we only really knew to test for this because our team collectively has over 20-plus years of experience in working with the blind and low-vision community to know that users often use these material-based descriptions when they’re talking about their objects. And so without this insight, we would never have uncovered this particular failure mode, and so I think it’s really important, to achieve truly equitable AI models, we really need to deeply embed ourselves in the communities that we’re working with.

O’NEILL: Thank you, Daniela. So Sunayana, Daniela’s given us a really good overview of the challenges with the multi-modal models and the image models. I know that your research is primarily thinking about how different language communities can interact with these language models. I’m wondering, what do you see as the problems for making these models work well for anyone, anywhere, whatever language they speak?

SITARAM: Right. So as Daniela mentioned, there is a data divide, right, even when it comes to languages because most language models today are trained predominantly on data that comes from the web. And we know that not all languages and cultures are equally represented on the web, right. So at the very first step of the pipeline, you now have this inequity because of different representation of different languages and cultures. But I think that’s not the only problem. There are a lot of other decisions that are taken during the model-building process which could also influence downstream performance. So, for example, in some of our research earlier last year, which was published in EMNLP, we found that the tokenizer, which is the component that actually breaks words down into smaller pieces, that doesn’t work equally well for all languages, and that actually has a significant impact on downstream performance. So things like this, you know, decisions that are taken during the model-building process can also really influence the performance. And finally, you know, one of the biggest challenges I see—and I may be a little biased because this is my area of research—is that, you know, we are not able to actually evaluate these models across all languages and cultures well. And this is because of a variety of reasons, including the fact that, you know, not too many benchmarks exist with the sufficient linguistic and cultural diversity. But because we are not doing a good job of evaluation, we don’t even know how well these models work for different languages and cultures. And so I think, you know, beyond data, there are many other challenges that need to be addressed in order to make these models actually work for all languages and cultures.
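
To make the tokenizer point above concrete, here is a minimal sketch, not part of the discussion itself, of how one might compare how finely a shared multilingual tokenizer splits roughly parallel sentences in different languages. The model name and example sentences are illustrative assumptions, not anything used in the research described.

```python
# Minimal sketch (illustrative only): compare how many sub-word tokens a shared
# multilingual tokenizer needs for roughly parallel sentences. Languages that
# get split into many more pieces per word tend to be costlier to process and
# often see weaker downstream performance. Model name and sentences are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

samples = {
    "English": "The weather is very pleasant today.",
    "Hindi": "आज मौसम बहुत सुहावना है।",
    "Swahili": "Hali ya hewa ni nzuri sana leo.",
}

for language, sentence in samples.items():
    tokens = tokenizer.tokenize(sentence)
    # Tokens per whitespace-separated word is a rough "fertility" measure.
    fertility = len(tokens) / len(sentence.split())
    print(f"{language:8s} tokens={len(tokens):3d} fertility={fertility:.2f}")
```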

O’NEILL: Yeah, thank you so much. I think it’s really clear from your answers what the biggest challenges are for making these technologies work at both the societal level and also at the level of the actual models themselves, you know, whether they’re vision or multi-modal models or language models, and we know that this has a direct impact on various user populations. As Tanuja mentioned in the beginning, you know, we’re seeing a lot of enterprise applications and enterprise technologies being developed, whether that’s for helping you code or ideate or answer emails. But are there other user populations who could really benefit from applications of generative AI which work well? Tanuja?

GANU: Yeah, so I think there are a lot of interesting and impactful applications which are emerging for generative AI in domains like education or health care and agriculture. So let me give you an example from our work in education, where we are developing [an] AI assistant, which is called Shiksha copilot, that provides agency to the teachers in public schools in India for generating personalized and engaging learning experiences like activities, assessments, the teaching material for their students. So what is important here is that the content generated is completely grounded in the local curriculum and the interaction is completely in local language, which is Kannada in this particular case. It’s also important that the content, kind of, preserves the cultural or local norms. So let’s take an example of a teacher teaching components of food or balanced diet as the topic. So it should include the examples which are coming from the local diet and cuisine, maybe giving an example of biryani or maybe giving an example of ragi mudde, which is made up of finger millet. So it’s also additionally important that the teacher is able to use and generate the lesson plans on the mobile phone or their desktop, whichever are the, kind of, resources which are available to them, and they should be able to utilize this particular Shiksha copilot while using in the classrooms where AV systems might not be available. So they can generate the lesson plan on the phone, and they can take it to the classroom and completely utilize it in the offline manner. So I think these are all the challenges that we discussed earlier; those become really important when we are doing these kind of real-world deployments. So with Shiksha copilot, we have completed a successful small pilot with 50 teachers in India, and now we are gearing towards a scaled pilot with thousand teachers. And I feel like applications like these can have a really transformative effect in the education system and create a positive impact for students and teachers across the globe.

O’NEILL: Thank you. Daniela, for the accessibility populations, what type of applications and populations are important in this space?

MASSICETI: Yeah, sure thing. So an estimated 1.3 billion people—around 16 percent of the global population—live with some level of disability today. So I think it’s really exciting to see these generative AI applications coming online for these communities, and our team has done, as you may already have gathered, a lot of work with the blind and low-vision community. And so I wanted to call out a couple of promising generative AI applications for this particular community. The first is Microsoft’s own actually: Seeing AI. So Seeing AI is a mobile app for users who are blind and low vision, and they’re really leading the charge in innovating new assistive user experiences using models like GPT-4. So, for example, they’ve built in features which allow users to answer really detailed questions about a document they’ve scanned as well as get these beautifully detailed captions or descriptions of photos that they’ve taken. And you can really see the impact of these. For example, maybe when you’re visiting a museum, you can snap a picture and get these beautiful descriptions around the artworks that are … of the artworks that are around you. I’ll also call out the partnership which was recently announced or announced last year between Be My Eyes and OpenAI. So Be My Eyes is a video-calling app which connects blind users with sighted volunteers when they need help on a particular task. So, for example, they snap a picture of a packet of potatoes or a packet of tomatoes and then ask the sighted volunteer if they’re out of date, for example. And the promise with the OpenAI partnership is that perhaps some point in the future, these sighted volunteers may be replaced by a model like GPT-4 with vision, enabling pretty much instantaneous and fully automated assistance for blind users anywhere in the world. So I think that’s really exciting. And in fact, I—along with some other colleagues at Microsoft Research—worked very closely with OpenAI and teams across Microsoft to red team the GPT-4 with vision model and really ensure that it met Microsoft’s high bar before it was publicly released. And I think this is a really tangible demonstration of Microsoft’s commitment to delivering safe and responsible AI technologies to its customers.

O’NEILL: Thank you so much. So how do we, given these large populations who could really benefit, how do we go about building solutions for them that actually work?

GANU: So maybe I will take this. So given that we are working with really diverse populations, I think it’s extremely useful that we work with user-centered design or participatory design approach and collect the voices of the users and especially the marginalized communities and the underserved communities right from the start at the design time. It’s also important while we are dealing with this nascent or emerging technology that we do have the right safeguards while deploying the system and we are able to collect the feedback at every stage when we, kind of, deploy the systems, such as using the expert-in-the-loop kind of deployment, where the expert has the ability to verify as well as override the responses as and when required. So to give an example, this was one of the, kind of, conscious decisions when we started working with Shiksha copilot, to start with the teachers and not with the students first, where teacher is the expert in the loop, and we can extend the benefits of the technology to the students through teachers to start with and eventually, kind of, go to the students.

Also, while we are working and looking at various applications across population scale, as I mentioned earlier, in domains like agriculture, education, health care, and other domains, what we are seeing is that there are common problems or universal challenges which are repeated across all these particular domains. As Sunayana talked about earlier, multilingual interaction is a huge problem across all domains. The other important problem is that most of the knowledge base that is required for grounding or, kind of, generating these AI experiences on is non-digitally native and multi-modal. So how do we extract the information from these multi-modal, non-digitally native content is a challenge across these different domains. So what we are doing as part of our project, which is called Project VeLLM, which stands for “uniVersal Empowerment with Large Language Models,” is we are building this versatile platform, which you can think of as building blocks or tool set providing all these different functionalities which are common across these different, kind of, applications. And now the other developers do not have to start from scratch. They can use these building blocks and create their equitable AI experiences rapidly across different domains.

SITARAM: Generalizing a little bit from what Tanuja just said about expert in the loop, I think that, you know, one of the solutions that we’ve been using is to actually design with “human in the loop” in mind because we know that these technologies are not perfect. And so, you know, we really want to figure out ways in which humans and AI systems can work together in order to create the most effective outcome. And in our research, we’ve actually been doing this for evaluation of, you know, multilingual scenarios. So, for example, we know that, you know, large language models can do a good job of evaluation, but we also know that they don’t do a very good job on some languages and along some dimensions, right. So those languages and those dimensions should ideally be left to a human to do, whereas for the ones that we are very confident that the LLM is doing a good job, we can actually rely on it more with some human oversight in order to scale up the process of evaluation. So this idea of actually using humans and AI together and designing for this kind of hybrid system, I think, is really crucial. And, of course, we need to keep revisiting this design as these AI systems become more and more capable.
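
A minimal sketch of the hybrid routing idea described above follows, assuming made-up agreement scores and a made-up threshold. It is not an actual Microsoft evaluation system, only an illustration of sending each language-and-dimension pair either to an LLM judge or to a human reviewer.

```python
# Minimal sketch (illustrative only) of hybrid human/LLM evaluation routing:
# trust the LLM judge only where it has been validated against human raters,
# and route everything else to human annotators. All numbers are invented.
LLM_HUMAN_AGREEMENT = {
    ("English", "fluency"): 0.92,
    ("English", "safety"): 0.88,
    ("Kannada", "fluency"): 0.61,
    ("Kannada", "safety"): 0.55,
}
TRUST_THRESHOLD = 0.85  # assumed minimum agreement for automating a judgment

def route(language: str, dimension: str) -> str:
    """Return which evaluator should handle this language/dimension pair."""
    agreement = LLM_HUMAN_AGREEMENT.get((language, dimension), 0.0)
    return "llm_judge" if agreement >= TRUST_THRESHOLD else "human_annotator"

for language, dimension in LLM_HUMAN_AGREEMENT:
    print(f"{language}/{dimension} -> {route(language, dimension)}")
```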

MASSICETI: Yeah, so many points I can agree with there and build on. I think what’s common with both Tanuja’s and Sunayana’s answers is really this need to, kind of, bring models and humans together. And I think one real limitation we’ve seen in our work across many of the models we’ve worked with is that they really often generate quite generic responses, you know. So if you prompt an LLM to write you an email, the tone and style don’t quite, sort of, quite feel like yours. And so I think as we look to this next decade of generative AI solutions, I really hope to see that we’re going to see more personalized AI models and solutions come through much more strongly, solutions where you as the user have much more control, much more agency, around how your model works for you. And I think that’s another example of how human users and the AI model need to come together in order to create something even more powerful. And I think this is going to be even more impactful for marginalized—even more important even—for marginalized communities, whose needs often differ a lot from, kind of, the average or the generic needs.

And to, kind of, just bring one concrete example to the table, our team has been building a personalizable object recognizer over the last year. So here, a blind user can pretty much teach the object recognizer their personal objects, things like their sunglasses, their partner’s sunglasses, maybe their favorite T-shirt. And they do this by taking short videos of these objects, and then the personalized recognizer can then help them locate these things at any point in the future. And so in this sense, the user is really given the agency. It’s really this example of a human-in-the-loop paradigm, where a user is given the agency to personalize their AI system to meet their exact needs. So, yeah, it’s really exciting. This feature has actually just been released in Seeing AI, and so we’re really keen to begin imagining how we might see more personalizable generative AI experiences for users in the near future.

O’NEILL: Yeah, I really love that idea. I think we would all benefit from more personalized AI, even when you’re just trying to craft an email or something like that. The challenge people often face is it doesn’t really sound like them.

MASSICETI: Exactly.

O’NEILL: And then if you have to edit it too much, then, you know, you reduce the benefit. So I think there’s so many areas right across the board where personalization could help. So finally, as we’re coming to a close, I really would love to finish by asking each of you what you think the biggest research questions that are still open are, what the biggest gaps are, and how you would advise the research community to go about solving them.

MASSICETI: Yeah, it’s a big, big question. I’ll maybe take a stab first. So I think a couple of us have already touched on this point before, but the data divide, I think, is really a big, big challenge. You know, the fact that data is widely available for some communities but then totally absent or very sparse for others. And I think this is one of the biggest hurdles we need to address as a research community in order to really move the needle on equitable AI because it’s impacting everything from the way that we can train models but also, as Sunayana said, to how we can evaluate these models, as well. But I want to, kind of, call out that even though we’ve identified the problem—we, kind of, know what the problem is; you know, we need to include data from these communities—I think there’s just so many open questions around how we actually do this well and how we actually do this right. And so I want to bring up two specific challenges or open questions that I feel are very prevalent.

The first is, what do equitable paradigms actually look like when we’re collecting data from or about a marginalized community? These communities, as we know, have often historically been exploited. And so we really need to find fair ways of not only involving these communities in these data collection efforts, but also compensating them for their efforts as these models are then trained on this data and then are deployed and used more broadly. But then the second open question, I think, is that we really need deep technical innovation in adapting models to new data. You know, we’ve obviously seen a lot of adaptation methods coming online—fine-tuning, LoRA—and they do really well at, kind of, adapting these models to new datasets and tasks. But what we’re seeing in our current experiments is that these approaches don’t work so well when that new data that’s coming in is very different from the pretraining dataset. So in one particular example, we gave Stable Diffusion 10 training images of a funky-looking cat statue, and it learned it really well, and it could generate actually really realistic images of this statue. But then when we did the same for a guide cane, Stable Diffusion just still cannot generate realistic-looking images of guide canes. And so I think we really need to build as a research community a deeper understanding around how we get models to learn new concepts or new things, even when they aren’t well represented in the pretraining datasets.

O’NEILL: Thanks so much, Daniela. Tanuja, is there anything you want to add?

GANU: So for me, it feels like we are just, kind of, beginning to scratch the surface, and there is a lot more work underway across the dimensions of culture, cost, human values, cognition, universal access, and many other dimensions. So while the journey is long and we are trying to solve some of these hard and important problems, it is important that we, kind of, continue to make progress systematically and iteratively and we continue to, kind of, collect feedback and critical feedback at each of these stages. We definitely need to do lot more work also looking at different types of models as in large language models for more complex tasks. But can we look at smaller language models, especially when we are looking at the infrastructural challenges, as I discussed earlier. How can we use combination of these models? How can we generate and collect data from different cultures and involve these communities to, kind of … because these are very implicit things and not documented, kind of, information about different cultures. So how do we, kind of, learn for those is also important question. And I think collaboration is the key here. It’s important that we involve the experts from multiple disciplines, user communities, researchers, and policymakers and accelerate the progress in the right direction. We are already doing some of these collaborations with academia and NGOs, with the programs like Microsoft Research AI & Society Fellows, and some of the existing collaborations with our community and partners in India and Africa. But I think we’ll just need to continue doing it more and continue making steady progress on this important problem.

SITARAM: I completely agree with what both Daniela and Tanuja said. And talking more about the language and culture aspect, I think we need to figure out a way to involve these local communities in the design and training as well as evaluation phases of model building. And we need to do this at scale if we really want to reach all languages, all cultures, etc., right. So I think that is the thing that we really need to figure out how to do. So there are a couple of projects that we’ve been working on that have attempted to do this. One of them is called DOSA, where we collected a dataset of cultural artifacts from different users in India. And this was meant to be a participatory design approach where people would tell us what cultural artifacts were really important to them, and then we would collect this data from the ground up and try to evaluate whether LLMs did a good job or not, right. That’s one example. The other project that we’ve been working on is called Pariksha, where we employ workers from this ethical data company called Karya to do evaluation of Indian language models. So here we’re really asking the users, who speak multiple languages, to tell us whether these models work for them or not. And so I feel like we need to figure out more ways in which we can involve these local communities but at scale so that we can really impact the model-building process and then so that we can actually make these models work well for everybody.

O’NEILL: I couldn’t agree with you more, Sunayana. I think involving user communities in technology design in general is one of the most important things that we can do, and this is even more so with underserved communities. I would just like to add something to that, though, which is that we really need multidisciplinary research that goes beyond anything that we’ve done before, involving researchers and practitioners and community members. And it’s important to remember that machine learning engineers and researchers on their own can’t solve the problem of building globally equitable generative AI. This is something that we really need to do in a large scale. We need to transcend disciplinary boundaries if we’re going to build technology that really works for everyone, everywhere. And on that note, I’d like to say thank you to the panelists. It’s been a great discussion and thank you to the audience.

MASSICETI: Thanks very much.

GANU: Thank you so much.

SITARAM: Thank you.

The post Panel Discussion: Generative AI for Global Impact: Challenges and Opportunities appeared first on Microsoft Research.

Keynote: Building Globally Equitable AI http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/keynote-building-globally-equitable-ai Tue, 04 Jun 2024 18:01:06 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1034994 Jacki O'Neill discusses the importance of creating globally equitable generative AI. She addresses the technical and sociotechnical challenges that must be tackled to positively transform work futures worldwide.

Presented by Jacki O’Neill at Microsoft Research Forum, June 2024

Jacki O'Neill

“It’s only by working together can we solve the challenges we face with generative AI at scale and, in doing so, capitalize on the opportunities these new technologies offer. … Now is the time to change the dialogue and change the direction of these new powerful technologies to ensure that they are globally equitable by design.”

Jacki O’Neill, Lab Director, Microsoft Research Africa, Nairobi

Transcript: Keynote

Building Globally Equitable AI

Jacki O’Neill, Lab Director, Microsoft Research Africa, Nairobi

Jacki O’Neill discusses the importance of creating globally equitable generative AI. She addresses the technical and sociotechnical challenges that must be tackled to positively transform work futures worldwide.

Microsoft Research Forum, June 4, 2024

JACKI O’NEILL: Hi, I’m Jacki, and I head up Microsoft Research Africa, Nairobi. Welcome to the Microsoft Research Forum. I’m going to talk about the importance of building globally equitable generative AI.

Given its ability to generate and process human-like natural language, generative AI is set to transform the way we interact with technology. Like the graphical user interface before it, generative AI promises to make computing and AI more accessible to a wider range of people. This promise encompasses several features. Their natural language interfaces mean users can interact with these models conversationally. They can ask questions, give commands, and get tasks done. This can lead to a reduction of complexity across applications and devices as one can imagine navigating through and creating content using natural language without having to open different applications to find and extract information or even know which tool that content was created in. Given this, LLMs could reduce the burden of repetitive and nonessential tasks—from helping us craft email to summarizing documents and supporting report writing—giving us more time to focus on the work we love. Finally, multi-modal interactions with image, speech, and video processing and generation further enhance the transformational power of these tools, all of which could make both the power of AI specifically and that of computing more generally much more widely accessible, including to a mobile-first or mobile-only audience, thus reaching the billions of people who don’t work at desks.

As a result, generative AI is likely to transform the future of work across the globe in ways as yet unimagined and has sparked excitement about its potential impact on the Sustainable Development Goals. However, generative AI may not be equally useful for everyone, and its impact will not necessarily be evenly distributed globally, across regions, communities, or demographics, and as a consequence, there’s a risk of compounding existing systemic inequalities. For example, those that can most benefit from the promise of generative AI include populations in the Global South, who’ve been previously excluded due to both the traditional digital divides and the AI divides. The traditional digital divide has three levels, including access to digital technology, having the skills and knowledge required for their effective use, and the ability to translate use into desired outcomes. Generative AI then brings additional elements to the digital divide. The AI divide encompasses the socioeconomic conditions around who gets to create, deploy, and benefit from AI. The data divide refers to the consequences of not having good representation and equivalence in the training data and data-related processes, such as labeling and reinforcement learning. And the third divide is compute given the GPU requirements to build, train, and deploy these immense and immensely powerful models.

So how does generative AI impact these divides? Well, it reduces the traditional divide in some ways because natural language interfaces mean AI is more accessible to more people than ever before. For example, this latest generation of general-purpose off-the-shelf tools can and are being deployed to improve productivity by businesses around the globe, including many small businesses who were previously excluded from the AI revolution because they just didn’t have access to machine learning professionals in their companies. In terms of devices, many of the current AI tools can be used on smartphones, although they do require data. But there’s a plethora of specific feature-phone services which are being created in areas such as health and agriculture which don’t require the end user to have data. Whilst it’s too early to definitively talk about the ability to translate use into outcomes, research on small and medium businesses’ adoption of generative AI in Kenya and Nigeria suggests that it provides value for those who start using it in a number of use cases, such as writing emails in English.

The AI divides however remain, and there’s much work to be done to arrive at globally equitable generative AI. Today, I want to focus on the data divide, which stems from the fact that the vast majority of training data comes from the English-speaking Global North. This has an impact on the representations of both language and of knowledge in AI systems and consequently on their ability to process and produce appropriate output. But what does this mean in practice?

Let’s start with a look at language. Last year, research has shown that when compared to state-of-the-art non-autoregressive models, or SOTA models, on standard NLP tasks—natural language processing tasks and benchmarks—those SOTA models outperform large language models, including GPT-4. Large language models tended to work well on high-resource language families with Latin scripts but less well on low-resource languages with limited training data or non-Latin scripts. However, generative models introduced new challenges for NLP benchmarking, many of them due to prompt sensitivity. That is, even small changes in the prompt construction can impact performance, making consistent benchmarking difficult. For example, even asking the LLM to provide explanations for its output can change performance as does the choice of examples used in the prompt. Nonetheless, currently, African language performance on traditional metrics isn’t yet at a par with English performance. But this doesn’t tell the whole story.

In naturalistic interactions, GPT-4’s performance seems pretty amazing. For example, in a collaborative project with the University of Washington Global Health Department, we’ve been looking at building NLP, or natural language processing, tools to support medical facilitators. These facilitators manage peer support groups on WhatsApp for young people living with HIV in informal settlements in Nairobi. The data consists of chat messages in English, Swahili, and Sheng, a local dialect, and includes code-mixing, emojis, and “chat speak.” You can see an example of the data here. This message contains a mixture of English and Swahili code-mixing with its translation by human annotators. We found that even the best multilingual SOTA models performed so badly even after fine-tuning, that we stopped working on this project. Then, along came GPT-4, and suddenly, these tools seem possible again. What’s going on? Why are NLP benchmarks telling us one thing about African language performance and application-based practice telling us another?

Well, one part of the explanation is that previous models typically just couldn’t handle code-mixing, whereas generative models are much better equipped to handle natural language. Therefore, they’re not only able to handle naturally produced code-mixed language, but they can also handle chat speak with its abbreviations, colloquialisms, emojis, and so on. Now, we found that whilst both GPT-4 and LLaMA showed impressive results in sentiment analysis on this dataset, GPT-4 appears to use more of the whole context of the sentence to produce slightly more robust predictions. Returning to our example, if we assume some correlation between explanations and prediction, we can see that GPT-4 gave more nuanced predictions, whereas LLaMA did not seem to pick up on the more positive, although conditional, sentiment in the second part of the sentence.
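
As a rough illustration of the kind of prompting described here, the sketch below builds a sentiment-plus-explanation prompt for a code-mixed chat message. The prompt wording and the example message are invented placeholders, not the project’s actual prompts, data, or models.

```python
# Minimal sketch (illustrative only): a sentiment-with-explanation prompt for
# code-mixed chat messages. The wording and the example message are invented;
# the project's real prompts, data, and models are not reproduced here.
PROMPT_TEMPLATE = """You are annotating peer-support chat messages that mix
English, Swahili, and Sheng. Classify the sentiment of the message as
positive, negative, neutral, or mixed, and explain your reasoning, taking the
whole message into account, including any conditional or contrasting clause.

Message: {message}
Answer as JSON with the fields "sentiment" and "explanation"."""

def build_prompt(message: str) -> str:
    return PROMPT_TEMPLATE.format(message=message)

# Invented code-mixed example ("Today I'm fine, but not every day is easy."):
print(build_prompt("Leo niko poa, but si kila siku ni rahisi."))
```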

Despite these impressive advances, there’s still plenty of work to be done. There are still many under-resourced African languages where performance lags far behind. For example, the models make more mistakes on Sheng, which is not included in the training data, and speech models lag behind, often failing at the code-mixing hurdle. This is important because voice interfaces are likely to be essential to enabling even broader access to AI. So this is an area of continued research for African and other under-resourced languages. But language is not the only concern. Whilst language is most researched, the widespread deployment globally of the latest generation of foundation models reveals another equally pressing problem. Models hallucinate, fail, or reproduce stereotypes in African and other Global South contexts.

On the positive side, we’ve seen adoption of text and text-to-image generation tools, generative AI search, AI-augmented design tools, speech generation tools by small businesses in Nigeria and Kenya from a range of sectors, including law, design, outdoor recreation, and retail. These businesses successfully use generative AI to support communication, especially being professional and polite and to the point in emails. A common example that we saw across sectors is illustrated here: how do I tell my client he’s four months late now to pay his fees, and I don’t want to sound rude? And we saw this across pretty much all of the small businesses where they needed customers to pay. They also used AI to support creative work, such as content creation and ideation and so on. They described how it helped save time. For example, it reduced the time for ideation. As an architectural designer said, “We would manually, kind of, like, bounce ideas off each other. … Arriving at 10 strong ideas would take two or three sessions, whereas now we get the same results in one.” Even the lawyers who charged by the hour wanted to reduce their mundane work so they could spend more time on creative work. They would have liked to deploy generative AI to reduce the time spent on small-case document review. As a senior lawyer said, “We could have spent that 15 hours on important things. Once the machine, the AI, had given us the report, we’d be thinking creatively now.” So far so good.

Problems often arise, though, when they want to use generative AI in work involving the African context, which—as SMBs in Africa—is quite often. Whilst generative AI can sometimes help to navigate cultural and contextual boundaries, it’s more likely to hallucinate when the proportion of training data is low—i.e., in most African context—and a whole host of problems starts arising, from accent recognition in meeting transcription to speech production. For example, the CEO of an IT company used voice cloning for training videos but found it gave her a British accent. And as she said, it “takes away from my originality, which is I’m not British; I’m Kenyan.” Or the poor context and consistency we’ve seen in image production systems creating unusable and sometimes stereotypical images of Africans and African landscape, not to mention the tendency to produce answers which neglect or misrepresent African people, history, culture, and knowledge. And even where information about Africa is generated, it often portrays the Western perspective. This was perhaps most clearly encapsulated by one of the lawyers, who explained, “Even if you put into the particular AI that you’re asking from a Kenyan perspective—while in Kenya, does this law apply?—they’ll reference the people in the US, which is insane because we have Kenyan authors; we’ve done the actual work.” Overall then, it can leave the feeling that generative AI is really Americanized.

This regional bias goes way beyond demographic biases like race, although it would be compounded by them. Whole continents and their knowledge are severely underrepresented, and this comes through clearly in use, both in usability and use cases that are directly impacted by this. Indeed, AI has a language problem, but just as importantly, it has a knowledge problem, and this is likely to compound existing systemic inequalities. But we’re at the very early stage of generative AI and the impacts it will have on work. This is a fast-moving field, and there’s an immense opportunity to take control of the agenda and build truly globally equitable AI systems. This requires ensuring that diverse contexts and applications, with their diverse datasets, drive the development of generative AI. We need to be intentional and embrace these approaches. Machine learning methods like fine-tuning and retrieval-augmented generation, or RAG, are unlikely to work well if we don’t design and build for these diverse contexts from the beginning.

This is not something that any one group or company, nor any one discipline, can or should do on their own. This needs to be a collaborative effort incorporating different voices, different perspectives, and different disciplines working more closely together than ever before. And so just before I finish, I want to highlight one initiative that’s attempting to address some of the concerns raised: the African Health Stories Project.

This is a multi-institution, multi-country, multidisciplinary project. Microsoft Research is working with public health researchers at Stellenbosch University, human-computer interaction researchers at the University of Swansea, and machine learning and data science researchers at the University of Pretoria to create culturally appropriate and sensitive stories supporting good health behaviors. We will use generative AI to create interactive visual, oral, and text stories which enable patients to better understand how to apply health advice to their local circumstances. Together, as a multidisciplinary team, we will use this specific real-world application area to probe, evaluate, and extend the ability of generative AI to create situated and culturally appropriate content at the same time as addressing a real health need. Because it’s only by working together can we solve the challenges we face with generative AI at scale and, in doing so, capitalize on the opportunities these new technologies offer.

We have plenty of work to do, but now is the time to change the dialogue and change the direction of these new powerful technologies to ensure that they are globally equitable by design. Thank you.

The post Keynote: Building Globally Equitable AI appeared first on Microsoft Research.

Podcast: Evaluating LLMs using novel approaches. With Dr. Sunayana Sitaram http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/podcast-evaluating-llms-with-novel-approaches-with-dr-sunayana-sitaram Mon, 20 May 2024 18:34:40 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1038582 Episode 015 | May 21, 2024 [Music] Sunayana Sitaram: Our ultimate goal is to build evaluation systems and also other kinds of systems in general where humans and LLMs can work together. We’re really trying to get humans to do the evaluation, get LLM’s to do the evaluation, use the human data in order to […]

Episode 015 | May 21, 2024


[Music]

Sunayana Sitaram: Our ultimate goal is to build evaluation systems and also other kinds of systems in general where humans and LLMs can work together. We’re really trying to get humans to do the evaluation, get LLMs to do the evaluation, use the human data in order to improve the LLM. And then just this continues in a cycle. And the ultimate goal is, send the things to the LLM that it’s good at doing and send the rest of the things that the LLM can’t do to humans who are like the ultimate authority on the evaluation.

Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

[Music]

Sridhar Vedantham: LLMs are perhaps the hottest topic of discussion in the tech world today. And they’re being deployed across domains, geographies, industries and applications. I have an extremely interesting conversation with Sunayana Sitaram, principal researcher at Microsoft Research, about LLMs, where they work really well and also challenges that arise when trying to build models with languages that may be under resourced. We also talk about the critical work she and her team are doing in creating state-of-the-art methods to evaluate the performance of LLMs, including those LLMs that are based on Indic languages.


[Music]

Sridhar Vedantham: Sunayana, welcome to the podcast.

Sunayana Sitaram: Thank you.

Sridhar Vedantham: And I’m very excited to have you here because we get to talk about a subject that seems to be top of mind for everybody right now. Which is obviously LLMs.  And what excites me even more is I think, we’re going to be talking about LLMs in a way that’s slightly different from what the common discourse is today, right?

Sunayana Sitaram: That’s right.

Sridhar Vedantham: OK. So before we jump into it, why don’t you give us a little bit of background about yourself and how you came to be at MSR?

Sunayana Sitaram: Sure. So it’s been eight years now since I came to MSR. I came here as a postdoc after finishing my PhD at Carnegie Mellon. And so yeah, it’s been around 15 years now for me in the field, and it’s been super exciting, especially the last few years.

Sridhar Vedantham: So, I’m guessing that these eight years have been interesting, otherwise we won’t be having this conversation. What areas of research, I mean, have you changed course over the years and how is that progressed?

Sunayana Sitaram: Yeah, actually, I’ve been working pretty much on the same thing for the last 15 years or so. So I’ll describe how I got started. When I was an undergrad, I actually met the principal of a blind children’s school who himself was visually impaired. And he was talking about some of the technologies that he uses in order to be independent. And one of those was using optical character recognition and text to speech in order to take documents or letters that people sent him and have them read out without having to depend on somebody. And he was in Ahmedabad, which is where I grew up. And his native language was Gujarati.  And he was not able to do this for that language. Whereas for English, the tools that he required to be independent were available. And so, he told me like it would be really great if somebody could actually build this kind of system in Gujarati. And that is when it sort of it was like a, you know, aha moment for me. And I decided to take that up as my undergrad project. And ever since then, I’ve been trying to work on technologies trying to bridge that gap between English and other languages- under resourced languages. And so, since then, I’ve worked on very related areas. So, my PhD thesis was on text to speech systems for low resource languages. And after I came to MSR I started working on what is called code switching, which is a very common thing that multilinguals all over the world do. So they use multiple languages in the same conversation or sometimes even in the same sentence. And so you know, this was a project called Project Melange that was started here and that really pioneered the code switching work in the research community in NLP. And after that it’s been about LLMs and evaluation but again from a multilingual under resource languages standpoint.

Sridhar Vedantham: Right. So I have been here for quite a while at MSR myself and one thing that I always heard is that there is, in general, a wide gulf in terms of the resources available for a certain set of languages to do, say, NLP-type work. And the other languages are just the tail, it’s a long tail, but the tail just falls off dramatically. So, I wanted you to answer me in a couple of ways. One is, what is the impact that this generally has in the field of NLP itself and in the field of research into language technologies, and what’s the resultant impact on LLMs?

Sunayana Sitaram: Yeah, that’s a great question. So, you know the paradigm has shifted a little bit after LLM’s have come into existence. Before this, so this was around say a few years ago, the paradigm would be that you would need what is called unlabeled data. So, that is raw text that you can find on the web, say Wikipedia or something like that, as well as labeled data. So, this is something that a human being has actually sat and labeled for some characteristic of that text, right? So these are the two different kinds of texts that you need if you want to build a text based language model for a particular language. And so there were languages where, you know, you would find quite a lot of data on the web because it was available in the form of documents or social media, etc. for certain languages. But nobody had actually created the labeled resources for those languages, right? So that was the situation a few years ago. And you know the paradigm at that time was to use both these kinds of data in order to build these models, and our lab actually wrote quite a well-regarded paper called, ‘The State and Fate of Linguistic Diversity and Inclusion’, where they grouped different languages into different classes based on how much data they had labeled, as well as unlabeled.

Sridhar Vedantham: Right.

Sunayana Sitaram: And it was very clear from that work that, you know only around 7 or 8 languages of the world actually can be considered to be high resource languages which have this kind of data. And most of the languages of the world spoken by millions and millions of speakers don’t have these resources. Now with LLMs, the paradigm changed slightly, so there was much less reliance on this labeled data and much more on the vast amount of unlabeled data that exists, say, on the web. And so, you know, we were wondering what would happen with the advent of LLMs now to all of the languages of the world, which ones would be well represented, which ones wouldn’t etc. And so that led us to do, you know, the work that we’ve been doing over the last couple of years. But the story is similar, that even on the web some of these languages dominate and so many of these models have, you know, quite a lot of data from only a small number of languages, while the other languages don’t have much representation.

Sridhar Vedantham: OK. So, in real terms, in this world of LLMs that we live in today, what kind of impact are we looking at? I mean, when you’re talking about inequities and LLMs and in this particular field, what’s the kind of impact that we’re seeing across society?

Sunayana Sitaram: Sure. So when it comes to LLMs and language coverage, what we found from our research is that there are a few languages that LLMs perform really well on. Those languages tend to be high resource languages for which there is a lot of data on the web and they also tend to be languages that are written in the Latin script because of the way the LLMs are designed currently with the tokenization. And the other languages, unfortunately there is a large gap between the performance in English and other languages, and we also see that a lot of capabilities that we see in LLMs in English don’t always hold in other languages. So a lot of capabilities, like really good reasoning skills, etc, may only be present in English and a few other languages, and they may not be seen in other languages. And this is also true when you go to smaller models that you see that their language capabilities fall off quite drastically compared to the really large models that we have, like the GPT 4 kind of models. So when it comes to real world impact of this, you know, if you’re trying to actually integrate one of these language models into an application and you’re trying to use it in a particular language, chances are that you may not get as good performance in many languages compared to English. And this is especially true if you’re already used to using these systems in English and you want to use them in a second language. You expect them to have certain capabilities which you’ve seen in English, and then when you use them in another language, you may not find the same capabilities. So in that sense, I think there’s a lot of catching up to do for many languages. And the other issue also is that we don’t even know how well these systems perform for most languages of the world because we’ve only been able to evaluate them on around 50 to 60 or maybe 100 languages. So for the rest of the 6000ish languages of the world, many of which don’t even have a written form, most of which are not there on the web. We don’t even know whether these language models are, you know, able to do anything in them at all. So I think that is another, you know, big problem that is there currently.

Sridhar Vedantham: So, if you want to change the situation where we say that you know even if you’re a speaker of a language that might be small, maybe say only two million speakers as opposed to a larger language that might have 100 million or 200 million speakers. How do we even go about addressing inequities like that because at some level it just seems unfair, that for no fault of their own, you know, large sections of population could be excluded from the benefits of LLM’s, right? Because there could be any number of languages in which the number of speakers might be, say, 1,000,000 or 100,000.

Sunayana Sitaram: Right. I think that’s a very hard question. How to actually involve language communities into our efforts, but do that at scale, so that we can actually involve all language communities, all cultures, etc. into the whole building process. So we’ve had some success with doing this with some language communities. So there is a project called ELLORA in MSR India that you know Kalika leads where you know they work with specific language communities, try to understand what the language communities actually need and then try to co-build those resources with them. And so you know in that sense, you know, working directly with these language communities, especially those that have a desire to build this technology and can contribute to some of the data aspects, etc. That’s definitely one way of doing things. We’ve also done some work recently where we’ve engaged many people in India in trying to contribute resources in terms of cultural artifacts and also evaluation. And so you know, we’re trying to do that with the community itself, with the language community that is underrepresented in these LLMs, but doing that at scale is the challenge to try and really bring everyone together. Another way of course, is just raising awareness about the fact that this issue exists, and so I think our work over the last couple of years has really, you know, moved the needle on that. So we’ve done the most comprehensive multilingual evaluation effort that exists both within the large models as well as across different sized models which we call Mega and Megaverse respectively.

Sridhar Vedantham: So if I can just interrupt here, what I’d like is if you could, you know, spend a couple of minutes maybe talking about what evaluating an LLM actually means and how do you go about that?

Sunayana Sitaram: Sure. So when we talk about evaluating LLMs, right, there are multiple capabilities that we expect LLMs to possess. And so, our evaluation should ideally try to test for all of those different kinds of capabilities. So, this could be the ability to reason, this could be the ability to produce output that actually sounds natural to a native speaker of the language. It could be completing some particular task, it could be not hallucinating or not making up things. And also of course, responsible AI, you know, metrics. So things like, you know, being safe and fair, no bias, etc. Only if all of those things work in a particular language can you say that that LLM actually works for that language. And so there are several dimensions that we need to consider when we are evaluating these LLMs.
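
To make the multi-dimensional point concrete, here is a minimal sketch in which a language only counts as supported when every dimension clears a threshold. The dimension names, scores, and threshold are assumptions for illustration, not an actual benchmark specification.

```python
# Minimal sketch (illustrative only): an LLM "works" for a language only if it
# clears every evaluation dimension, not just one. Dimension names, scores,
# and the pass threshold below are invented for illustration.
DIMENSIONS = ["reasoning", "naturalness", "task_completion",
              "factuality", "safety", "fairness"]
PASS_THRESHOLD = 0.8  # assumed minimum acceptable score per dimension

def language_supported(scores: dict) -> bool:
    """True only if every dimension meets the threshold for this language."""
    return all(scores.get(dim, 0.0) >= PASS_THRESHOLD for dim in DIMENSIONS)

hindi_scores = {"reasoning": 0.83, "naturalness": 0.74, "task_completion": 0.88,
                "factuality": 0.79, "safety": 0.90, "fairness": 0.85}
print(language_supported(hindi_scores))  # False: naturalness and factuality fall short
```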

[Music]

Sridhar Vedantham: Before I interrupted you, you were talking about certain projects that you were working on, which are to do with evaluating LLMs, right? I think there’s something called Mega and something called Megaverse. Could you tell us a little bit about those and what exactly they do?

Sunayana Sitaram: Sure. So, the Mega project we started when ChatGPT came out, basically. And the question that we were trying to answer was how well these kinds of LLMs perform on languages of the world. So with Mega what we did was, we took already existing open-source benchmarks that tested for different kinds of capabilities. So some of them were question-answering benchmarks. Some of them were testing for whether a model can summarize text properly or not. Some of them were testing for other capabilities like reasoning, etc. And we tested a bunch of models across all of these benchmarks, and we covered something like 80 different languages across all these benchmarks. And our aim with Mega was to figure out what the gap was between English and other languages for all of these different tasks, but also what the gap was between the older models, so the pre-LLM models, and LLMs: whether we’ve become better or worse in terms of linguistic diversity and performance on different languages in the new era of LLMs or not. And that was the aim with Mega.
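
A minimal sketch of that kind of gap analysis, with entirely made-up benchmark scores (the tasks, languages, and numbers below are hypothetical and are not results from Mega), might look like this in Python:

# scores[task][language] = a model's score on that benchmark in that language
scores = {
    "question_answering": {"en": 0.86, "hi": 0.61, "sw": 0.48},
    "summarization":      {"en": 0.79, "hi": 0.55, "sw": 0.41},
    "reasoning":          {"en": 0.74, "hi": 0.44, "sw": 0.33},
}

for task, by_lang in scores.items():
    english = by_lang["en"]
    for lang, score in sorted(by_lang.items()):
        if lang == "en":
            continue
        gap = english - score
        print(f"{task:>18} | {lang}: score={score:.2f}, gap vs. English={gap:.2f}")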

Sridhar Vedantham: Sorry, but what was the result of it? Have we become better or not?

Sunayana Sitaram: Yeah. So we have mixed results. So for certain languages, we are doing quite well, but for some languages, unfortunately, the larger models don’t do as well as some of the older models used to do. And the older models used to be specialized and trained on labeled data, as I said in the beginning, right? And that would help them also be better at all the languages under consideration, whereas with the LLMs we were not really using labeled data in a particular language to train them. And so, we found that, you know, in some cases, the performance of English had shot up drastically, and so the gap between English and other languages had also increased in the LLMs’ case.

Sridhar Vedantham: OK. So, the performance of English, I mean the LLMs, that’s much better than it was earlier, but the other languages didn’t manage to show the same performance increase.

Sunayana Sitaram: That’s right. They didn’t always show that improvement. Some of them did, some of the higher resource languages written in the Latin script, for example, did perform quite well, but some of the others did not.

Sridhar Vedantham: OK. And after Mega, then what happened?

Sunayana Sitaram: Yes. So with Mega, we were primarily trying to evaluate the GPT family of models against the older generation of models, as I mentioned. But then we realized, by the time we finished the work on Mega, that a plethora of models had come out. So there’s Llama and, you know, other models by competitors, as well as smaller models, the SLMs, you know, like the Llama-sized models, Mistral, etc., right. So, there were all of these different models. And then we wanted to see across different models, especially when you’re trying to compare larger models with smaller models, how do these trends look? And that is what we call Megaverse, where we do all of these different evaluations, but not just for the GPT family, but across different models. And what we found in Megaverse was that the trends were similar: there were some languages that were doing well, and some of the other lower-resource languages, especially the ones written in other scripts, were not doing so well. So, for example, the Indian languages were not doing very well across the board. But we also found that the larger frontier models, like the GPT models, were doing much better than the smaller models for multilingual tasks. And this is again something that, you know, was shown for the first time in our work: that there is this additional gap between the large models and the small models, and there are important practical implications of this. So, say you’re trying to integrate the small model into your workflow as a startup or something like that in a particular language because it is cheaper, it is much more cost-efficient, etc. Then you may not get the same performance in non-English languages as you would get with the larger model, right? So that has an impact in the real world.

Sridhar Vedantham: Interesting. And how do you draw this line between what constitutes a large language model and what constitutes a small language model? And I’m also increasingly hearing of this thing called a tiny language model.

Sunayana Sitaram: That’s right. Yeah. So the large language models are the GPTs, the Geminis, you know, those kinds of models. Everything else, we just club as a smaller language model. We don’t really draw a line there. I haven’t actually seen any tiny models that do very well on multilingual tasks. They’re tiny because they are, you know, trained on a smaller set of data and have fewer parameters, and typically we haven’t seen too many multilingual tiny models, so we haven’t really evaluated those. Although there is a new class of models that have started coming up, which are language-specific models. So, for example, a lot of the Indic model developers have started building specialized models for one language or a small family of languages.

Sridhar Vedantham: OK, so going back to something you said earlier, how do these, you know, kinds of models that people are building for specific Indian languages actually work or perform, given that, as I think we established quite early in this podcast, these are languages that are highly under-resourced in terms of data to build models?

Sunayana Sitaram: That’s right. So I think it’s not just a problem of them being under-resourced, it’s also that the proportion of data in the model for a particular language that is not English, say Hindi or Malayalam or Kannada, is very tiny compared to English. And so there are ways to actually change this by doing things with the model after it has been trained. So this is called fine-tuning. So what you could do is you could take, say, an open-source model, which is like a medium-sized or a small model, and then you could fine-tune it or specialize it with data in a particular language, and that actually makes it work much better for that particular language because the distribution shifts towards the language that you’re targeting. And so, it’s not just, you know, about the amount of data, but also the proportion of data and how the model has been trained in these giant models that cover hundreds of languages in a single model versus, you know, having a model that is specialized to just one language, which makes it do much better. So these Indic models we have found actually do better than the open-source models that they were built on top of, because now they have been specialized to a particular language.
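
One common way to specialize an open-source model for a single language is a short causal language modeling fine-tune, sketched here with the Hugging Face transformers library; the checkpoint name, corpus file, and hyperparameters below are placeholders chosen purely for illustration.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "some-open-source-base-model"         # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token      # many causal LMs ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# A monolingual corpus in the target language (placeholder file path).
raw = load_dataset("text", data_files={"train": "target_language_corpus.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="specialized-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # shifts the model's distribution toward the target language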

Sridhar Vedantham: OK. I know that your work focuses primarily on the evaluation of LLMs, right? There must be a lot of other people who are also doing similar work in terms of evaluating the performance of LLMs on different parameters. How do you differentiate your work from what others are doing?

Sunayana Sitaram: Yeah, that’s a great question. So we’ve been doing evaluation work pre-LLM, actually. We started this a few years ago. And so we’ve actually done several evaluation projects. The previous one was called LITMUS, where we were trying to evaluate without even having a benchmark in a particular language, right? And so we’ve built up a lot of expertise in how to do evaluation, and this has actually become a very hard problem in the LLM world, because it’s becoming increasingly difficult to figure out what the strengths and weaknesses are of these LLMs because of how they’re built and how they behave, right. And so I think we bring in so much rich evaluation expertise that we’ve been able to do these kinds of, you know, Mega evaluations in a very systematic way, where we’ve taken care of all of the hanging loose threads that otherwise others don’t take care of. And that is why we managed to do these comprehensive giant exercises of Mega and Megaverse and also got these clear trends from them. So in that sense I would say that our evaluation research is very mature, and we’ve been spending a lot of time thinking about how to evaluate, which is unique to our group.

Sridhar Vedantham: OK, so one thing I’ve been curious about for a while is there seems to be a lot of cultural and social bias that creeps into these models, right? How does one even try to address these issues?

Sunayana Sitaram: That’s right. So, I think over the last few months, building culture-specific language models, or even evaluating whether language models are appropriate for a particular culture, etc., has become a really hot topic. Because people have started seeing that, you know, most of these language models are a little tilted towards Western, Protestant, rich, industrialized kinds of worldviews, and the values that they encode may not be appropriate for all cultures. And so there have been some techniques that we’ve been working on in order to again shift the balance back towards other target cultures that we want to fine-tune the model for. So again, you know, you could take data that has characteristics of a particular culture, values of a particular culture, and then do some sort of fine-tuning on a model in order to shift its distribution more towards a target culture. There are techniques that are coming into being for these kinds of culture-specific language models. However, I still think that we are far away from a comprehensive solution, because even defining what culture is and what constitutes, you know, say, an Indic culture LLM, I think that’s a really hard problem. Because culture is complex and there are so many factors that go into determining what culture is, and also it’s deeply personal. So, each individual has their own mix of factors that determine their own culture, right? So, generalizing that to an entire population is also quite hard to do, I think. So, I think we’re still in the very initial stages in terms of actually figuring out how well aligned these models are to different cultures and also trying to sort of align them to any specific target cultures. But it is a hot topic that a lot of people are currently working on.

Sridhar Vedantham: Yeah. You know, while you’re talking and giving me this answer, I was thinking that if you’re going to go culture by culture, first of all, you know, what is the culture, what are you doing about subcultures and how many cultures are there in the world, so I was just wondering how it’s going to even work in the long term? But I guess you answered the question by saying it’s just starting. Now let’s see how it goes.

Sunayana Sitaram: Absolutely.

Sridhar Vedantham:  It’s a very, very open canvas right now, I guess.

Sunayana Sitaram: Yeah.

Sridhar Vedantham: Sunayana, you know you’ve been speaking a lot about evaluation and so on and so forth and especially in the context of local languages and smaller languages and Indic languages and so on. Are these methods of evaluation that you talk about, are they applicable to different language groups and languages spoken in different geographies too?

Sunayana Sitaram: Absolutely. So in the Mega and Megaverse work, we covered 80 languages, and many of them were not Indic languages. In fact, in the Megaverse work, we included a whole bunch of African languages as well. So the techniques, you know, would be applicable to all languages for which we have data, that is, for which data exists on the web. Where it is challenging is the languages that are only spoken, that are not written, and languages for which there is absolutely no data or representation available on the web, for example. So, unfortunately, there aren’t benchmarks available for those languages, and so we would need to look at other techniques. But other than that, our evaluation techniques are for, you know, all languages, all non-English languages.

[Music]

Sridhar Vedantham: There is something that I heard recently from you which again I found extremely interesting. It’s a project called Pariksha, which, I know, in Hindi and derived from Sanskrit, basically means test or exam. And I remember this project because I’m very scared of tests and exams, and always have been since school. But what is this?

Sunayana Sitaram: Yes, Pariksha is actually quite a new project. It’s under the project VeLLM, which is on universal empowerment with Large Language Models, and Pariksha is something that we are super excited about because it’s a collaboration with Karya, which is an ethical data company that was spun off from MSR India. So what we realized a few months ago is that, you know, there is just so much happening in the Indic LLM space, and there are so many people building specialized models either for a single language or for a group of languages, like Dravidian languages, for example. And of course, there are also the GPTs of the world, which do support Indian languages as well, right. So now, at last count, there are something like 30 different Indic LLMs available today. And if you’re a model builder, how do you know whether your Indic LLM is as good as or better than all of the other LLMs? If you’re somebody who wants to use these models, how do you know which ones to pick for your application? And if you’re a researcher, you know, how do you know what the big challenges are that still remain, right?

Sridhar Vedantham: Right

Sunayana Sitaram: And so to address this, of course, you know, one way to do this is to do evaluation, right, and try to figure out, you know, compare all these models on some standard benchmarks and then try to figure out which ones are the best. However, what we found from our work with Mega and Megaverse is that the Indian language benchmarks unfortunately are usually translations of already existing English benchmarks, and also many of them are already present in the training data of these large models, which means that we can’t use the already existing benchmarks to get a very good idea about whether these Indic LLMs are culturally appropriate, whether they capture linguistic nuances in the Indian languages or not, right. So we decided to sort of reinvent evaluation for these Indic LLMs, and that’s where Pariksha came in. But then, how do we scale if we want to actually, you know, get this kind of evaluation done? We were looking at human evaluation to do this, right. And so, we thought of partnering with Karya on this, because Karya has reach in all the states in India and they have, you know, all of these workers who can actually do this kind of evaluation for different Indian languages. And so, what Pariksha is, it’s a combination of human evaluation as well as automated evaluation. And with this combination we can scale and we can do thousands and thousands of evaluations, which we have already done, actually, on all of these different models. And so this is the first time, actually, that all of the different Indic LLMs that are available are being compared to each other in a fair way. And we are able to come up with a leaderboard now of all of the Indic models for each Indic language that we are looking at. So that’s what Pariksha is. It’s quite a new project, and we’ve already done thousands of evaluations, and we are continuing to scale this up even further.

Sridhar Vedantham: So how does someone, you know, if I have an LLM of my own in any given Indic language, how do I sign up for Pariksha, or how do I get myself to be evaluated against the others?

Sunayana Sitaram: Yeah. So you can contact any of us for that, the Pariksha team. And we will basically include this model, the new model into the next round of evaluation. So what we do with Pariksha is we do several rounds. So we’ve already finished a pilot round and we’re currently doing the first round of evaluations. So we would include the new model in the next round of evaluations. And you know, as long as it’s an open source model or there is an API access available for that model, we can evaluate the model for you. We are also planning to release all the artifacts from Pariksha, including all the evaluation prompts. So even if it is a closed model, you can use these to do your own evaluation as well later to figure out how you compare with the other models on the leaderboard.

Sridhar Vedantham: Right. Quick question. When you say that you’re working with Karya, and you also say that you’re looking at human evaluation along with the regular methods of evaluation, why do you need human evaluation at all in these situations? Isn’t it simpler to just throw everything into a machine and let it do the work?

Sunayana Sitaram: Yeah, that’s a great question. So we did some work on, you know, making machines evaluators. So basically asking GPT itself to be the evaluator, and it does a very good job at that. However, it has some blind spots. So we found that GPT is not a very good evaluator in languages other than English. Basically, it’s not a good evaluator in the languages that it doesn’t do well in otherwise, and so using only automated techniques to do evaluation may actually give you the wrong picture. It may give you the wrong sort of trends, right? And so we need to be very careful. And so our ultimate goal is to build evaluation systems, and also other kinds of systems in general, where humans and LLMs can work together. And so the human evaluation part is to have checks and balances on the LLM evaluation part. Initially, what we are doing is we’re getting the same things evaluated by the human, and the LLM is doing the exact same evaluation. So we have a point-by-point comparison of what the humans are saying and what the LLM is saying, so that we can really see where the LLM goes wrong, right, where it doesn’t agree with humans. And then we use all of this information to improve the LLM evaluator itself. So we’re really trying to get humans to, you know, do the evaluation, get LLMs to do the evaluation, use the human data in order to improve the LLM, and then this just continues in a cycle. And the ultimate goal is to send the things to the LLM that it’s good at doing and send the rest of the things that the LLM can’t do to humans, who are like the ultimate authority on the evaluation. So it’s like this hybrid system that we are designing with Pariksha.
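
As a toy illustration of that checks-and-balances loop, with hypothetical item IDs, ratings, and languages:

# Hypothetical human and LLM ratings of the same model outputs (1-5 scale).
human_scores = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}
llm_scores   = {"item1": 4, "item2": 4, "item3": 5, "item4": 3}

# Point-by-point comparison: where does the LLM judge disagree with people?
disagreement = {k: abs(human_scores[k] - llm_scores[k]) for k in human_scores}
agreement_rate = sum(d <= 1 for d in disagreement.values()) / len(disagreement)
print(f"LLM-human agreement (within 1 point): {agreement_rate:.0%}")

def route(language, llm_reliable_languages):
    """Send an item to the LLM judge only for languages where it has been shown
    to agree with human raters; everything else goes back to human evaluation."""
    return "llm" if language in llm_reliable_languages else "human"

print(route("english", {"english"}))   # -> "llm"
print(route("kannada", {"english"}))   # -> "human"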

Sridhar Vedantham: Interesting. OK, so I know we are kind of running out of time. My last question to you would be, where do you see the evaluation of LLMs, and your own work, going or progressing in the near future?

Sunayana Sitaram: So evaluation for me is a path to understanding what these systems can do and cannot do and then improving them, right. So our evaluations are always actionable. So we try to figure out why something is not working well. So even in the Mega paper, we had lots of analysis about what factors may lead to, you know, lower performance in certain languages, etc. So I see all of this as providing a lot of rich information to model developers in order to figure out what the next steps should be, how they should be designing the next generation of models, and I think that has already happened. You know, systems have already improved from the time we started working on Mega, and a lot of the issues that we pointed out in Mega, like tokenization, etcetera, are now well known in the field, and people are actually taking steps in order to make those better in these language-specific models, etc. So I see the work as being, you know, first of all, raising awareness about the problems that exist, but also providing actionable insights on how we could improve things. And with Pariksha also, the idea is to release all the artifacts from our evaluation so that Indic model developers can use those in order to improve their systems. And so I see that, you know, better evaluation will lead to better quality models. That’s the aim of the work.

Sridhar Vedantham: Sunayana, thank you so much for your time. I really had a lot of fun during this conversation.

Sunayana Sitaram: Same here. Thank you so much.

Sridhar Vedantham: Thank you.

[Music ends]

The post Podcast: Evaluating LLMs using novel approaches. With Dr. Sunayana Sitaram appeared first on Microsoft Research.

]]>
Guarding human health: AI empowers innovative applications in healthcare http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/guarding-human-health-ai-empowers-innovative-applications-in-healthcare Thu, 16 May 2024 08:52:00 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1034823 “If life is a marathon, then health is the key to its duration.” Health is not only the foundation of happiness and societal progress but also a pivotal aspect of the intelligent era. AI’s integration into healthcare represents a transformative tool for maintaining and enhancing human well-being. From aiding early disease detection and progression prediction […]

The post Guarding human health: AI empowers innovative applications in healthcare appeared first on Microsoft Research.

]]>
“If life is a marathon, then health is the key to its duration.” Health is not only the foundation of happiness and societal progress but also a pivotal aspect of the intelligent era. AI’s integration into healthcare represents a transformative tool for maintaining and enhancing human well-being. From aiding early disease detection and progression prediction to personalizing precision medicine and accelerating medical research and drug development, AI’s unique value and potential are increasingly evident.

Over recent years, Microsoft Research Asia has deepened its collaboration with medical institutions and academic experts, attracting healthcare professionals to foster AI’s application in medicine and health, thereby contributing to a healthier global community.

Early detection, early treatment: AI in disease detection and rehabilitation

The early diagnosis of diseases is vital for enhancing treatment outcomes and patient quality of life. Rehabilitation training, a critical component of many treatment regimens, plays a significant role in restoring patient functions. Traditional methods face limitations due to resource distribution, geographic constraints, and a shortage of medical professionals, affecting the accessibility and efficiency of healthcare services. AI can support medical staff by providing automated, intelligent early disease detection, enabling timely intervention and treatment.

AI-powered voice recognition for speech rehabilitation in cleft palate patients

Cleft palate and cleft lip, prevalent congenital deformities in the oral and maxillofacial region, often result in hypernasal speech due to velopharyngeal insufficiency. Microsoft Research Asia, in collaboration with medical institutions, recognizes hypernasality detection as crucial for treating cleft palate patients.

Source: Operation Smile

Traditionally, speech-language pathologists assess hypernasality, but their limited availability and concentration in certain hospitals necessitate extensive, costly cross-regional patient travel. An automated hypernasality assessment method would not only aid pathologists in making accurate evaluations but also facilitate remote patient diagnosis and treatment, significantly reducing costs.

Leveraging transfer learning technology, Microsoft Research Asia has developed an innovative approach using an automatic speech recognition (ASR) model to enhance hypernasality assessment. This innovative model excels in extracting acoustic features and demonstrates robust generalization capabilities. Comparative studies on two cleft palate datasets reveal that it surpasses existing methods, significantly enhancing the precision of pathologists’ diagnostic processes.
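
The exact pipeline from this work is not reproduced here, but the general transfer-learning recipe can be sketched as follows: reuse a pretrained ASR-style speech encoder (the public wav2vec 2.0 checkpoint below is only one possible choice) to embed utterances, then train a lightweight hypernasality classifier on top of those embeddings.

import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

checkpoint = "facebook/wav2vec2-base-960h"   # a publicly available ASR-pretrained encoder
extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
encoder = Wav2Vec2Model.from_pretrained(checkpoint).eval()

def embed(waveform, sampling_rate=16000):
    """Mean-pool the encoder's hidden states into one vector per utterance."""
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # shape: (1, frames, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()

def train_hypernasality_classifier(utterances, labels):
    """utterances: list of 1-D waveforms; labels: 1 = hypernasal, 0 = typical,
    both assumed to come from a clinically annotated dataset."""
    features = [embed(w) for w in utterances]
    return LogisticRegression(max_iter=1000).fit(features, labels)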

Following hypernasality evaluations, physicians devise tailored speech therapy regimens for patients. Microsoft Research Asia has advanced this process by developing the Masked Pre-training Pronunciation Assessment (MPA) model. This model supports end-to-end training and adapts to both unsupervised and supervised learning environments, enabling user-friendly remote deployment. Utilizing reference texts and integrating masking with prediction tactics, the MPA model adeptly circumvents issues of misalignment or misrecognition in pronunciation assessments, offering more precise speech correction support for individuals with cleft palate.

Microsoft Research Asia is actively collaborating with healthcare providers to assess the feasibility of deploying this cutting-edge speech assessment technology. The goal is to enhance the efficiency of medical diagnoses and treatments, lower the financial burden on patients, and extend the benefits of this technology to numerous cleft palate sufferers in isolated regions.

Related papers:

Voice analysis model enhances Alzheimer’s disease screening

Alzheimer’s disease, a prevalent neurodegenerative condition primarily affecting the elderly, leads to progressive cognitive decline, including memory loss, language difficulties, and impaired reasoning. While there’s no cure for Alzheimer’s, early detection and intervention are key to decelerating its progression.

Source: pexels.com

Traditional diagnostic methods, such as brain scans, blood tests, and cognitive assessments, are extensive and expensive. However, research indicates that early Alzheimer’s can be detected through speech analysis, identifying symptoms like fluent aphasia and word retrieval challenges.

Capitalizing on this insight, Microsoft Research Asia has pioneered speech and language analysis technologies to detect Alzheimer’s indicators from sophisticated acoustic and linguistic data. A novel task-oriented model has also been introduced, correlating language descriptions with cognitive tasks.

In the ADReSS dataset’s subtask involving “Cookie Theft” picture descriptions (Figure 1), these methods attained a 91.4% accuracy rate. This innovative approach, merging speech and semantic analysis, significantly increases disease detection accuracy. The model’s high efficiency and performance on new test sets offer promising prospects for Alzheimer’s screening at scale.

Figure 1: “Cookie Theft” used for the descriptive task of detecting Alzheimer’s disease, presented by DementiaBank Pitt Corpus, Becker et al., in 1994.

Related paper:

Advancing autism diagnosis: Unsupervised detection of stereotypical behaviors

Autism Spectrum Disorder (ASD) typically manifests in early childhood, presenting challenges in social interaction and communication, accompanied by repetitive behaviors. These behaviors, which may include actions like persistent hand-flapping or head-banging, serve as vital indicators for ASD diagnosis. Early detection and intervention are crucial for improving outcomes, yet traditional methods relying on prolonged observation by specialists are not always efficient. Hence, the development of a swift, automated detection system is invaluable.

Source: unsplash.com

Traditional approaches have utilized computer vision and supervised learning to analyze video data of individuals with ASD. However, these methods face limitations due to the diverse range of stereotypical behaviors and privacy concerns associated with video data collection.

Addressing these challenges, Microsoft Research Asia, in collaboration with medical institutions, has innovated an unsupervised approach using video anomaly recognition. The new Dual-Stream Stereotypical Behavior Detector (DS-SBD) model leverages the temporal dynamics of human posture and repetitive motion patterns. Remarkably, DS-SBD requires only non-anomalous behavior for training and can identify previously unseen stereotypical behaviors during inference, such as circling behaviors that never appeared in the training data.

Figure 2: The DS-SBD model’s predictive accuracy spikes when detecting atypical behaviors such as abnormal hand clapping.

Extensive studies validate that DS-SBD’s unsupervised technique has increased the micro-average AUROC from 60.43% to 71.04% and the macro-average AUROC from 56.45% to 73.39%, signifying a substantial improvement in both accuracy and the scope of detectable behaviors. This breakthrough outperforms current state-of-the-art methods and is poised to set a new standard in the field. While DS-SBD marks a significant advancement in recognizing stereotypical behaviors, it represents only one facet of the broader ASD diagnostic process. Comprehensive early diagnosis and intervention strategies will benefit from continued interdisciplinary collaboration and societal engagement.
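
DS-SBD itself is a dual-stream model over human posture and motion patterns; the simplified, single-stream sketch below only illustrates the underlying unsupervised recipe: train an autoencoder exclusively on typical pose sequences, then flag sequences whose reconstruction error is unusually high.

import torch
import torch.nn as nn

class PoseSequenceAutoencoder(nn.Module):
    """Autoencoder over flattened (x, y) keypoint trajectories."""
    def __init__(self, n_keypoints=17, seq_len=30, hidden=64):
        super().__init__()
        dim = n_keypoints * 2 * seq_len
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):              # x: (batch, n_keypoints * 2 * seq_len)
        return self.decoder(self.encoder(x))

model = PoseSequenceAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(typical_batch):
    """One gradient step; the training set contains only non-anomalous behavior."""
    optimizer.zero_grad()
    loss = loss_fn(model(typical_batch), typical_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

def anomaly_score(batch):
    """Higher reconstruction error suggests behavior unlike anything seen in training."""
    with torch.no_grad():
        return ((model(batch) - batch) ** 2).mean(dim=1)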

Related paper:

Advancing neonatal seizure detection through brainwave analysis

Epilepsy in children is a multifaceted, often recurring neurological disorder that predominantly occurs in the formative years (0-18 years). The prompt identification of epilepsy in newborns is vital to safeguard their developmental trajectory.

Source: unsplash.com

The genesis of epileptic seizures lies in the abnormal discharges of neurons within the brain, rendering brainwave analysis a pivotal tool for epilepsy diagnosis. Nonetheless, the nascent state of neonatal brain development, coupled with the pronounced noise in brainwave data and the marked variability among infants, renders the detection of neonatal epilepsy a formidable medical challenge.

Microsoft Research Asia and its collaborators have unveiled a deep learning paradigm, harnessing artificial intelligence and electroencephalogram (EEG) signals – dubbed the Spatial-Temporal EEG Network (STATENet). This model adeptly processes neural signals, nimbly adjusts to neonatal EEG channel variations, and addresses the challenges outlined above. Additionally, the team has introduced a model-level integration technique that synergistically combines outcomes from various spatial-temporal deep models, thereby bolstering the STATENet model’s generalization ability across diverse neonatal subjects.
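
The published STATENet architecture is more elaborate than what fits here; the sketch below only illustrates the two ingredients described above, a spatial-temporal network over multi-channel EEG and a model-level ensemble that averages several independently trained models.

import torch
import torch.nn as nn

class SpatialTemporalEEGNet(nn.Module):
    def __init__(self, n_channels=19):
        super().__init__()
        # Mix EEG channels (spatial) while convolving over time (temporal).
        self.spatiotemporal = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # tolerant of recordings with different lengths
        )
        self.classifier = nn.Linear(32, 1)   # seizure vs. non-seizure logit

    def forward(self, eeg):                  # eeg: (batch, channels, time)
        features = self.spatiotemporal(eeg).squeeze(-1)
        return self.classifier(features)

def ensemble_predict(models, eeg):
    """Model-level integration: average the seizure probabilities
    of several independently trained spatial-temporal models."""
    probs = [torch.sigmoid(m(eeg)) for m in models]
    return torch.stack(probs).mean(dim=0)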

Extensive studies utilizing a comprehensive dataset of real-world neonatal EEG data have demonstrated the STATENet model’s substantial enhancement in detection precision. The model’s area under the precision-recall curve (AUPRC) witnessed an improvement exceeding 30% relative to prevailing top-tier methods, equipping physicians with a novel diagnostic instrument for pediatric epilepsy.

Moreover, Microsoft Research Asia has pioneered the inaugural cross-dataset EEG model capable of deciphering any EEG data, thus achieving a ‘one-to-many’ brainwave comprehension. This breakthrough underpins the AI Neurologist system, designed to augment brainwave signal analysis in both clinical and research settings, elevating diagnostic accuracy from 75% to 90% in a case study. The associated models are now open source on GitHub, inviting global research participation to extend this technology’s impact across the medical spectrum and catalyze new diagnostic and therapeutic innovations.

Figure 3: The AI Neurologist system

Related papers:

Enhancing disease progression prediction and personalized care: The role of AI in precision medicine

Precision medicine represents a transformative approach to healthcare, tailoring treatments to individual patient profiles. Despite the promise, the complexity and unique nature of diseases present significant hurdles. AI emerges as a powerful ally, leveraging data analysis, pattern detection, and predictive modeling to forecast disease progression and risks. This capability is especially crucial in chronic disease management, aiding clinicians and patients in mitigating illness severity and preventing complications.

Graph neural networks: A novel approach to Parkinson’s disease progression

Parkinson’s disease, a prevalent neurodegenerative condition in seniors, progresses gradually. Patient conditions may remain stable or even improve over time with proper medication and therapy, maintaining optimal physical functions. Yet, Parkinson’s presents a spectrum of symptoms, from sleep disturbances to motor challenges, making disease progression prediction complex.

Source: pixabay.com

Researchers at Microsoft Research Asia advocate for the analysis of multimodal data to extract similar symptoms, thus enhancing prediction accuracy. Graph neural networks (GNNs) excel in mapping patient interconnections, forming networks where nodes represent patients linked by shared attributes. Selecting these attributes, however, largely demands expert knowledge and experience.

To overcome this, Microsoft Research Asia collaborated closely with medical institutions.  Based on recommendations from professional medical personnel, a new algorithm called AdaMedGraph was proposed. AdaMedGraph autonomously identifies key features to construct patient similarity graphs, harmonizing with existing knowledge and integrating expert-designed graphs into a comprehensive model. Unifying individual and collective data, this innovation simplifies the graph construction process.
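
AdaMedGraph selects such features adaptively and combines several graphs, including expert-designed ones; the sketch below shows only the most basic building block, with a hypothetical clinical feature: connect patients whose values for that feature are similar, then smooth prediction scores over the resulting graph.

import numpy as np

def similarity_graph(feature_values, threshold=1.0):
    """Adjacency matrix linking patients whose selected feature values are close."""
    diff = np.abs(feature_values[:, None] - feature_values[None, :])
    adjacency = (diff <= threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

def propagate(adjacency, patient_scores, alpha=0.5):
    """One step of graph smoothing: mix each patient's score with the average
    score of their neighbors (a crude stand-in for a GNN layer)."""
    degree = adjacency.sum(axis=1, keepdims=True)
    degree[degree == 0] = 1.0
    neighbor_mean = adjacency @ patient_scores / degree
    return alpha * patient_scores + (1 - alpha) * neighbor_mean

# Hypothetical example: one selected feature (e.g., a baseline motor score).
feature = np.array([10.0, 10.4, 25.0, 24.5])
scores = np.array([[0.2], [0.9], [0.8], [0.7]])   # initial progression scores
print(propagate(similarity_graph(feature), scores))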

Evaluated on two public datasets, the Parkinson’s Progression Markers Initiative (PPMI) and the Parkinson’s Disease Biomarkers Program (PDBP), AdaMedGraph outperformed benchmarks in predicting Parkinson’s progression over 24 months, setting the stage for personalized treatment strategies.

Moreover, AdaMedGraph’s robust generalization ability shines in metabolic syndrome prediction, achieving an AUROC of 0.675 on test datasets. This underscores the model’s efficacy in integrating intra- and inter-patient data for individual disease progression forecasting, inspiring new avenues in medical research.

Related paper:

Enhancing interdisciplinary collaboration to maximize AI’s potential

Microsoft Research Asia’s endeavors extend beyond mere disease detection and progression prediction. In collaboration with the medical sector, Microsoft Research Asia is probing the vast capabilities of AI in advancing drug development and medical research. This includes leveraging state-of-the-art technology in constructing artificial retinas, analyzing drug dependency, advancing cancer therapies, and exploring human metabolism, among other areas.

As AI technology matures and progresses, its practical application potential becomes increasingly evident. Yet, unlocking AI’s full value across diverse sectors necessitates essential interdisciplinary and cross-domain collaboration. “The synergistic collaboration with medical professionals from healthcare and research institutions has enabled Microsoft Research Asia to conduct extensive research projects within the medical and health domain. Our continuous exploration into AI’s application in critical healthcare aspects—ranging from disease detection to rehabilitation and disease progression forecasting—is a testament to our collective dedication. We invite more exceptional individuals passionate about interdisciplinary research to join us in our quest to safeguard human health and foster medical advancements,” expressed Lili Qiu, assistant managing director of Microsoft Research Asia.

Note: The medical health research conducted by Microsoft Research Asia, as discussed in this article, is purely exploratory and guided by professional medical entities and research institutions. Our aim is to further scientific advancement and offer theoretical and technical support for the future medical applications benefiting humanity. All research is in strict adherence to Microsoft’s responsible AI principles, upholding fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. The technologies and methodologies referenced herein are in the R&D phase and are not yet commercialized products or services, nor do they represent medical advice or treatment plans. For health-related concerns, we advise consulting with certified medical practitioners.

The post Guarding human health: AI empowers innovative applications in healthcare appeared first on Microsoft Research.

]]>
Dongqi Han: An interdisciplinary odyssey with AI and other fields http://approjects.co.za/?big=en-us/research/lab/microsoft-research-india/articles/dongqi-han-an-interdisciplinary-odyssey-with-ai-and-other-fields Tue, 07 May 2024 02:35:13 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1031622 Deciding between fundamental and applied research is a dilemma that confronts many in the scientific community. Dongqi Han, on the cusp of graduation, ambitiously aspired to bridge this divide by pursuing both avenues of research in his future endeavors. After a comprehensive evaluation, Dongqi Han selected Microsoft Research Asia (MSR Asia) for his initial foray […]

The post Dongqi Han: An interdisciplinary odyssey with AI and other fields appeared first on Microsoft Research.

]]>
Deciding between fundamental and applied research is a dilemma that confronts many in the scientific community. Dongqi Han, on the cusp of graduation, ambitiously aspired to bridge this divide by pursuing both avenues of research in his future endeavors.

After a comprehensive evaluation, Dongqi Han selected Microsoft Research Asia (MSR Asia) for his initial foray into fulfilling his aspirations. Prior to completing his doctorate, he undertook an internship at MSR Asia – Shanghai. During his six-month internship, Han gained firsthand experience of the lab’s commitment to pioneering basic research and its active engagement in fostering industrial collaborations, thereby facilitating the practical application of innovative findings. This experience sowed the seeds for his eventual formal engagement with the lab.

“MSR Asia has established a dynamic platform that seamlessly integrates fundamental research with practical industrial applications. Within this environment, I have the opportunity to work alongside eminent researchers, delving into the underlying principles and methodologies of intelligence. Moreover, I am able to harness the power of AI in domains such as healthcare. This synergy of theory and practice was a pivotal factor in my decision to join the lab after graduation. Undoubtedly, it represents an ideal launchpad for my career in scientific research,” Dongqi Han articulated.

Dongqi Han

Unveiling the core of intelligence: At the crossroads of AI and neuroscience

In recent years, the synergy between computer technologies like AI and various other fields has grown remarkably. MSR Asia is at the forefront, spearheading pivotal research and progressively amplifying its investments. Shanghai, a metropolis celebrated for its diversity and home to esteemed academic and leading medical institutions, offers fertile ground for the interdisciplinary fusion of AI and other fields. Consequently, the confluence of AI with neuroscience and other healthcare domains has emerged as a key research focus for MSR Asia – Shanghai.

Dr. Dongqi Han, a graduate of the Okinawa Institute of Science and Technology Graduate University (OIST) in Japan, majored in cognitive neuro-robotics, which encompasses the study of robotics integrated with neuroscience. As a neuroscientist, Dr. Han’s expertise significantly enhances the professional capabilities of MSR Asia’s interdisciplinary team, focusing on AI and brain science research.

Dongqi Han, positioned second from the left in the second row, alongside his team colleagues.

Dongqi Han’s research primarily explores two areas: the convergence of AI with neuroscience, and AI’s applications in healthcare. He believes that this synergy is not only theoretically profound but also immensely practical. Dr. Han asserts, “To create more intuitive and effective interfaces, whether they be brain-computer or human-computer, a more intricate comprehension of the human cognitive and perceptual processes is essential.” In healthcare, neurological disorders impact approximately one billion individuals globally. Studies at the nexus of AI and brain science have the potential to enrich the knowledge base for both clinicians and patients, leading to improved diagnosis, prevention, and management of these conditions.

“AI and brain science are deeply intertwined, both delving into the core and mechanisms of intelligence. They encounter similar issues and can mutually benefit from shared insights.”

Indeed, AI technologies often draw from the brain’s neural networks, with structures like multilayer perceptron (MLP) and long short-term memory (LSTM) networks mirroring our own cognitive architecture. By examining human and animal cognition—learning, memory, decision-making—we can augment AI’s capabilities. A critical hurdle for AI is “catastrophic forgetting”, where new learning can erase old knowledge, a flaw not seen in the human brain. Dr. Han and his team colleagues are dedicated to resolving such AI challenges by gleaning lessons from our neurological processes.

Conversely, the robust data processing and modeling prowess of AI holds the potential to advance neuroscience research and its applications significantly. The human brain is composed of approximately 10^11 neurons interconnected by roughly 10^15 synapses. Harnessing AI to model the brain’s operational principles and computational strategies is essential to manage this immense data complexity and to substantiate the veracity of theoretical frameworks in neuroscience.

Presently, Dr. Han and his team colleagues, along with collaborators, have garnered preliminary findings from their research endeavors. Their study primarily examines two distinct human behavioral types: habitual and goal-directed behaviors [1]. For instance, habitual behavior, akin to the automatic act of selecting a familiar route home post-work, requires no conscious deliberation. In contrast, goal-directed behavior involves intentional consideration of both purpose and outcome, exemplified by plotting a course to the Civil Affairs Bureau to obtain a marriage certificate. While these behavioral models elucidate numerous aspects of biological conduct, the mechanisms by which the brain decides between these patterns and their mutual interactions remain an enigma.

Computational modeling of habitual and goal-directed behaviors

Dongqi Han said, “Our team employs deep learning and machine learning methodologies to model and investigate the characteristics and underlying neural mechanisms of two distinct behavioral types. This endeavor not only contributes to the advancement of cognitive science and psychology but also serves as a source of inspiration for the development of innovative AI algorithms.”

Dr. Han’s recent research, conducted alongside his colleagues, revolves around emulating the brain’s neural circuitry. This has culminated in the development of a novel neural network model named CircuitNet [2]. Characterized by densely interconnected neurons within neural clusters and sparse connections across different brain regions, this model mirrors the human brain’s unique wiring. The team at MSR Asia is delving into the intricacies and benefits of such a neural architecture. Dr. Han, who has been involved in this project since his internship, has seen CircuitNet come to fruition through collaborative efforts, culminating in its selection for presentation at ICML 2023.

The model structure of CircuitNet

CircuitNet represents an advancement in neural network architectures, offering enhanced performance with a reduced parameter count, thus leading to greater energy efficiency. Remarkably, the human brain operates on less than 20 watts of power on average—a stark contrast to the substantial energy demands of large-scale AI models such as GPT-4, which may require hundreds to thousands of watts. Moving forward, Dongqi Han and his team colleagues are dedicated to unraveling the human brain’s mechanisms for energy conservation, drawing inspiration from CircuitNet’s design.
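
The published CircuitNet model is considerably richer, but the connectivity idea alone can be illustrated with a toy masked linear layer: dense weights inside each neuron cluster and sparse random connections between clusters.

import torch
import torch.nn as nn

def clustered_mask(n_units=64, n_clusters=4, cross_prob=0.05):
    """1 = connection allowed; dense within a cluster, sparse across clusters."""
    size = n_units // n_clusters
    mask = (torch.rand(n_units, n_units) < cross_prob).float()
    for c in range(n_clusters):
        lo, hi = c * size, (c + 1) * size
        mask[lo:hi, lo:hi] = 1.0          # dense intra-cluster block
    return mask

class ClusteredLinear(nn.Module):
    def __init__(self, n_units=64, n_clusters=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_units, n_units) * 0.02)
        self.register_buffer("mask", clustered_mask(n_units, n_clusters))

    def forward(self, x):                 # x: (batch, n_units)
        return x @ (self.weight * self.mask).t()

layer = ClusteredLinear()
print((layer.mask > 0).float().mean())    # fraction of connections actually allowed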

Dr. Han’s research extends to deep reinforcement learning and embodied AI, with the goal of refining AI to improve learning, decision-making, and real-world interaction capabilities of intelligent robots. He observes that while current large AI models predominantly generate content like text and images, embodied AI outputs dynamic actions, introducing a myriad of real-world uncertainties. For instance, a robot engaged in painting might encounter various challenges such as errors, equipment failure, or interference, all influencing the final outcome. Navigating these complexities requires sophisticated action selection processes. Dr. Han believes that by drawing parallels to human cognitive decision-making, we can expedite the advancement of embodied intelligence.

Fostering innovation: The power of interdisciplinary learning

Dongqi Han’s insatiable curiosity about the world fuels his wide-ranging and intense passion for scientific inquiry. His academic journey began with an undergraduate major in theoretical physics—a field he regards as exceptionally demanding. It necessitates a learner to possess stringent logical reasoning, robust mathematical prowess, and the capacity for experimental design and data analysis. These skills are instrumental in enhancing an individual’s intellectual caliber. Furthermore, physical science serves as the bedrock for numerous contemporary technologies, offering expansive applications.

During his doctoral studies, Dongqi Han took a photo with his mentor and lab mates.

During his undergraduate study in physics, Dongqi Han was deeply engrossed in the detection of physical parameters within tokamaks—devices pivotal for controlled nuclear fusion [4]. He considers nuclear fusion as a boundless, efficient, and clean energy source, believing its mastery to herald unparalleled benefits for humanity. Initially, Dr. Han viewed achieving controlled nuclear fusion as his scientific beacon. Yet, as his studies progressed, he discerned that the real hurdles to implementation lay not in the realm of theoretical physics, but within the ambit of engineering challenges. This revelation steered him to realize that his knowledge in theoretical physics might not directly contribute to the fruition of his aspirations. It was during this phase, amidst his tokamak investigations, that Dr. Han’s encounter with machine learning technology sparked a transformative shift in his academic pursuit, leading him to specialize in cognitive neuro-robotics for his doctoral research.

Embracing interdisciplinary learning, Dongqi Han embarked on a journey of discovery, building his foundation from the ground up. His initial year in the doctoral program was marked by an intensive curriculum that spanned basic neuroscience, machine learning, robotics control automation, and the integrative domains of cognitive science and psychology. Despite the rigorous challenges of navigating multiple disciplines, this cross-pollination of knowledge ignited innovative ideas. These insights have proven to be invaluable, enriching his research in AI and brain science.

In exploring the domains of reinforcement learning and embodied intelligence, Dongqi Han aims to integrate the methodical thinking approaches of physics. This will involve transferring the discipline’s rigorous thought processes (for example, commonly used statistical measurement methods) and logical reasoning, honed through physical experimentation, into the realm of AI study.

“The benefits of interdisciplinary learning are not only reflected in the cross-domain application of knowledge but also in the borrowing and inspiration of logical thinking. For example, when solving problems in neural network machine learning, traditional machine learning thinking often first considers data volume and model scale. Training in neuroscience can allow us to start from the perspective of the human or animal brain, thinking about problems in a more expansive and flexible way,” said Dongqi Han.

Collaborative synergy: Harnessing advanced technology for real-world solutions

Dongqi Han and his team colleagues are not only dedicated to interdisciplinary research but also actively engage in cross-disciplinary collaborations. They partner with universities and medical institutions around the globe, harnessing advanced technologies to tackle real-world challenges and forge superior solutions.

Dongqi Han and his team colleagues, in partnership with Fudan University, have pioneered an AI model for machine vision that replicates human visual perception [3], with a focus on boosting the energy efficiency of computer vision systems. Dr. Han notes, “Through collaborative research, we have discovered notable distinctions between human and computer vision. Human vision exhibits a significantly higher resolution at the fovea—the central point—compared to the peripheral vision. Additionally, the human brain processes information by transmitting spike-based signals, a characteristic that is mirrored in the architecture of spiking neural networks.” To harness the superior aspects of human vision, the team employed a spiking neural network to emulate a visual system with variable resolution and neuronal spike-based communication. This innovative model is poised to revolutionize energy efficiency, potentially achieving up to a hundredfold improvement in performing visual tasks with significantly reduced energy demands.

A human-eye-like spiking neural network performing a visual search task
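
Setting the spiking component aside, the foveation idea on its own can be sketched as a simple image transform: keep full resolution near the fixation point and average-pool the periphery into coarse blocks (a purely illustrative toy, not the collaborators' model).

import numpy as np

def foveate(image, cx, cy, fovea_radius=32, block=8):
    """Return an image left unchanged inside the fovea and block-averaged
    (lower resolution) outside it. `image` is a 2-D grayscale array."""
    out = image.astype(float).copy()
    h, w = image.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Distance of this block's center from the fixation point (cx, cy).
            dist = np.hypot(y + block / 2 - cy, x + block / 2 - cx)
            if dist > fovea_radius:
                patch = image[y:y + block, x:x + block]
                out[y:y + block, x:x + block] = patch.mean()
    return out

frame = np.random.rand(128, 128)          # stand-in for a camera frame
coarse = foveate(frame, cx=64, cy=64)     # sharp center, blurred periphery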

In a groundbreaking robotics initiative, researchers from MSR Asia and Korea Advanced Institute of Science and Technology (KAIST) have been pioneering the use of wearable, non-invasive devices to decipher human brainwave signals. This technology is poised to help robots interpret human intentions with greater precision. “Consider an elderly individual reaching towards the kitchen—does this gesture indicate hunger or the need to retrieve an item? Or when they gesture towards a table adorned with both a water bottle and a tissue box, which is their intended choice?” Dongqi Han said, “KAIST’s robust expertise in neuroscience, combined with our advanced AI algorithms, creates a powerful alliance. Together, we’re pushing the boundaries of what’s possible in robotics, enabling more nuanced task execution through a fusion of brain science and AI. This interdisciplinary collaboration is sparking innovative research avenues for both teams.”

From delving into games to persevering in scientific research

Dongqi Han’s diverse interests extend beyond his professional pursuits. He likes hiking and badminton, as well as various types of video games. “My passion for video games began in elementary school, evolving into a dedication to a particularly challenging game that demanded patience and perseverance. The process of advancing through persistent practice instilled in me a profound sense of accomplishment and shaped my approach to life, fostering patience and resilience against adversity,” Dr. Han shares.

This mindset has also permeated his research philosophy, which is characterized by deep, sustained inquiry. The supportive and diverse research environment at MSR Asia encourages vibrant collaboration and communication with colleagues from diverse disciplines and walks of life, bolstering his commitment to long-term research and connecting him with fellow researchers who share his vision.

Related links:

[1] Synergizing Habits and Goals with Variational Bayes, Nature Communications, 2024
(preprint at https://osf.io/preprints/psyarxiv/v63yj)

[2] CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling
https://proceedings.mlr.press/v202/wang23k/wang23k.pdf

[3] Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network
https://arxiv.org/abs/2310.06578

[4] In situ relative self-dependent calibration of electron cyclotron emission imaging via shape matching
https://doi.org/10.1063/1.5038866

The post Dongqi Han: An interdisciplinary odyssey with AI and other fields appeared first on Microsoft Research.

]]>