Microsoft Research Lab - Asia Articles
http://approjects.co.za/?big=en-us/research/

VALL-E 2: Enhancing the robustness and naturalness of text-to-speech models
http://approjects.co.za/?big=en-us/research/articles/vall-e-2-enhancing-the-robustness-and-naturalness-of-text-to-speech-models/
Tue, 10 Sep 2024 10:33:40 +0000

Author: Shujie Liu

In recent years, the rapid advancement of AI has continually expanded the capabilities of Text-to-Speech (TTS) technology. Ongoing optimizations and innovations in TTS have enriched and simplified voice interaction experiences. These research developments hold significant potential across various fields, including education, entertainment, and multilingual communication.

Traditional TTS systems, trained on high-quality clean data from the recording studio, still suffer from poor generalization. Speaker similarity and speech naturalness decrease dramatically for unseen speakers in the zero-shot scenario. To address this issue, researchers from MSR Asia developed VALL-E by introducing the LLM technique (training a model to predict the next token with large unsupervised sequential data) into speech processing tasks. VALL-E is the first neural codec language model using discrete codes derived from an off-the-shelf neural audio codec model. It treats TTS as a conditional language modeling task, which gives rise to in-context learning capabilities. VALL-E is capable of synthesizing high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as a prompt. However, due to its autoregressive modeling and random sampling inference, VALL-E suffers from robustness and efficiency problems.

To deal with these problems, researchers proposed VALL-E 2, which leverages repetition-aware sampling and grouped code modeling techniques, achieving human parity in zero-shot TTS performance on the LibriSpeech and VCTK datasets. Repetition-aware sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. This approach not only stabilizes decoding but also circumvents the infinite loop issue observed in VALL-E. Additionally, grouped token modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling.

With these two techniques, VALL-E 2 surpasses previous systems in terms of speech robustness, naturalness, and speaker similarity. VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases.

VALL-E 2 paper: https://arxiv.org/abs/2406.05370

VALL-E 2 demo: https://aka.ms/valle2

Figure 1: Illustration of VALL-E 2

Enhancing models with repetition-aware sampling and grouped token modeling

Figure 2: AR and NAR modelling for VALL-E 2

As illustrated in Figure 2, VALL-E 2 employs a hierarchical structure analogous to that of VALL-E. This structure comprises an autoregressive (AR) codec language model and a non-autoregressive (NAR) codec language model. The AR model generates the sequence of first codec codes, one per frame, in an autoregressive manner, while the NAR model generates each remaining code sequence based on the preceding code sequences in a non-autoregressive manner. Both models use the same Transformer architecture with a text embedding layer, a code embedding layer, a Transformer layer, and a code prediction layer. The two models differ in their attention mask strategies: the AR model uses causal attention, and the NAR model uses full attention.
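
The difference between the two attention mask strategies can be illustrated with a small sketch. The function names and the boolean convention (True means attention is allowed) are assumptions made for illustration and are not taken from the VALL-E 2 implementation.

```python
def causal_mask(n):
    """AR-style mask: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n):
    """NAR-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


if __name__ == "__main__":
    # Print the causal mask as a lower-triangular pattern.
    for row in causal_mask(4):
        print("".join("x" if allowed else "." for allowed in row))
```

With the causal mask, each frame sees only earlier frames, which matches step-by-step generation; with the full mask, all frames of a code sequence can be predicted in one pass.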

Figure 3: Repetition-aware sampling and grouped token modeling

Past experience shows that the random sampling used in VALL-E for inference can lead to unstable output. Although the probability of error tokens (red tokens in Figure 3) is low, they will inevitably be sampled given the large number of sampling steps. To stabilize the inference process, nucleus sampling is usually used: tokens are sampled from the smallest set of most probable tokens whose cumulative probability reaches a preset threshold. Nucleus sampling can reduce the occurrence of erroneous tokens, but it can also lead the model to generate silent output to avoid making mistakes.

To achieve a balance between random sampling and nucleus sampling, researchers proposed repetition-aware sampling. Given the probability distribution predicted by the AR model, researchers first generate the target code by nucleus sampling with a pre-defined top-p value. Then, researchers calculate the repetition ratio of the predicted token in the preceding code sequence within a fixed window size. If the repetition ratio exceeds a pre-defined repetition threshold, researchers replace the predicted target code with a randomly sampled code drawn from the original probability distribution. This repetition-aware sampling method allows the decoding process to benefit from the stability of nucleus sampling while avoiding the infinite loop issue through the use of random sampling.
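
The decoding step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the default values for `top_p`, `window`, and `threshold`, and the choice to compute the ratio over the tokens actually present in the window are all assumptions.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample an index from the smallest set of top tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


def repetition_aware_sample(probs, history, top_p=0.8, window=10,
                            threshold=0.1, rng=random):
    """One decoding step: nucleus sampling with a random-sampling fallback.

    The default hyperparameter values are placeholders, not values from the paper.
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    # Repetition ratio: share of the recent window occupied by the sampled token.
    ratio = recent.count(token) / len(recent) if recent else 0.0
    if ratio > threshold:
        # Too repetitive: resample from the full, original distribution.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token
```

In a decoding loop, `history` would hold the codes generated so far and the returned token would be appended to it after each step.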

Meanwhile, the autoregressive architecture of VALL-E is bound to the same high frame rate as the off-the-shelf audio codec model, which cannot be adjusted. This results in slow inference, especially for the AR model. To speed up the inference process of VALL-E 2, researchers leveraged the grouped token modeling method, wherein the codec code sequence is partitioned into groups of a certain size and each group of codec codes is modeled as one frame. In the AR model, researchers use a group embedding layer to project the code embeddings to a group embedding as the network input, and a group prediction layer to predict the codes in one group. In this way, the model is freed from the frame rate constraint of the off-the-shelf neural audio codec model, and the frame rate can be reduced by integer multiples. This benefits not only inference efficiency but also overall speech quality, because it mitigates the long context modeling problem.
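
The partitioning step can be illustrated with a short sketch. Only the grouping itself is shown; the group embedding and group prediction layers are omitted. The function name and the padding of a trailing incomplete group are assumptions for illustration.

```python
def group_codes(codes, group_size, pad=0):
    """Partition a flat code sequence into consecutive groups of group_size.

    A trailing incomplete group is padded with `pad` (a hypothetical choice).
    """
    codes = list(codes)
    remainder = len(codes) % group_size
    if remainder:
        codes += [pad] * (group_size - remainder)
    return [codes[i:i + group_size] for i in range(0, len(codes), group_size)]


if __name__ == "__main__":
    frames = group_codes([11, 12, 13, 14, 15, 16, 17, 18], group_size=2)
    # 8 codes become 4 grouped frames, so the AR model takes half as many steps.
    print(len(frames), frames)
```

With a group size of 2, the AR model processes half as many positions, which is the sense in which the frame rate is reduced by an integer multiple.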

Significant improvements in robustness, naturalness, and similarity

Figure 4: Evaluation results compared with strong baselines

To evaluate VALL-E 2, researchers conducted experiments on the LibriSpeech and VCTK datasets and compared the results with multiple baselines in terms of robustness, naturalness, and similarity scores. These scores are relative numbers calculated based on the results reported in the original papers, irrespective of differences in model architecture and training data. As illustrated in Figure 4, VALL-E 2 significantly improves performance compared with previous methods and even achieves human-parity zero-shot TTS performance for the first time. In this context, human parity indicates that the robustness, naturalness, and similarity metrics of VALL-E 2 surpass those of the ground truth samples (WER(GroundTruth) - WER(VALL-E 2) > 0, CMOS(VALL-E 2) - CMOS(GroundTruth) > 0, and SMOS(VALL-E 2) - SMOS(GroundTruth) > 0), meaning that VALL-E 2 can generate accurate, natural speech in the exact voice of the original speaker. It is important to note that this conclusion is drawn solely from experimental results on the LibriSpeech and VCTK datasets.

Through the introduction of repetition-aware sampling and grouped code modeling, VALL-E 2 is capable of reliably synthesizing speech for complex sentences, including those that are challenging to read or contain numerous repeated phrases. The benefits of this work could support meaningful initiatives, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis.

Note: VALL-E 2 is purely a research project. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public. VALL-E 2 could synthesize speech that maintains speaker identity and could be used for education, entertainment, journalism, self-authored content, accessibility features, interactive voice response systems, translation, chatbots, and so on. While VALL-E 2 can generate a voice similar to the natural voice, the similarity and naturalness depend on the length and quality of the speech prompt, the background noise, and other factors. The model carries potential risks of misuse, such as spoofing voice identification or impersonating a specific speaker. We conducted experiments under the assumption that the user agrees to be the target speaker in speech synthesis. If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model. If you suspect that VALL-E 2 is being used in a manner that is abusive or illegal or infringes on your rights or the rights of other people, you can report it at the Report Abuse Portal (https://msrc.microsoft.com/report/).

With the rapid development of AI technology, ensuring that these technologies are trustworthy remains a significant challenge. Microsoft has proactively implemented a series of measures to anticipate and mitigate the risks associated with AI. Committed to advancing AI development in line with human-centered ethical principles, Microsoft introduced six Responsible AI Principles in 2018: fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. To operationalize these principles, Microsoft subsequently released the Responsible AI Standards and established a governance framework to ensure that each team integrates these principles and standards into their daily work. Additionally, Microsoft continuously collaborates with researchers and academic institutions worldwide to advance the practice and technology of responsible AI.

MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model
http://approjects.co.za/?big=en-us/research/articles/mg-tsd-advancing-time-series-analysis-with-multi-granularity-guided-diffusion-model/
Tue, 18 Jun 2024 03:46:11 +0000

Author: Chang Xu

Diffusion probabilistic models can generate high-fidelity samples for generative time series forecasting. However, they also suffer from instability due to their stochastic nature. To tackle this challenge, researchers from Microsoft Research Asia introduced a novel approach called “MG-TSD”. The paper “MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process”, presented at ICLR 2024, capitalizes on the intrinsic granularity levels present in the data, using them as predefined targets at various stages of the diffusion process. These targets guide the learning trajectory of the diffusion models, thereby ensuring a more stable and accurate forecast.

Notably, the MG-TSD method achieves these results without requiring additional data. In long-term forecasting, it sets a new state of the art, with relative improvements ranging from 4.7% to 35.8% across six benchmarks.

Guiding diffusion processes through intrinsic granularity features in time series data

It can be observed that the forward process of the diffusion model, which sequentially corrupts the data distribution to a standard normal distribution, intuitively aligns with the process of smoothing fine-grained data into a coarser-grained representation. Both of these processes result in a gradual loss of finer distribution features. This suggests that intrinsic features within data granularities may also serve as a source of guidance.

Figure 1: The process of smoothing data from finest-grained to coarsest-grained naturally aligns with the diffusion process

The MG-TSD model employs multiple granularity levels within data to guide the learning process of diffusion models. The coarse-grained data at different granularity levels are used as targets to guide the learning of the denoising process. These targets serve as constraints for the intermediate latent states, ensuring a regularized sampling path that preserves the trends and patterns within the coarse-grained data. This inductive bias facilitates the generation of coarser features during intermediate steps and the recovery of finer features in subsequent diffusion steps. Each granularity level can guide the diffusion process through different steps. In the implementation, each coarse-grained level shares a different percentage of the variance schedule (a hyperparameter of the diffusion model) with the finest-grained data; this percentage is referred to as the “share ratio.” Consequently, this design reduces variability and results in high-quality predictions.
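
To make the notion of coarse-grained targets concrete, the sketch below derives a coarser series from a fine-grained one by averaging over non-overlapping windows and repeating each average so the length is preserved. This particular smoothing rule and the function name are assumptions for illustration; the MG-TSD paper and repository should be consulted for the exact procedure.

```python
def coarsen(series, window):
    """Replace each non-overlapping window with its mean, keeping the original length."""
    out = []
    for start in range(0, len(series), window):
        chunk = series[start:start + window]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


if __name__ == "__main__":
    hourly = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
    # A 4-step window loosely mirrors going from 1-hour to 4-hour granularity.
    print(coarsen(hourly, 4))
```

Larger windows remove more fine detail, which is the parallel the article draws with later steps of the forward diffusion process.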

Figure 2: Overview of the Multi-Granularity Time Series Diffusion (MG-TSD) model

MG-TSD achieves stable and outstanding prediction results

A comprehensive evaluation was conducted across six benchmarks and three performance metrics, in which nine baseline models were compared. The results demonstrate that the MG-TSD model achieves state-of-the-art (SOTA) status, with a substantial improvement ranging from 4.7% to 35.8% on the CRPS_sum metric across the six benchmarks. CRPS_sum indicates the similarity between two distributions; the smaller the value, the more similar they are.
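
For readers unfamiliar with the metric, the following sketch shows a common sample-based estimate of the continuous ranked probability score (CRPS) and its application to the sum across dimensions. It is a simplified illustration for a single time step; any normalization or averaging over the forecast horizon used in the paper's evaluation is omitted, and the function names are hypothetical.

```python
def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - observed) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


def crps_sum(sample_vectors, observed_vector):
    """CRPS of the series obtained by summing across dimensions (one time step)."""
    summed_samples = [sum(vector) for vector in sample_vectors]
    return crps_from_samples(summed_samples, sum(observed_vector))


if __name__ == "__main__":
    forecasts = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]  # three samples, two dimensions
    print(crps_sum(forecasts, [2.0, 2.0]))
```

A forecast whose samples all equal the observed value scores 0, and the score grows as the predictive samples move away from the observation.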

Table 1: Comparison of CRPS_sum of models on six real-world datasets

Diffusion process aligns with data smoothing

The four subplots from Figure 3(a) to Figure 3(d) illustrate a gradual smoothing transformation of the distribution of increasingly coarser targets. The blue curve represents CRPS_sum values between coarse-grained targets and intermediate samples of the single-granularity diffusion model (1h) at each denoising step. As granularity becomes coarser from the left panel to the right (4h→6h→12h→24h), targets progressively align with intermediate sample distributions at smaller denoising steps (approximately at steps 80→60→40→40). This comparison underscores the similarity between the diffusion process and the smoothing process, both of which transition from the finest-grained data to coarse-grained data. Both processes entail a gradual loss of finer characteristics from the finest-grained data through a smooth transformation.

Moreover, this observation is consistent with our selection of guiding steps for MG-TSD. The orange lines, which depict the performance of MG-TSD with share ratios of 0.2, 0.4, 0.6, 0.8, and 1.0, and the blue lines, which represent the similarity of distributions, show a consistent trend. In other words, setting the guiding steps at a coarse granularity to match the steps where the diffusion intermediate samples have the closest distribution often achieves the best performance, as indicated by the grey region in the figure.

Figure 3: Selection of share ratio for MG-TSD models

Coarse-grained samples demonstrate superior robustness in trend-capturing capabilities

Researchers visualize the ground truth and the predicted mean for both 1-hour and 4-hour granularity time series across four dimensions in the Solar dataset, as illustrated in Figure 4. In the MG-TSD model, the coarse-grained samples display a more robust capacity to capture the trends, subsequently guiding the generation of more precise fine-grained data.

Figure 4: MG-TSD and TimeGrad prediction intervals and test set ground-truth for Solar data of some illustrative dimensions of 370 dimensions from first rolling-window.

For further information regarding MG-TSD, please refer to the project page at: https://github.com/Hundredl/MG-TSD

Driving Industry Evolution: Exploring the Impact of Generative AI on Sector Transformation
http://approjects.co.za/?big=en-us/research/articles/driving-industry-evolution-exploring-the-impact-of-generative-ai-on-sector-transformation/
Tue, 04 Jun 2024 18:05:27 +0000

Presented by Jiang Bian at Microsoft Research Forum, June 2024


There is “a substantial demand for advanced generative AI tailored to enhance core business operations. However, in our dialogues with strategic partners, we have identified crucial gaps in current generative AI capabilities versus the specific needs of industry applications. … Our research is crucial in addressing these limitations and amplifying the underappreciated potential of generative AI.”

Jiang Bian, Senior Principal Research Manager, Microsoft Research Asia

Transcript: Lightning Talk

Driving Industry Evolution: Exploring the Impact of Generative AI on Sector Transformation

Jiang Bian, Senior Principal Research Manager, Microsoft Research Asia

Jiang Bian discusses how generative AI transforms industries by bridging gaps between AI capabilities and sector needs. He will showcase domain-specific foundation models and versatile AI agents, setting new industry standards.

Microsoft Research Forum, June 4, 2024

JIANG BIAN: Hello, everyone. My name is Jiang. Today, I’m excited to discuss the work we are undertaking at Microsoft Research Asia focusing on leveraging generative AI to drive transformation and evolution across various industries.

Our efforts are inspired by our unique co-innovation initiative with world-renowned partners from a few core sectors, including finance, manufacturing, energy, and so on. These collaborations have highlighted a substantial demand for advanced generative AI tailored to enhance core business operations. However, in our dialogues with strategic partners, we have identified crucial gaps in current generative AI capabilities versus the specific needs of industry applications. These include a too-narrow focus on human-like AI but not critical industry applications, limitations in processing complex and noisy data, and concerns about reliability in complex decision-making scenarios. Our research is crucial in addressing these limitations and amplifying the underappreciated potential of generative AI in high-value sectors. We are focusing on two main approaches: developing domain-specific foundation models that enhance analytical and predictive capabilities or enable interactive and controllable simulations and creating a versatile foundation-model-as-agent system for diverse industry decision-making tasks.

Our first project is transforming the way industrial data is analyzed and utilized. Facing diverse data formats like tabular, time series, and graph from various sectors, we are employing Generative Data Learning to enhance the large language model with strong ability to interpret and process diverse data formats by transforming them into a unified, instruction-oriented language. With training over this diverse sector data for [numerous] tasks, this approach enables more intuitive data analytics and predictions across various industries. Initial experiments on a typical classification and regression task over tabular data have shown that even a relatively small-scale model enhanced by our Generative Data Learning approach can outperform both general large language models and traditional models like tree ensembles, particularly in few-shot scenarios. This suggests the significant potential for a single-model solution with no extensive model training or fine-tuning in exploring industrial data intelligence maybe with only few-shot examples.

Our second project is exploring building foundation models over domain-specific data, and we focus on financial markets given its fundamental data is orders. We have developed a dual-level foundation model called Large Market Model that uses transformers on both the order sequence to model the market dynamics and the order-batch sequence to align the market trend with control signals. The performance of financial market simulations based on this Large Market Model has been very promising. They have excelled in forecasting market trends, simulating extreme scenarios for stress tests, and detecting market manipulations efficiently.

Our third project focuses on creating a decision-making agent through the knowledge-augmented generation and adaptive retrieval. This agent is essentially a trainable model that generates and extracts domain-specific knowledge, dynamically updating itself and retrieving most appropriate knowledge to handle changing environment. This adaptive approach is particularly useful in many industry control applications, such as HVAC control with the goal of optimizing energy use while maintaining comfort. Deploying this agent into this scenario has shown it can outperform traditional reinforcement learning methods, saving significantly more energy, especially in unknown environments or when facing perturbations.

In summary, at MSR Asia, we are committed to advancing the development of generative AI to catalyze industry evolution through innovative research and partnership. We will soon be sharing more details about these projects through upcoming papers and open-source initiatives. We invite you, especially our industry partners, to stay tuned and join us in driving these transformative efforts forward. Thank you.

“Foundation models, also known as large language models, possess immense potential across a variety of industries. Yet, some companies and organizations limit their use of these expansive AI models to niche areas, including intelligent customer service, chatbots, or text and image generation. In reality, these foundation models demonstrate robust abilities in reasoning, content creation, and generalization, making them exceptionally fit for high-stakes business tasks. These tasks range from making accurate predictions and forecasts to optimizing industrial control and complex decision-making and conducting intelligent and interactive industrial simulations.”

— Jiang Bian, Senior Principal Research Manager, Microsoft Research Asia

Maximizing the business value of foundation models in industry applications

By Jiang Bian

As the development of large AI models, also known as foundation models, progresses, companies and organizations are becoming increasingly excited about their potential for enhancing productivity. However, a significant trend has been observed: many industry practitioners focus heavily on the human-like qualities of AI, such as conversational abilities, writing skills, creativity, and perceptual capabilities. In deploying these large AI models, there is a tendency to prioritize applications in intelligent customer service, chatbots, and other so-called “human-like” functions. Unfortunately, this emphasis may restrict our comprehension and use of these potent models, hindering our ability to fully unleash their capabilities within various industries.

This limitation is not without reason. Incorporating foundation models into practical, production-oriented scenarios is still in its infancy, with few mature and widespread examples to follow. Viewing AI as a “production tool” is akin to possessing a tool before fully understanding its potential applications. Furthermore, humanity has rarely, if ever, encountered such a versatile yet uncertain tool that is not designed for specific tasks.

Additionally, the complexity and variety inherent in different industries require foundation models that move beyond traditional perceptions. This necessitates synchronized innovation in models at the industry level, enabling them to fully exploit the capabilities of foundation models across diverse industrial landscapes and to better align with AI applications. Instead of limiting AI to a “chat robot” role, we should broaden our perspective. Transforming industries in the AI era involves rethinking current business processes and frameworks, leading to collaborative models that can effortlessly integrate humans and foundation models.

Unlocking the boundless potential of foundation models in industry

Foundation models are endowed with broad capabilities in data representation, knowledge comprehension, and reasoning, allowing them to adjust seamlessly across various domains and scenarios, and swiftly adapt to new environments. Concurrently, digital platforms across industries have evolved, amassing substantial amounts of industry-specific data. This rich repository of knowledge and information positions foundation models to integrate effortlessly into industrial settings.

In practical terms, the advanced reasoning abilities of foundation models provide users with a deeper understanding of data. By extracting valuable insights from large datasets and identifying patterns and correlations, these models deliver more effective recommendations and deeper insights. This benefit is especially vital in industrial contexts, where prediction, decision-making, and simulation play crucial roles.

One of the standout features of foundation models is their exceptional ability to generalize. Before their advent, each industry scenario required specific data to train bespoke AI models, limiting scalability and hindering the full commercial exploitation of AI. Foundation models, with their access to a global pool of knowledge, markedly improve generalization. As a result, industries are freed from the necessity of developing unique models for every situation, overcoming a major limitation of traditional AI solutions.

Moreover, foundation models can work in tandem with generative AI to increase the accuracy, realism, and interactivity of industrial simulations and intelligent modeling, facilitating the creation of digital twins. These simulations and models aim to mimic and test real-world scenarios, which often involve complex roles and intricate environments. Traditional AI models may simplify real-world complexities or miss crucial extreme events, compromising the fidelity and authenticity of simulations. In contrast, generative large AI models, steeped in domain-specific knowledge, establish accurate mappings between specific data dimensions and real-world occurrences. This method allows for simulations that closely mirror reality, significantly aiding industrial forecasting and decision-making processes while maintaining adherence to industry standards.

In the industrial sector, tasks of paramount importance and commercial value include precise forecasting and control, efficient optimization of decisions, and complex duties associated with intelligent and interactive industrial simulations. These areas should be the primary focus for traditional industrial enterprises. Yet, when assessing existing foundation models like GPT and the actual needs within industrial domains, we uncover significant mismatches between the capabilities of these models and the real demands of industry. To bridge this gap and fully leverage their potential, several challenges must be addressed.

First, there is a notable absence of a universal framework capable of effectively extracting complex domain knowledge from diverse field data and using this knowledge to construct intelligent agents. Various domains contain rich and complex data, such as logistics companies dealing with customs information and cross-national policies, pharmaceutical industries with FDA drug review documents, and the legal industry with numerous regulations. Developing intelligent agents that are deeply rooted in domain knowledge calls for a more generalized framework. This framework should be proficient in extracting crucial domain knowledge, identifying hidden connections between data and knowledge, and managing this information efficiently.

Second, while foundation models are adept at generating textual content, their proficiency in processing and understanding structured data, like numerical or tabular information, is lacking. Industrial scenarios often involve structured data, such as health monitoring indicators, battery charge-discharge cycles, and financial transactions. Current large models are not specifically designed or optimized for processing such data, which complicates accurate prediction and classification tasks based on structured inputs.

Third, in practical applications, foundation models currently fall short in stability and reliability for decision-making. Critical industries like energy, logistics, finance, and healthcare require dependable decision-making for tasks such as optimizing logistics routes, controlling energy equipment, formulating investment strategies, and allocating medical resources. These tasks often involve numerous variables and constraints, especially under dynamic environmental changes. Foundation models have yet to fully adapt to these complex industrial tasks, making direct application challenging.

Lastly, there is a lack of insight into domain-specific foundational data, as well as methodologies and experience for developing domain-specific foundation models. Essential information in many specialized fields extends beyond mere text, incorporating unique data structures and semantic relationships. For example, transaction order information in the financial investment field or molecular structure details in the biopharmaceutical industry contain critical knowledge often embedded in such foundational data. A deeper, more nuanced analysis is required. Creating domain-specific foundation models grounded in this detailed understanding is crucial for effectively leveraging and unlocking the potential of data in these fields.

Constructing industry foundation models: harmonizing general knowledge and domain expertise

To expedite the adoption and application of foundation models in industry, we can concentrate on several pivotal areas.

First, we can harness rich and complex industrial domain data to construct a more versatile, efficient, and practical retrieval-augmented generation (RAG) framework. This framework is designed to adapt seamlessly to various vertical domains, extracting essential domain knowledge, uncovering hidden associations between data and knowledge, and effectively organizing and managing this wealth of information.

Figure 1. A more universal, efficient, and practical retrieval-augmented generation (RAG) framework based on foundation models.
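As a rough illustration of the retrieve-then-generate pattern behind RAG, the sketch below ranks a few domain snippets against a question with bag-of-words cosine similarity and assembles a prompt from the best match. It is a minimal sketch under stated assumptions, not the framework described above: the snippets in `DOCS`, the function names, and the prompt wording are all hypothetical, and a production system would typically use learned embeddings and a vector index instead of word counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical domain snippets, invented for illustration only.
DOCS = [
    "Customs declarations for lithium batteries require a dangerous goods certificate.",
    "Battery charge-discharge cycles are logged every ten minutes.",
    "Investment strategies must respect daily position limits.",
]

if __name__ == "__main__":
    print(build_prompt("What do customs require for lithium batteries?", DOCS, k=1))
```

The resulting prompt would then be passed to a foundation model; that call is left out here because it depends on the model and API in use.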

Second, by carefully considering critical numerical data and the corresponding structured dependencies prevalent in industrial scenarios, we can design foundation models specifically optimized for industrial applications. These models effectively integrate general knowledge with domain-specific expertise derived from temporal or tabular data, thereby enabling more effective solutions for tasks such as prediction and classification within the industry.

Figure 2. From traditional industry AI solutions to Industry foundation models integrating general and domain knowledge.

Another avenue we are actively exploring involves harnessing the potent generation, generalization, and transfer capabilities inherent in foundation models to elevate the quality and efficiency of industrial decision-making. We are pursuing two distinct paths: first, treating foundation models as intelligent agents; and second, leveraging foundation models to assist reinforcement-learning agents.

Treating foundation models as intelligent agents: By leveraging the pre-existing knowledge encoded in foundation models and integrating offline reinforcement learning, we can continuously acquire new domain-specific insights and fine-tune the models. This evolutionary process enhances the optimization and decision-making capabilities of foundation models, enabling them to prioritize industry-specific tasks.

Foundation models optimized for specific tasks can play a pivotal role across various industrial contexts. In formula racing, for example, these foundation models can optimize tire-maintenance strategies. By considering tire wear and repair costs, they determine the optimal pit stop timing, thereby shortening race duration and improving car rankings. In chemical manufacturing, leveraging these foundation models can significantly enhance efficiency in product storage and pipeline coordination during production processes, ultimately boosting overall production-execution efficiency. Furthermore, due to their generalization capabilities and robustness, foundation models can be swiftly adapted to optimize air conditioning control, ensuring comfortable temperatures while minimizing energy consumption.

Figure 3. Foundation models and offline reinforcement learning are being synergized to construct decision-making agents.

Assisting reinforcement learning agents with foundation models: We can empower models to acquire universal representations that rapidly adapt to diverse environments and tasks, thereby enhancing their generalization capabilities. In this approach, we introduce a pre-trained world model that emulates human learning and decision-making processes, ultimately bolstering industrial decision-making. By harnessing a pre-trained world model with extensive knowledge and adopting a two-stage pre-training framework, developers can comprehensively and flexibly train foundation models for industrial decision-making, extending their applicability to any specific decision scenario.

We partnered with the Microsoft Xbox team to rigorously validate the effectiveness of our framework in game-testing scenarios. By harnessing this framework, we pre-trained a specialized world model tailored for game maps. This model directly tackles the challenge of long-term spatial reasoning and navigation, leveraging landmark observations within novel game environments. The results were remarkable: our pre-trained model significantly outperformed counterparts that lacked a world model or relied on traditional learning methods. As a result, game exploration efficiency was greatly enhanced.

Moreover, we can harness domain-specific foundational data and the precise semantic information it encapsulates to develop foundation models within the domain, thereby unlocking novel opportunities for intelligent, interactive decision-making, and simulation. For example, by analyzing transactional data from financial markets, we can construct robust investment models. These foundational datasets extend beyond mere textual characters; they embody intricate semantic structures and valuable information. Leveraging this financial foundation model, we can generate customized order flows for various market styles, simulate large-scale order transactions across diverse market environments, and conduct controlled experiments in the financial investment landscape. This approach empowers us to gain deeper insights into market fluctuations and devise strategies for extreme scenarios.

Figure 4. Leveraging financial foundation models to implement order flow generation for different market styles, thereby simulating diverse market environments.

Foundation models propel the next industrial digital transformation

Microsoft Research Asia has long recognized that the widespread adoption of AI in industry necessitates continuous technological exploration, experimentation, and breakthroughs. Through collaborative efforts with partners across various industries, we have developed open-source models, including the Qlib AI quantitative investment platform, the MARO multi-agent resource optimization platform, the FOST spatial-temporal prediction tool, and the BatteryML battery performance analysis and prediction platform. These industry-oriented AI platforms, tools, and models not only play a pivotal role in industry but also serve as critical data and foundational components for implementing cutting-edge foundation models.

Building upon successful experiences in industrializing AI, we have embarked on the exploration of domain-specific foundation models tailored for industry, drawing from the dimensions previously discussed. Our findings reveal that these foundation models possess significant potential to diverge from conventional large-scale model paradigms and profoundly impact industrial transformation.

Envision a future where foundation models empower knowledge management, extraction, and iterative processes across industries. Furthermore, we are actively investigating how foundation models can support companies in achieving automated research and development (R&D). This encompasses tasks such as automatically identifying R&D directions, generating algorithmic research proposals, automating R&D processes and scientific experiments, and iteratively refining research approaches. In essence, AI will autonomously propel data-centric industrial R&D, fundamentally revolutionizing industry operations.

Figure 5. R&D agent: Automatically evolve the R&D cycle centered on industrial data.

Foundation models are poised to become the driving force behind industrial digital transformation, mirroring the transformative impact of the internet and cloud computing. These models are set to unleash a new wave of industrial innovation. We eagerly anticipate collaborating with additional industry partners, immersing ourselves in real-world scenarios, and exploring diverse applications for foundation models within the industrial landscape, thereby fully unlocking their commercial potential.


Author

Dr. Jiang Bian currently serves as a senior principal research manager at Microsoft Research Asia. He leads the Machine Learning Group and the Industry Innovation Center at Microsoft Research Asia.

His team’s research spans deep learning, reinforcement learning, and privacy computing, with a focus on cutting-edge applications of AI in vertical domains such as finance, energy, logistics, manufacturing, healthcare, and sustainable development.

Dr. Jiang Bian has authored over a hundred research papers published in top-tier international conferences and journals. He also holds several U.S. patents. Dr. Bian actively contributes to the academic community by serving on program committees for various prestigious international conferences and acting as a reviewer for leading international journals. In recent years, Dr. Bian’s team has made significant strides in applying AI-based prediction and optimization techniques to critical scenarios across diverse fields, such as finance, logistics, and healthcare. Furthermore, they have generously shared relevant technologies and frameworks with the open-source community.

Dr. Jiang Bian completed his undergraduate studies at Peking University, earning a bachelor’s degree in computer science. He then pursued further studies at the Georgia Institute of Technology in the United States, where he obtained his Ph.D. in computer science.

The post Driving Industry Evolution: Exploring the Impact of Generative AI on Sector Transformation appeared first on Microsoft Research.

]]>
Guarding human health: AI empowers innovative applications in healthcare http://approjects.co.za/?big=en-us/research/articles/guarding-human-health-ai-empowers-innovative-applications-in-healthcare/ Thu, 16 May 2024 08:52:00 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1034823 “If life is a marathon, then health is the key to its duration.” Health is not only the foundation of happiness and societal progress but also a pivotal aspect of the intelligent era. AI’s integration into healthcare represents a transformative tool for maintaining and enhancing human well-being. From aiding early disease detection and progression prediction […]

The post Guarding human health: AI empowers innovative applications in healthcare appeared first on Microsoft Research.

]]>
“If life is a marathon, then health is the key to its duration.” Health is not only the foundation of happiness and societal progress but also a pivotal aspect of the intelligent era. AI’s integration into healthcare represents a transformative tool for maintaining and enhancing human well-being. From aiding early disease detection and progression prediction to personalizing precision medicine and accelerating medical research and drug development, AI’s unique value and potential are increasingly evident.

Over recent years, Microsoft Research Asia has deepened its collaboration with medical institutions and academic experts, attracting professionals in healthcare to foster AI’s profound application in medical health, thereby contributing to a healthier global community.

Early detection, early treatment: AI in disease detection and rehabilitation

The early diagnosis of diseases is vital for enhancing treatment outcomes and patient quality of life. Rehabilitation training, a critical component of many treatment regimens, plays a significant role in restoring patient functions. Traditional methods face limitations due to resource distribution, geographic constraints, and a shortage of medical professionals, affecting the accessibility and efficiency of healthcare services. AI can support medical staff by providing automated, intelligent early disease detection, enabling timely intervention and treatment.

AI-powered voice recognition for speech rehabilitation in cleft palate patients

Cleft palate and cleft lip, prevalent congenital deformities in the oral and maxillofacial region, often result in hypernasal speech due to velopharyngeal insufficiency. Microsoft Research Asia, in collaboration with medical institutions, recognizes hypernasality detection as crucial for treating cleft palate patients.

Source: Operation Smile

Traditionally, speech-language pathologists assess hypernasality, but their limited availability and concentration in certain hospitals necessitate extensive, costly cross-regional patient travel. An automated hypernasality assessment method would not only aid pathologists in making accurate evaluations but also facilitate remote patient diagnosis and treatment, significantly reducing costs.

Leveraging transfer learning, Microsoft Research Asia has developed an approach that uses an automatic speech recognition (ASR) model to enhance hypernasality assessment. The model excels at extracting acoustic features and demonstrates robust generalization capabilities. Comparative studies on two cleft palate datasets reveal that it surpasses existing methods, significantly enhancing the precision of pathologists’ diagnostic processes.

Following hypernasality evaluations, physicians devise tailored speech therapy regimens for patients. Microsoft Research Asia has advanced this process by developing the Masked Pre-training Pronunciation Assessment (MPA) model. This model supports end-to-end training and adapts to both unsupervised and supervised learning environments, enabling user-friendly remote deployment. Utilizing reference texts and integrating masking with prediction tactics, the MPA model adeptly circumvents issues of misalignment or misrecognition in pronunciation assessments, offering more precise speech correction support for individuals with cleft palate.

Microsoft Research Asia is actively collaborating with healthcare providers to assess the feasibility of deploying this cutting-edge speech assessment technology. The goal is to enhance the efficiency of medical diagnoses and treatments, lower the financial burden on patients, and extend the benefits of this technology to numerous cleft palate sufferers in isolated regions.

Related papers:

Voice analysis model enhances Alzheimer’s disease screening

Alzheimer’s disease, a prevalent neurodegenerative condition primarily affecting the elderly, leads to progressive cognitive decline, including memory loss, language difficulties, and impaired reasoning. While there’s no cure for Alzheimer’s, early detection and intervention are key to decelerating its progression.

Source: pexels.com

Traditional diagnostic methods, such as brain scans, blood tests, and cognitive assessments, are extensive and expensive. However, research indicates that early Alzheimer’s can be detected through speech analysis, identifying symptoms like fluent aphasia and word retrieval challenges.

Capitalizing on this insight, Microsoft Research Asia has pioneered speech and language analysis technologies to detect Alzheimer’s indicators from sophisticated acoustic and linguistic data. A novel task-oriented model has also been introduced, correlating language descriptions with cognitive tasks.

In the ADReSS dataset’s subtask involving “Cookie Theft” picture descriptions (Figure 1), these methods attained a 91.4% accuracy rate. This innovative approach, merging speech and semantic analysis, significantly increases disease detection accuracy. The model’s high efficiency and performance on new test sets offer promising prospects for Alzheimer’s screening at scale.

Figure 1: “Cookie Theft” used for the descriptive task of detecting Alzheimer’s disease, presented by DementiaBank Pitt Corpus, Becker et al., in 1994.

Related paper:

Advancing autism diagnosis: Unsupervised detection of stereotypical behaviors

Autism Spectrum Disorder (ASD) typically manifests in early childhood, presenting challenges in social interaction and communication, accompanied by repetitive behaviors. These behaviors, which may include actions like persistent hand-flapping or head-banging, serve as vital indicators for ASD diagnosis. Early detection and intervention are crucial for improving outcomes, yet traditional methods relying on prolonged observation by specialists are not always efficient. Hence, the development of a swift, automated detection system is invaluable.

Source: unsplash.com

Traditional approaches have utilized computer vision and supervised learning to analyze video data of individuals with ASD. However, these methods face limitations due to the diverse range of stereotypical behaviors and privacy concerns associated with video data collection.

Addressing these challenges, Microsoft Research Asia, in collaboration with medical institutions, has developed an unsupervised approach using video anomaly recognition. The new Dual-Stream Stereotypical Behavior Detector (DS-SBD) model leverages the temporal dynamics of human posture and repetitive motion patterns. Remarkably, DS-SBD requires only non-anomalous behavior for training and can identify previously unseen stereotypical behaviors during inference, such as circling behaviors that do not appear in the training data.
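The idea of training only on non-anomalous behavior can be sketched with a much simpler stand-in than DS-SBD. The example below fits per-feature statistics on normal clips and flags any clip that deviates strongly from them, so a behavior never seen in training can still be detected. This is an illustration of the general anomaly-detection principle under stated assumptions, not the DS-SBD model: the two features per clip, the sample values, and the threshold are hypothetical.

```python
import statistics


def fit_normal_profile(normal_samples):
    """Estimate per-feature mean and standard deviation from normal-only data."""
    columns = list(zip(*normal_samples))
    means = [statistics.fmean(c) for c in columns]
    # Guard against zero spread so the z-score below never divides by zero.
    stds = [statistics.pstdev(c) or 1e-9 for c in columns]
    return means, stds


def anomaly_score(sample, profile):
    """Largest absolute z-score across features; higher means less like the training data."""
    means, stds = profile
    return max(abs(x - m) / s for x, m, s in zip(sample, means, stds))


# Hypothetical features per video clip: (motion repetition rate in Hz, mean joint speed).
NORMAL_CLIPS = [(0.2, 1.0), (0.3, 1.2), (0.25, 0.9), (0.35, 1.1), (0.3, 1.0)]
PROFILE = fit_normal_profile(NORMAL_CLIPS)

THRESHOLD = 3.0  # assumed cut-off, measured in standard deviations

typical_clip = (0.28, 1.05)
repetitive_clip = (2.5, 1.1)  # fast repetitive motion that never occurs in training

if __name__ == "__main__":
    for name, clip in [("typical", typical_clip), ("repetitive", repetitive_clip)]:
        flagged = anomaly_score(clip, PROFILE) > THRESHOLD
        print(name, "anomalous" if flagged else "normal")
```

Because nothing in the training step refers to a specific anomalous behavior, the detector is not limited to a fixed list of behaviors, which is the property the paragraph above highlights.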

Figure 2: The DS-SBD model’s predictive accuracy spikes when detecting atypical behaviors such as abnormal hand clapping.

Extensive studies validate that DS-SBD’s unsupervised technique has increased the micro-average AUROC from 60.43% to 71.04% and the macro-average AUROC from 56.45% to 73.39%, signifying a substantial improvement in both accuracy and the scope of detectable behaviors. This breakthrough outperforms current state-of-the-art methods and is poised to set a new standard in the field. While DS-SBD marks a significant advancement in recognizing stereotypical behaviors, it represents only one facet of the broader ASD diagnostic process. Comprehensive early diagnosis and intervention strategies will benefit from continued interdisciplinary collaboration and societal engagement.
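For readers unfamiliar with the two averages, the sketch below computes AUROC per behavior class and then combines the classes in the two usual ways: macro-averaging takes the mean of the per-class values, while micro-averaging pools all predictions before computing a single AUROC. These are the common textbook definitions and are assumed here; the class names and scores are toy values, not results from the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auroc(per_class):
    """Mean of the per-class AUROC values; every class counts equally."""
    return sum(auroc(y, s) for y, s in per_class.values()) / len(per_class)


def micro_auroc(per_class):
    """AUROC over all predictions pooled together; every sample counts equally."""
    labels = [y for ys, _ in per_class.values() for y in ys]
    scores = [s for _, ss in per_class.values() for s in ss]
    return auroc(labels, scores)


# Toy data: class name -> (binary labels, anomaly scores). Values are invented.
PER_CLASS = {
    "hand_flapping": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "circling": ([1, 0, 0, 0], [0.4, 0.6, 0.2, 0.1]),
}

if __name__ == "__main__":
    print(f"macro-average AUROC: {macro_auroc(PER_CLASS):.4f}")
    print(f"micro-average AUROC: {micro_auroc(PER_CLASS):.4f}")
```

On this toy data the macro average is 5/6 and the micro average is 14/15. The two can differ noticeably when classes vary in size or difficulty, which is why both are usually reported.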

Related paper:

Advancing neonatal seizure detection through brainwave analysis

Epilepsy in children is a multifaceted, often recurring neurological disorder that predominantly occurs in the formative years (0-18 years). The prompt identification of epilepsy in newborns is vital to safeguard their developmental trajectory.

Source: unsplash.com

The genesis of epileptic seizures lies in the abnormal discharges of neurons within the brain, rendering brainwave analysis a pivotal tool for epilepsy diagnosis. Nonetheless, the nascent state of neonatal brain development, coupled with the pronounced noise in brainwave data and the marked variability among infants, renders the detection of neonatal epilepsy a formidable medical challenge.

Microsoft Research Asia and its collaborators have introduced a deep learning model for electroencephalogram (EEG) signals, the Spatial-Temporal EEG Network (STATENet). This model adeptly processes neural signals, adjusts to neonatal EEG channel variations, and addresses the challenges outlined above. Additionally, the team has introduced a model-level integration technique that combines outcomes from various spatial-temporal deep models, thereby bolstering the STATENet model’s generalization ability across diverse neonatal subjects.

Extensive studies utilizing a comprehensive dataset of real-world neonatal EEG data have demonstrated the STATENet model’s substantial enhancement in detection precision. The model’s area under the precision-recall curve (AUPRC) witnessed an improvement exceeding 30% relative to prevailing top-tier methods, equipping physicians with a novel diagnostic instrument for pediatric epilepsy.
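To make the metric concrete, the sketch below computes AUPRC in its common step-wise form (average precision) and shows how a relative improvement is calculated. It is an illustrative sketch only: the labels, scores, and the baseline and new values passed to `relative_improvement` are hypothetical, and the function assumes at least one positive label and no tied scores.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    hits, area = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            area += hits / rank  # precision at the rank where this positive is recalled
    return area / total_positives


def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Toy example: seizure windows are rare, so most labels are 0. Values are invented.
LABELS = [1, 0, 1, 0, 0, 0, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    print(f"AUPRC: {average_precision(LABELS, SCORES):.4f}")
    # Hypothetical numbers: moving from 0.50 to 0.65 is a 30% relative gain.
    print(f"relative improvement: {relative_improvement(0.65, 0.50):.0%}")
```

AUPRC is often preferred over AUROC when positives are rare, as with seizure events, because it is not inflated by the large number of easy negatives.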

Moreover, Microsoft Research Asia has pioneered the inaugural cross-dataset EEG model capable of deciphering any EEG data, thus achieving a ‘one-to-many’ brainwave comprehension. This breakthrough underpins the AI Neurologist system, designed to augment brainwave signal analysis in both clinical and research settings, elevating diagnostic accuracy from 75% to 90% in a case study. The associated models are now open source on GitHub, inviting global research participation to extend this technology’s impact across the medical spectrum and catalyze new diagnostic and therapeutic innovations.

Figure 3: The AI Neurologist system

Related papers:

Enhancing disease progression prediction and personalized care: The role of AI in precision medicine

Precision medicine represents a transformative approach to healthcare, tailoring treatments to individual patient profiles. Despite the promise, the complexity and unique nature of diseases present significant hurdles. AI emerges as a powerful ally, leveraging data analysis, pattern detection, and predictive modeling to forecast disease progression and risks. This capability is especially crucial in chronic disease management, aiding clinicians and patients in mitigating illness severity and preventing complications.

Graph neural networks: A novel approach to Parkinson’s disease progression

Parkinson’s disease, a prevalent neurodegenerative condition in seniors, progresses gradually. Patient conditions may remain stable or even improve over time with proper medication and therapy, maintaining optimal physical functions. Yet, Parkinson’s presents a spectrum of symptoms, from sleep disturbances to motor challenges, making disease progression prediction complex.

Source: pixabay.com

Researchers at Microsoft Research Asia advocate for the analysis of multimodal data to extract similar symptoms, thus enhancing prediction accuracy. Graph neural networks (GNNs) excel in mapping patient interconnections, forming networks where nodes represent patients linked by shared attributes. Selecting these attributes, however, largely demands expert knowledge and experience.
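A patient similarity graph of the kind described above can be sketched in a few lines: each patient is a node, and two patients are linked when a chosen attribute is close enough. This is a minimal sketch under stated assumptions; the patient records, attribute names, and tolerances are hypothetical, and the attribute is picked by hand here, which is exactly the step that calls for expert knowledge.

```python
from itertools import combinations


def build_similarity_graph(patients, feature, tolerance):
    """Link two patients when their values for `feature` differ by at most `tolerance`."""
    edges = set()
    for (id_a, rec_a), (id_b, rec_b) in combinations(sorted(patients.items()), 2):
        if abs(rec_a[feature] - rec_b[feature]) <= tolerance:
            edges.add((id_a, id_b))
    return edges


# Hypothetical patient records, invented for illustration only.
PATIENTS = {
    "p1": {"age": 61, "motor_score": 20},
    "p2": {"age": 63, "motor_score": 35},
    "p3": {"age": 75, "motor_score": 22},
}

if __name__ == "__main__":
    # Different attributes produce different graphs over the same patients.
    print("by age:", build_similarity_graph(PATIENTS, "age", tolerance=5))
    print("by motor score:", build_similarity_graph(PATIENTS, "motor_score", tolerance=5))
```

The two calls link different pairs of patients, which shows why attribute selection matters: a graph neural network propagates information along these edges, so the choice of attribute decides which patients inform each other's predictions.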

To overcome this, Microsoft Research Asia collaborated closely with medical institutions.  Based on recommendations from professional medical personnel, a new algorithm called AdaMedGraph was proposed. AdaMedGraph autonomously identifies key features to construct patient similarity graphs, harmonizing with existing knowledge and integrating expert-designed graphs into a comprehensive model. Unifying individual and collective data, this innovation simplifies the graph construction process.

Evaluated on two public datasets, the Parkinson’s Progression Markers Initiative (PPMI) and the Parkinson’s Disease Biomarkers Program (PDBP), AdaMedGraph outperformed benchmarks in predicting Parkinson’s progression over 24 months, setting the stage for personalized treatment strategies.

Moreover, AdaMedGraph’s robust generalization ability shines in metabolic syndrome prediction, achieving an AUROC of 0.675 on test datasets. This underscores the model’s efficacy in integrating intra- and inter-patient data for individual disease progression forecasting, inspiring new avenues in medical research.

Related paper:

Enhancing interdisciplinary collaboration to maximize AI’s potential

Microsoft Research Asia’s endeavors extend beyond mere disease detection and progression prediction. In collaboration with the medical sector, Microsoft Research Asia is probing the vast capabilities of AI in advancing drug development and medical research. This includes leveraging state-of-the-art technology in constructing artificial retinas, analyzing drug dependency, advancing cancer therapies, and exploring human metabolism, among other areas.

As AI technology matures and progresses, its practical application potential becomes increasingly evident. Yet, unlocking AI’s full value across diverse sectors necessitates essential interdisciplinary and cross-domain collaboration. “The synergistic collaboration with medical professionals from healthcare and research institutions has enabled Microsoft Research Asia to conduct extensive research projects within the medical and health domain. Our continuous exploration into AI’s application in critical healthcare aspects, ranging from disease detection to rehabilitation and disease progression forecasting, is a testament to our collective dedication. We invite more exceptional individuals passionate about interdisciplinary research to join us in our quest to safeguard human health and foster medical advancements,” expressed Lili Qiu, assistant managing director of Microsoft Research Asia.

Note: The medical health research conducted by Microsoft Research Asia, as discussed in this article, is purely exploratory and guided by professional medical entities and research institutions. Our aim is to further scientific advancement and offer theoretical and technical support for future medical applications benefiting humanity. All research is in strict adherence to Microsoft’s responsible AI principles, upholding fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. The technologies and methodologies referenced herein are in the R&D phase and are not yet commercialized products or services, nor do they represent medical advice or treatment plans. For health-related concerns, we advise consulting with certified medical practitioners.


]]>
Dongqi Han: An interdisciplinary odyssey with AI and other fields http://approjects.co.za/?big=en-us/research/articles/dongqi-han-an-interdisciplinary-odyssey-with-ai-and-other-fields/ Tue, 07 May 2024 02:35:13 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1031622 Deciding between fundamental and applied research is a dilemma that confronts many in the scientific community. Dongqi Han, on the cusp of graduation, ambitiously aspired to bridge this divide by pursuing both avenues of research in his future endeavors. After a comprehensive evaluation, Dongqi Han selected Microsoft Research Asia (MSR Asia) for his initial foray […]

The post Dongqi Han: An interdisciplinary odyssey with AI and other fields appeared first on Microsoft Research.

]]>
Deciding between fundamental and applied research is a dilemma that confronts many in the scientific community. Dongqi Han, on the cusp of graduation, ambitiously aspired to bridge this divide by pursuing both avenues of research in his future endeavors.

After a comprehensive evaluation, Dongqi Han selected Microsoft Research Asia (MSR Asia) for his initial foray into fulfilling his aspirations. Prior to completing his doctorate, he undertook an internship at MSR Asia – Shanghai. During his six-month internship, Han gained firsthand experience of the lab’s commitment to pioneering basic research and its active engagement in fostering industrial collaborations, thereby facilitating the practical application of innovative findings. This experience sowed the seeds for his eventual formal engagement with the lab.

“MSR Asia has established a dynamic platform that seamlessly integrates fundamental research with practical industrial applications. Within this environment, I have the opportunity to work alongside eminent researchers, delving into the underlying principles and methodologies of intelligence. Moreover, I am able to harness the power of AI in domains such as healthcare. This synergy of theory and practice was a pivotal factor in my decision to join the lab after graduation. Undoubtedly, it represents an ideal launchpad for my career in scientific research,” Dongqi Han articulated.

Dongqi Han

Unveiling the core of intelligence: At the crossroads of AI and neuroscience

In recent years, the synergy between computer technologies like AI and various other fields has grown remarkably. MSR Asia is at the forefront, spearheading pivotal research and progressively amplifying its investments. Shanghai, a metropolis celebrated for its diversity and home to esteemed academic and leading medical institutions, offers fertile ground for the interdisciplinary fusion of AI and other fields. Consequently, the confluence of AI with neuroscience and other healthcare domains has emerged as a key research focus for MSR Asia – Shanghai.

Dr. Dongqi Han, a graduate of the Okinawa Institute of Science and Technology Graduate University (OIST) in Japan, majored in cognitive neuro-robotics, which encompasses the study of robotics integrated with neuroscience. As a neuroscientist, Dr. Han’s expertise significantly enhances the professional capabilities of MSR Asia’s interdisciplinary team, focusing on AI and brain science research.

Dongqi Han, positioned second from the left in the second row, alongside his team colleagues.

Dongqi Han’s research primarily explores two areas: the convergence of AI with neuroscience, and AI’s applications in healthcare. He believes that this synergy is not only theoretically profound but also immensely practical. Dr. Han asserts, “To create more intuitive and effective interfaces, whether they be brain-computer or human-computer, a more intricate comprehension of the human cognitive and perceptual processes is essential.” In healthcare, neurological disorders impact approximately one billion individuals globally. Studies at the nexus of AI and brain science have the potential to enrich the knowledge base for both clinicians and patients, leading to improved diagnosis, prevention, and management of these conditions.

“AI and brain science are deeply intertwined, both delving into the core and mechanisms of intelligence. They encounter similar issues and can mutually benefit from shared insights.”

Indeed, AI technologies often draw from the brain’s neural networks, with structures like multilayer perceptrons (MLPs) and long short-term memory (LSTM) networks mirroring our own cognitive architecture. By examining human and animal cognition (learning, memory, and decision-making), we can augment AI’s capabilities. A critical hurdle for AI is “catastrophic forgetting”, where new learning can erase old knowledge, a flaw not seen in the human brain. Dr. Han and his colleagues are dedicated to resolving such AI challenges by gleaning lessons from our neurological processes.

Conversely, the robust data processing and modeling prowess of AI holds the potential to advance neuroscience research and its applications significantly. The human brain is composed of approximately 10^11 neurons interconnected by roughly 10^15 synapses. Harnessing AI to model the brain’s operational principles and computational strategies is essential to manage this immense data complexity and to substantiate the veracity of theoretical frameworks in neuroscience.
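Taking the two order-of-magnitude figures above at face value, a quick calculation gives a sense of the connectivity involved. The division below counts each synapse once and is only as precise as the rounded estimates it starts from:

```latex
\frac{10^{15}\ \text{synapses}}{10^{11}\ \text{neurons}} = 10^{4}\ \text{synapses per neuron (on average)}
```

In other words, each neuron takes part in roughly ten thousand connections on average, which is part of why modeling the brain at this scale is so data-intensive.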

Presently, Dr. Han, his colleagues, and their collaborators have obtained preliminary findings from their research. Their study primarily examines two distinct types of human behavior: habitual and goal-directed [1]. Habitual behavior, such as automatically taking a familiar route home after work, requires no conscious deliberation. In contrast, goal-directed behavior involves intentional consideration of both purpose and outcome, exemplified by plotting a course to the Civil Affairs Bureau to obtain a marriage certificate. While these behavioral models elucidate numerous aspects of biological conduct, the mechanisms by which the brain decides between these patterns and their mutual interactions remain an enigma.

Computational modeling of habitual and goal-directed behaviors

Dongqi Han said, “Our team employs deep learning and machine learning methodologies to model and investigate the characteristics and underlying neural mechanisms of two distinct behavioral types. This endeavor not only contributes to the advancement of cognitive science and psychology but also serves as a source of inspiration for the development of innovative AI algorithms.”

Dr. Han’s recent research, conducted alongside his colleagues, revolves around emulating the brain’s neural circuitry. This has culminated in the development of a novel neural network model named CircuitNet [2]. Characterized by densely interconnected neurons within neural clusters and sparse connections across different brain regions, this model mirrors the human brain’s unique wiring. The team at MSR Asia is delving into the intricacies and benefits of such a neural architecture. Dr. Han, who has been involved in this project since his internship, has seen CircuitNet come to fruition through collaborative efforts, culminating in its selection for presentation at ICML 2023.
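
The wiring pattern described above, dense connections inside a cluster and sparse connections between clusters, can be sketched as a simple connectivity mask. This is a minimal illustration and not CircuitNet's actual implementation: the function names `clustered_mask` and `density`, the cluster sizes, and the cross-cluster probability are all hypothetical choices made for this example.

```python
import random


def clustered_mask(n_clusters, cluster_size, cross_prob, seed=0):
    """Build an n x n 0/1 mask: fully connected within clusters, sparse across."""
    rng = random.Random(seed)
    n = n_clusters * cluster_size
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-connections in this sketch
            same_cluster = (i // cluster_size) == (j // cluster_size)
            # Always connect inside a cluster; connect across clusters
            # only with a small probability.
            if same_cluster or rng.random() < cross_prob:
                mask[i][j] = 1
    return mask


def density(mask, cluster_size, within):
    """Fraction of possible connections present, within or across clusters."""
    present = total = 0
    n = len(mask)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_cluster = (i // cluster_size) == (j // cluster_size)
            if same_cluster == within:
                total += 1
                present += mask[i][j]
    return present / total


mask = clustered_mask(n_clusters=4, cluster_size=8, cross_prob=0.05)
print("within-cluster density:", density(mask, 8, within=True))
print("cross-cluster density:", density(mask, 8, within=False))
```

Multiplying a weight matrix by such a mask is one common way to enforce a sparsity pattern; the far lower cross-cluster density is what cuts the number of connections, and hence parameters, relative to a fully connected layer of the same size.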

The model structure of CircuitNet

CircuitNet represents an advancement in neural network architectures, offering enhanced performance with a reduced parameter count, thus leading to greater energy efficiency. Remarkably, the human brain operates on less than 20 watts of power on average, a stark contrast to the substantial energy demands of large-scale AI models such as GPT-4, which may require hundreds to thousands of watts. Moving forward, Dongqi Han and his colleagues are dedicated to unraveling the human brain’s mechanisms for energy conservation, drawing inspiration from CircuitNet’s design.

Dr. Han’s research extends to deep reinforcement learning and embodied AI, with the goal of refining AI to improve learning, decision-making, and real-world interaction capabilities of intelligent robots. He observes that while current large AI models predominantly generate content like text and images, embodied AI outputs dynamic actions, introducing a myriad of real-world uncertainties. For instance, a robot engaged in painting might encounter various challenges such as errors, equipment failure, or interference, all influencing the final outcome. Navigating these complexities requires sophisticated action selection processes. Dr. Han believes that by drawing parallels to human cognitive decision-making, we can expedite the advancement of embodied intelligence.

Fostering innovation: The power of interdisciplinary learning

Dongqi Han’s insatiable curiosity about the world fuels his wide-ranging and intense passion for scientific inquiry. His academic journey began with an undergraduate major in theoretical physics—a field he regards as exceptionally demanding. It necessitates a learner to possess stringent logical reasoning, robust mathematical prowess, and the capacity for experimental design and data analysis. These skills are instrumental in enhancing an individual’s intellectual caliber. Furthermore, physical science serves as the bedrock for numerous contemporary technologies, offering expansive applications.

During his doctoral studies, Dongqi Han took a photo with his mentor and lab mates.

During his undergraduate study in physics, Dongqi Han was deeply engrossed in the detection of physical parameters within tokamaks—devices pivotal for controlled nuclear fusion [4]. He considers nuclear fusion as a boundless, efficient, and clean energy source, believing its mastery to herald unparalleled benefits for humanity. Initially, Dr. Han viewed achieving controlled nuclear fusion as his scientific beacon. Yet, as his studies progressed, he discerned that the real hurdles to implementation lay not in the realm of theoretical physics, but within the ambit of engineering challenges. This revelation steered him to realize that his knowledge in theoretical physics might not directly contribute to the fruition of his aspirations. It was during this phase, amidst his tokamak investigations, that Dr. Han’s encounter with machine learning technology sparked a transformative shift in his academic pursuit, leading him to specialize in cognitive neuro-robotics for his doctoral research.

Embracing interdisciplinary learning, Dongqi Han embarked on a journey of discovery, building his foundation from the ground up. His initial year in the doctoral program was marked by an intensive curriculum that spanned basic neuroscience, machine learning, robotics control automation, and the integrative domains of cognitive science and psychology. Despite the rigorous challenges of navigating multiple disciplines, this cross-pollination of knowledge ignited innovative ideas. These insights have proven to be invaluable, enriching his research in AI and brain science.

In exploring the domains of reinforcement learning and embodied intelligence, Dongqi Han aims to integrate the methodical thinking approaches of physics. This will involve transferring the discipline’s rigorous thought processes (for example, commonly used measurement statistical methods) and logical reasoning, honed through physical experimentation, into the realm of AI study.

“The benefits of interdisciplinary learning are not only reflected in the cross-domain application of knowledge but also in the borrowing and inspiration of logical thinking. For example, when solving problems in neural network machine learning, traditional machine learning thinking often first considers data volume and model scale. Training in neuroscience can allow us to start from the perspective of the human or animal brain, thinking about problems in a more expansive and flexible way,” said Dongqi Han.

Collaborative synergy: Harnessing advanced technology for real-world solutions

Dongqi Han and his colleagues are not only dedicated to interdisciplinary research but also actively engage in cross-disciplinary collaborations. They partner with universities and medical institutions around the globe, harnessing advanced technologies to tackle real-world challenges and forge superior solutions.

Dongqi Han and his colleagues, in partnership with Fudan University, have developed an AI model for machine vision that replicates human visual perception [3], with a focus on boosting the energy efficiency of computer vision systems. Dr. Han notes, “Through collaborative research, we have discovered notable distinctions between human and computer vision. Human vision exhibits a significantly higher resolution at the fovea (the central point) compared to the peripheral vision. Additionally, the human brain processes information by transmitting spike-based signals, a characteristic that is mirrored in the architecture of spiking neural networks.” To harness these aspects of human vision, the team employed a spiking neural network to emulate a visual system with variable resolution and neuronal spike-based communication. The model could potentially improve energy efficiency on visual tasks by up to a hundredfold.
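
To illustrate what "spike-based communication" means, here is a minimal sketch of a leaky integrate-and-fire neuron, a standard textbook model of a spiking unit. It is not the model from the paper [3]; the function name `lif_spikes` and the threshold and leak values are assumptions chosen for this example.

```python
def lif_spikes(inputs, threshold=1.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    The membrane potential decays by `leak` each step, accumulates the input,
    and emits a spike (1) and resets to zero once it reaches `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes


# A weak constant input only triggers a spike once enough charge accumulates.
print(lif_spikes([0.5, 0.5, 0.5, 0.5]))
```

Because the neuron outputs a binary event only when its potential crosses the threshold, most time steps carry no signal at all, which is one reason spiking networks are studied as a route to lower energy use.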

A human-eye-like spiking neural network performing a visual search task

In a robotics initiative, researchers from MSR Asia and Korea Advanced Institute of Science and Technology (KAIST) have been pioneering the use of wearable, non-invasive devices to decipher human brainwave signals. This technology is poised to significantly enhance a robot’s ability to interpret human intentions with greater precision. “Consider an elderly individual reaching towards the kitchen—does this gesture indicate hunger or the need to retrieve an item? Or when they gesture towards a table adorned with both a water bottle and a tissue box, which is their intended choice?” Dongqi Han said. “KAIST’s robust expertise in neuroscience, combined with our advanced AI algorithms, creates a powerful alliance. Together, we’re pushing the boundaries of what’s possible in robotics, enabling more nuanced task execution through a fusion of brain science and AI. This interdisciplinary collaboration is sparking innovative research avenues for both teams.”

From delving into games to persevering in scientific research

Dongqi Han’s diverse interests extend beyond his professional pursuits. He likes hiking and badminton, as well as various types of video games. “My passion for video games began in elementary school, evolving into a dedication to a particularly challenging game that demanded patience and perseverance. The process of advancing through persistent practice instilled in me a profound sense of accomplishment and shaped my approach to life, fostering patience and resilience against adversity,” Dr. Han shares.

This mindset has also permeated his research philosophy, which is characterized by deep, sustained inquiry. The supportive and diverse research environment at MSR Asia encourages vibrant collaboration and communication with colleagues from diverse disciplines and walks of life, bolstering his commitment to long-term research and connecting him with fellow researchers who share his vision.

Related links:

[1] Synergizing Habits and Goals with Variational Bayes, Nature Communications, 2024
(preprint at https://osf.io/preprints/psyarxiv/v63yj)

[2] CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling
https://proceedings.mlr.press/v202/wang23k/wang23k.pdf

[3] Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network
https://arxiv.org/abs/2310.06578

[4] In situ relative self-dependent calibration of electron cyclotron emission imaging via shape matching
https://doi.org/10.1063/1.5038866

The post Dongqi Han: An interdisciplinary odyssey with AI and other fields appeared first on Microsoft Research.

]]>
Building a bridge of communication for cross-industry research http://approjects.co.za/?big=en-us/research/articles/meet-these-multifaceted-women-who-are-an-indispensable-part-of-the-academic-ecosystem-puzzle-in-the-computing-field/ Fri, 29 Mar 2024 08:04:08 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1019856 In the Accelerator team of Microsoft Research Asia, women from diverse countries, with diverse academic backgrounds and life experiences, are weaving bridges for interdisciplinary research in their own distinctive ways. These remarkable women embrace unconventional workplace philosophies and life approaches. Their enthusiasm lies in fostering connections between the lab and a multitude of partners within […]

The post Building a bridge of communication for cross-industry research appeared first on Microsoft Research.

]]>
three women

In the Accelerator team of Microsoft Research Asia, women from diverse countries, with diverse academic backgrounds and life experiences, are weaving bridges for interdisciplinary research in their own distinctive ways. These remarkable women embrace unconventional workplace philosophies and life approaches. Their enthusiasm lies in fostering connections between the lab and a multitude of partners within an open, diverse, and inclusive environment. With their passion and wisdom, they contribute vital pieces to the expansive ecosystem of computer science.

Diversity and inclusion are the drivers of innovation and creativity

Miran Lee, Accelerator Director at Microsoft Research Asia

Nationality: Korean

Relationship management expert/ Dual identity of entrepreneur and professional manager/ Hiking enthusiast

Miran Lee

Q: What are your main work responsibilities? Which do you find most challenging?

Miran Lee: I am responsible for academic collaboration in Korea and the Asia-Pacific region. My main work responsibilities include establishing strategies and directions for these collaborations, identifying business opportunities, designing various programs and projects, and managing relationships with our partners. At the Accelerator team, we establish solid relationships with our partners through research collaboration, curriculum development, academic exchanges, etc., to promote the advancement of computer technology and foster talent development in the computer field.

During this process, given the diverse set of responsibilities that I oversee, I think the most challenging aspects of my position are effectively managing and prioritizing multiple projects, maintaining strong partnerships through good communication and management, and keeping abreast of academic trends to deliver relevant and impactful projects or plans for Microsoft Research Accelerator.

Q: If you see yourself as a piece of the academic ecosystem puzzle, how do you work with others to complete this puzzle? While promoting academic exchange and collaboration, what are your secrets or suggestions for success?

Miran Lee: As a piece of the academic ecosystem puzzle, the best way to work with others to complete the puzzle is through collaboration and communication. Building strong partnerships with students, researchers, and faculty members requires a collaborative approach where all parties are invested in achieving common goals. Effective communication is key to establishing trust and understanding among partners, and it helps to create a positive working environment where everyone feels valued and heard.

I believe that in order to promote academic exchanges and cooperation, first, you must set clear goals: Clearly define the objectives of the collaboration and ensure that everyone involved understands what is expected of them. Next, you must foster an open and inclusive environment, encourage open communication and feedback, and create a safe space where everyone feels comfortable expressing their ideas and opinions. Actively listening to others’ ideas and acknowledging their contributions is also an important part of building trust and respect among collaborators. And being open to new ideas and being willing to adapt to changing circumstances can help to ensure that the collaboration remains relevant and effective.

Q: Before joining Microsoft Research Asia, you worked at various tech companies and had your own business. Why did you choose to join Microsoft Research Asia, and what incentivized you to stay here for ten plus years?

Miran Lee: Microsoft Research Asia provided me with a unique opportunity to leverage my expertise in the areas of technology and business and to collaborate with some of the brightest minds in the industry. I was particularly drawn to the mission of Microsoft Research Asia, which is to advance the state of the art in computer science and to develop innovative technologies that benefit society as a whole. Furthermore, Microsoft Research Asia offers a unique culture that is diverse and inclusive, allowing me to work with colleagues from diverse backgrounds and to be part of a team that is making a real impact on the world.

Q: How have your previous cross-industry experiences influenced your current work?

Miran Lee: I’ve held management positions at leading tech companies. This has allowed me to develop and adopt a more reasonable strategy for driving long-term partnerships and innovation in key areas. My experience in academia as an adjunct professor at Anyang University has given me a deep understanding of the needs of students, researchers, and faculty members, and that has helped me in developing effective plans for collaboration. Additionally, my experience in software development and technology has enabled me to work effectively with Microsoft Research Asia’s research groups. Overall, my cross-industry experiences have allowed me to bring a unique perspective and have undoubtedly contributed to my success in my current role.

Q: Microsoft Research Asia’s employees come from different countries and have different cultures and backgrounds. How do you feel about cross-cultural communication and work?

Miran Lee: Cross-cultural communication is a vital part of Microsoft Research Asia, and it is also one of the important driving forces of our innovation and creation. By working with people from different backgrounds and cultures, we can gain new perspectives and insights, and develop a deeper understanding and appreciation of different ways of thinking and working.

Effective cross-cultural communication and work has also helped us develop stronger interpersonal skills and enabled us to be active listeners and to be empathetic. It is important to be open-minded and respectful of different opinions and perspectives, and to be willing to adapt to various communication styles and work practices to accommodate different cultural norms and expectations.

Q: What are your hobbies outside of work? Why do you like these hobbies or what do they bring to you?

Miran Lee: My hobbies are reading and hiking. Reading offers numerous benefits for personal growth, intellectual development, and enjoyment. Hiking is a physical activity that can improve health, build strength, and boost endurance. Being in nature can also help to reduce stress, improve your mood, and promote mental clarity. These hobbies help me to relax and enjoy life and often allow me to gain new, unexpected skills and interests, and to connect with others who share a love for the outdoors.

Using my ability to bridge different industries

Mawo Kamakura, Senior Research Program Manager at Microsoft Research Asia

Nationality: Japanese

Protector of traditional cultural heritage/ Gourmet enthusiast/ Fan of tea culture

Mawo Kamakura

Q: How did you come to join Microsoft Research Asia?

Mawo Kamakura: It all started with an email from my graduate supervisor. Many of my older classmates and colleagues in my laboratory had interned at Microsoft Research Asia, and so for me, Microsoft Research Asia was a reasonably familiar place. However, I was doing research in a fusion area with the humanities, so I honestly never thought I’d cross paths with Microsoft Research Asia myself. 

When I learned about the nature of the work at Microsoft Research Asia’s Accelerator team from my supervisor’s email, I began to seriously consider this opportunity. On one hand, my previous work experience in different countries such as Egypt, Cambodia, and Italy made me highly adaptable to working with people from different cultural backgrounds; on the other hand, I had always worked with people from cultural heritage organizations (including UN agencies and government organizations) and computer science fields. I felt that these experiences would enable me to play a unique role at the research institute.

My decision proved to be a correct one. Microsoft Research Asia has given me the space to develop my expertise and has given me the satisfaction of living up to my true potential.

Q: You have been committed to protecting traditional culture with 3D digital technology. What role do you think AI is playing in the protection of cultural heritage?

Mawo Kamakura: Since around 2005, I have participated in projects for 3D digital conservation of cultural heritage objects, including those at UNESCO World Heritage sites, in Cambodia, Italy, Egypt, and Japan. For example, I participated in the 3D data storage of excavation components at the excavation site of the Second Solar Boat.

Due to the recent advances in AI and storage technology, as well as the downsizing and diversification of display devices, large-scale data storage of cultural heritage has become increasingly easy. At the same time, it is also now possible to visualize and display data on things that have never been known before, such as deciphering ancient languages written on stone monuments and caves or recognizing letters and paintings that are unclear. In short, AI technology has brought to us greater and more flexible choices and is enabling us to better preserve the cultural treasures of humanity. 

Q: What are your main work responsibilities currently?

Mawo Kamakura: I am responsible for overseeing Microsoft Research Asia’s academic cooperation in Japan, and I am also a visiting scholar at the Center for Spatial Information Science. Specifically, my work can be organized into three areas: Academic Engagement, Industrial Engagement, and Communication. 

I hope to enable more researchers from different fields to discover the value and significance of collaborating with Microsoft Research Asia, and I will do my best to establish a connection between the two parties, promote collaborative innovation, and leverage my ability to connect A to B to forge bridges among different fields.

Q: How do you use your knowledge and skills from different fields to solve the problems you encounter in your current work?

Mawo Kamakura: I always keep in mind two things whenever I encounter a problem. One, I look for ways to apply what I have learned from my past experiences, and two, I look forward to improving myself through new experiences and learning opportunities. I also try to break things down to make them as simple as possible and try to discover what the truly important part of the issue is.

Q: Microsoft Research Asia’s employees come from different countries, with different cultures and backgrounds. What are your feelings about cross-cultural communication and work?

Mawo Kamakura: Many members of Microsoft Research Asia are from Asia, and so we have many cultural similarities. For example, in addition to English, some colleagues can communicate with me using Chinese characters. Even though we have completely different pronunciations of the characters, it is still a very interesting experience. These exchanges can help me understand different perspectives and have allowed me to benefit greatly.

Being inclusive is one of the characteristics of an excellent organization. This is why Microsoft Research Asia can gather many outstanding talents with different backgrounds, personalities, and expertise. Here, everyone can freely express and discover themselves and can find great teachers and friends among their colleagues. This is what’s most attractive about the institute.

Q: What are your hobbies outside of work? Why do you like these hobbies or what do they bring to you?

Mawo Kamakura: During my travels to various countries, I’ve found myself very interested in the different cuisines I’ve come across. For example, tea tasting is one of my favorite food cultures. I like a tea called Omija-cha that my colleague gave me, and I also like the small blue citrus tea I first encountered at a Microsoft Research Asia event.

Trying out various foods from different cultures has allowed me to become more aware of myself in the world. I also like to cook. Cooking is a time for me to clear my mind, so I think it is refreshing and helps me maintain a good work-life balance.

Making the world a better place with technology

Beibei Shi, Senior Research Program Manager at Microsoft Research Asia

Nationality: Chinese

Environmental science/computer science from beginner level to proficiency/traveling with family

Beibei Shi

Q: What is your current job, and what do you think is the most challenging part of it?

Beibei Shi: I am mainly responsible for the Open Collaborative Research Program at Microsoft Research Asia and focus on the research topics of sustainable development and trust. I’m also responsible for the StarTrack Program and university relations in multiple regions.

In summation, my work is akin to building bridges between two originally isolated places, creating reaction catalysts for two independent substances, and laying a solid foundation for cooperative relationships. The challenge is to identify what needs to be bridged, catalyzed, and/or stabilized under different circumstances, and to achieve this in the most appropriate way.

Q: With your professional background in environmental science, how would you view the impact of technological development on the natural environment and on society?

Beibei Shi: Since I was a child, I’ve always liked to go to places with a lot of great natural landscapes. I chose to study environmental science because I wanted to protect the environment and our planet Earth. While at school, I discovered that many courses in environmental science intersected with computer science. Later, there was an interdisciplinary scientific research project between ecology and computer science that served as my opportunity to step into the computer science industry. Since then, I’ve undertaken many cross-industry projects in environmental science and computer science. During this process, I’ve come to believe that the intention behind good technology is the same as my original intention for going into environmental science, and that is to make the world a better place.

Undeniably, the development of technology has damaged the natural environment and led to excessive consumption of resources. To solve these problems, we need the power of technological innovation even more, for example to replace environmentally destructive development methods with environmentally friendly technologies. But this type of scientific and technological innovation is not easy; it requires cooperation among scholars from various disciplines, long-term research, and integration and innovation. At Microsoft Research Asia, a group of researchers have joined forces with experts in the fields of environmental science, earth science, and climatology from many universities in Asia through our Carbon Negative Computing Research Collaboration Program to work together and pursue this arduous but meaningful research direction. I’m very fortunate to be able to join them in this.

Q: How has your interdisciplinary background influenced your work and life?

Beibei Shi: Throughout my career, I’ve constantly been moving out of my comfort zone and going into new territories. From environmental science to computer science, and from a researcher to a research project manager, I’ve had to keep learning things that are unfamiliar to me. Luckily, I’ve had excellent colleagues around me who’ve given me a lot of help and trust along the way, helping me grow rapidly.

The good thing is that my unique professional background and cross-disciplinary perspectives and way of thinking have enabled me to make valuable contributions to the team as well. During this process, I’ve been able to gradually face the changes in life and at work with a better attitude. I’ve been able to cope with novel things and challenges in new fields and to see the wonders of the world from a broader point of view.

Q: How do you perceive the impact of AI on society at large?

Beibei Shi: With the development and popularization of AI, and especially with the recent dawn of the era of large models, we might soon see a new Industrial Revolution driven by advanced AI technology. Actively mastering technology and using it the right way will definitely create great value for social development.

In this process, as professionals engaged in AI research, we value more than ever the belief that technology can promote the development of society towards sustainability and trust. Because of this, Microsoft Research Asia launched the Carbon Negative Computing Research Collaboration Program in 2020 in the hopes of giving full play to the advantages of AI technology in the field of data and computing to aid in the sustainable development of the world. Then in 2021, we launched the Responsible AI Interdisciplinary Exploration Program, where we work with experts in humanities fields such as law, psychology, and sociology to promote the development of those fields while creating reliable and trustworthy artificial intelligence technologies to better support the needs of societal development in the future.

Q: What are your hobbies and what have you gained from them?

Beibei Shi: My biggest hobby is traveling. When I was little, I dreamt about traveling around the world. I wanted to go see different mountains, lakes, rivers, and seas, and I wanted to experience different cultures and customs. Over the span of my life, my travel companions have ranged from my parents to my friends and partners, and now to my family members, including my daughter. Every trip has been a happy experience, from the early preparation stages to looking back on the trips after they’ve ended.  

Traveling has allowed me to gain a lot of new knowledge and skills. It’s enriched my way of thinking and allowed me to face life and work in a better way.

The post Building a bridge of communication for cross-industry research appeared first on Microsoft Research.

]]>
Changho Hwang: Pursuing long-term research takes constant self-persuasion http://approjects.co.za/?big=en-us/research/articles/changho-hwang-pursuing-long-term-research-takes-constant-self-persuasion/ Wed, 27 Mar 2024 04:24:21 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=1018299 “Before I became an intern at Microsoft Research Asia (MSR Asia), my knowledge of the institute was a paper on ResNet (Residual Network),” said Changho Hwang. “In the paper, researchers at MSR Asia introduced the idea of ‘residual learning’ and made ResNet a milestone in the development of computer vision technology.” Hwang’s first impression of […]

The post Changho Hwang: Pursuing long-term research takes constant self-persuasion appeared first on Microsoft Research.

]]>
“Before I became an intern at Microsoft Research Asia (MSR Asia), my knowledge of the institute was a paper on ResNet (Residual Network),” said Changho Hwang. “In the paper, researchers at MSR Asia introduced the idea of ‘residual learning’ and made ResNet a milestone in the development of computer vision technology.”

Hwang’s first impression of MSR Asia was that it was the place for cutting-edge technological research conducted by top innovative talents.

Join MSR Asia and “Do the right thing in the right way”

During the second year of his PhD studies, Hwang became an intern at MSR Asia thanks to the recommendation of his supervisor at the Korea Advanced Institute of Science and Technology (KAIST). After two internships at the lab, one during the winter of 2018 and one during the summer of 2019, Hwang developed a new understanding of MSR Asia. He decided that upon graduation, his career goal would be to join MSR Asia and further pursue forward-looking technological research. Hwang said, “At the time, some of my classmates and colleagues had introduced me to other laboratories and companies, but my internship experiences made MSR Asia a clear choice for me. I preferred the working environment and research atmosphere here. This place enabled me to focus on the areas I was really interested in.”

According to Hwang, what’s most attractive about MSR Asia is that it always does the right thing in the right way. MSR Asia never blindly follows technology trends. Rather, it sets unique strategies and research directions and always looks at the bigger picture while focusing on cutting-edge technologies.

The people Hwang worked with during his internships and the diverse research directions found at MSR Asia were also important reasons behind Hwang’s decision to join the lab. MSR Asia boasts a group of extremely professional yet convivial researchers. Hwang’s mentor during his internships was highly approachable and offered him a great deal of freedom and solid academic support for his research. His colleagues were also warm and helpful both in and out of the office, and allowed Hwang to feel at home despite being abroad. Furthermore, among the cutting-edge research endeavors undertaken by MSR Asia, Hwang discovered not only research areas and projects that matched his expertise in electrical engineering but also a multitude of interdisciplinary research directions that offered researchers opportunities to expand the breadth and depth of their academic pursuits. Therefore, after graduating from his doctorate program in 2022, Hwang quickly decided to join MSR Asia and became a member of the Networking Infrastructure Group. He currently holds a position as a researcher at MSR Asia – Vancouver.

Changho Hwang (C) poses for a group photo with his peers after his second internship at MSR Asia

Enhancing AI system performance: Hone achievements through progressive research

During his internship, Hwang was assigned to a team tasked with optimizing the performance of GPUs that support the operation of artificial intelligence (AI) models. At the time, Hwang’s mission was clear: to find a new way to improve the throughput and utilization of AI systems through software-hardware co-design. However, scientific research is often a long journey, and many studies do not yield immediate results. As an advocate of long-term research, Hwang did not see himself as a mere passerby in the research team. Instead, he continued to work with them for two years after returning to school. The subsequent research results achieved by the team won them the Best Paper Award at the MLArchSys 2022 conference.

Paper Title: Towards GPU driven Code Execution for Distributed Deep Learning

Paper link: https://chhwang.github.io/pubs/mlarchsys22_hwang.pdf

With the development of large models, GPUs had become increasingly crucial for training and deploying AI models, and the performance and utilization efficiency of GPUs directly affected AI development. Upon joining MSR Asia as a researcher, Hwang continued to focus on this area, except now, he was a project leader rather than a mere participant.

Hwang believed that the most advanced deep learning applications today required a large number of parallel GPUs to provide sufficient computing power. However, communication efficiency between GPUs and CPUs was a limiting factor for the performance of AI models. In the prevailing CPU-controlled communication mode of AI systems, CPUs played the role of chief commander: each CPU was responsible for assigning tasks to multiple GPUs, but the considerable delay in message transmission between them led to inefficient task execution and wasted GPU resources.

In his research, Hwang’s goal was to enable a GPU to command itself, thereby improving communication efficiency. To this end, he and his colleagues in the group designed a GPU-driven code execution system, along with a DMA engine that could be directly driven by the GPU. This allowed GPUs to directly handle communication that used to require CPU commands, thus reducing communication latency in AI systems and improving the utilization rate of GPU computing resources. The new method freed up the CPU resources occupied under earlier communication modes, allowing CPUs to focus on their own work and GPUs to perform autonomous scheduling as well as to do what they did best: provide higher computational performance for AI models. This research has demonstrated that in an AI system based on distributed GPUs, the GPUs are capable of managing task scheduling on their own. The paper on this research has been accepted by the NSDI 2023 conference.

Paper Title: ARK: GPU driven Code Execution for Distributed Deep Learning

Paper link: https://www.usenix.org/system/files/nsdi23-hwang.pdf
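
To make the argument about CPU-commanded versus GPU-driven execution concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions and does not come from the ARK paper: the function names, the per-command delay, and all of the numbers are hypothetical, chosen to show why removing a CPU round trip from every task can raise GPU utilization.

```python
# Toy cost model (illustrative numbers only, not measurements from the ARK paper).


def cpu_driven_time_us(num_tasks, compute_us, cpu_command_us):
    """Every task waits for a command from the CPU before it runs."""
    return num_tasks * (cpu_command_us + compute_us)


def gpu_driven_time_us(num_tasks, compute_us, cpu_command_us, gpu_sched_us):
    """The CPU issues one launch; the GPU then schedules each task itself."""
    return cpu_command_us + num_tasks * (gpu_sched_us + compute_us)


def utilization(num_tasks, compute_us, total_us):
    """Fraction of wall-clock time the GPU spends on useful compute."""
    return num_tasks * compute_us / total_us


# Hypothetical workload: 1,000 small tasks of 20 us each, 10 us per CPU
# command, and 1 us for the GPU to pick its own next task.
cpu_total = cpu_driven_time_us(1000, 20, 10)      # 30000 us
gpu_total = gpu_driven_time_us(1000, 20, 10, 1)   # 21010 us
print(f"CPU-driven: {cpu_total} us, utilization {utilization(1000, 20, cpu_total):.2f}")
print(f"GPU-driven: {gpu_total} us, utilization {utilization(1000, 20, gpu_total):.2f}")
```

In this simplified model the gap widens as tasks get smaller, because the fixed per-command delay then makes up a larger share of each task's total time.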

“System performance optimization is an eternal topic,” said Hwang. “In the past decade or so, we have witnessed the rapid development of AI, with one of the main driving forces being the continuously strengthening support of computing power. Adequate computing power enables steady improvement in system performance, resulting in larger and more powerful AI models. Currently, there are two leading approaches to improving system performance: one is to enhance hardware such as GPUs, and the other is to propose new AI algorithms. Both approaches are challenging, and hardware design and manufacture can be very costly.”

With this understanding, Hwang and his colleagues proposed a hardware-algorithm collaborative design method that could serve as another effective solution for enhancing the performance of AI systems. And so, after successfully proving that GPUs could autonomously schedule and achieve performance improvement, Hwang went on to explore GPU scheduling algorithms to avoid scheduling conflicts and further improve communication efficiency among GPUs. “We hope that in the future, GPUs will be able to achieve autonomous scheduling without requiring additional DMA engines, thereby bringing the performance of AI systems to a new level,” Hwang said.

“At MSR Asia, I can freely choose my research direction”

The long-standing culture of openness, inclusion, and diversity at MSR Asia has been a great attraction to Hwang, who has now worked in this lab for more than 12 months with a deepening appreciation for it. According to him, “MSR Asia is more like a laboratory—it’s a true research institute. Everyone here is equal, and there is transparency in our work. People understand each other’s ideas and are able to stay on the same page. At MSR Asia, we enjoy the greater liberty of being able to choose our own research direction.”

In addition to fostering a free academic atmosphere within itself, MSR Asia also works closely with the global academic community, including its counterparts in South Korea, on academic exchanges and talent training. For example, MSR Asia, together with Tsinghua University, Peking University, National University of Singapore, Seoul National University, and many other Asian universities, established OpenNetLab, an open networking community and platform to promote the application and development of AI in networking research. Hwang’s supervisor at KAIST is also involved in this collaboration. Another example is the Microsoft Research Collaboration Program with MSIT (the Ministry of Science and ICT) of Korea, a long-running program for talent training and academic research aimed at South Korean colleges and universities. For more than a decade, Microsoft Research’s partnership with the Korea MSIT has served as a bridge for academic exchange with the South Korean academic community. Through programs such as these, scholars have carried out in-depth scientific research collaborations and enriched the talent pool for global computing research. After his internship at MSR Asia, Hwang also participated in the MSIT program, and the resulting paper won the Best Paper Award at the APSys 2021 conference.

Paper Title: Accelerating GNN Training with Locality Aware Partial Execution

Paper link: https://dl.acm.org/doi/10.1145/3476886.3477515


As part of both MSR Asia and the broader computer science academic ecosystem, these diverse programs for exchange and collaboration not only yield cutting-edge research achievements but also serve as a “matchmaker” between MSR Asia and many scholars and students. In South Korea alone, more than 200 interdisciplinary talents have interned at MSR Asia to date, and many outstanding ones like Hwang have become researchers at MSR.

Thoughts on adhering to long-term research

Conducting scientific research is often a long and arduous journey, and maintaining a commitment to long-term research is no simple endeavor. Besides being persistent, Hwang also has a set of methods and insights for upholding this commitment.

Hwang believes that maintaining a high level of enthusiasm for research is essential to building a career in this field. He himself, for one, enjoys the entire process of discovering and solving problems in scientific research. “The goal of some jobs,” he said, “is to find the best way to avoid problems, but the mission of scientific research is to identify, confront, and solve problems. I sincerely enjoy the entire research process from discovering problems to solving them.”

Changho Hwang

In the course of long-term research, researchers may inevitably encounter obstacles or produce unsatisfactory results. For instance, some of Hwang’s papers had been repeatedly rejected by the conferences he’d submitted them to. Hwang believes that you should not be discouraged or resentful when you encounter these frustrations, but should rather reflect on yourself, review existing work, identify the problems, and then invest in new research. Hwang explained, “It is a process of self-persuasion, where you show yourself the value of research.”

When facing difficulties, Hwang believes that you should not confine yourself to the problem at hand, but should look away and relax for a while. Hwang, for example, would play the piano or chat with peers to break free from the invisible shackles. “A shift in perspective might take you right to the solution.”

The post Changho Hwang: Pursuing long-term research takes constant self-persuasion appeared first on Microsoft Research.

]]>
Shaping the Future with Societal AI: 2024 Microsoft Research Asia StarTrack Scholars Program Highlights AI Ethics and Interdisciplinary Integration http://approjects.co.za/?big=en-us/research/articles/shaping-the-future-with-societal-ai-2024-microsoft-research-asia-startrack-scholars-program-highlights-ai-ethics-and-interdisciplinary-integration/ Mon, 18 Dec 2023 09:15:58 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=993510 The rapid development of Generative Pre-trained Transformer (GPT) technologies and the advent of the large model era have significantly impacted every facet of the information world. As AI steps into the complex web of human society, it is transitioning from a mere technological tool to a social entity with significant influence. In the third installment […]

The post Shaping the Future with Societal AI: 2024 Microsoft Research Asia StarTrack Scholars Program Highlights AI Ethics and Interdisciplinary Integration appeared first on Microsoft Research.

]]>
The rapid development of Generative Pre-trained Transformer (GPT) technologies and the advent of the large model era have significantly impacted every facet of the information world. As AI steps into the complex web of human society, it is transitioning from a mere technological tool to a social entity with significant influence. In the third installment of our exclusive series on the 2024 Microsoft Research Asia StarTrack Scholars Program, we explore the critical role of AI in society, emphasizing the need for AI, as advocated by the Societal AI team at Microsoft Research Asia, to understand and adhere to human societal values. To explore the full scope of the 2024 program, visit our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.

Over the past year, artificial intelligence has exhibited remarkable advancements, surpassing previously held expectations. Amidst the excitement, a crucial question arises: Is technology itself neutral in terms of values? After all, the intelligence of Large Language Models (LLMs) is based on human-generated corpora, which inevitably are embedded with human biases and values, influencing the reasoning and judgment of machines.

“The rapid development of artificial intelligence is increasingly impacting human society,” said Xing Xie, Senior Principal Research Manager at Microsoft Research Asia. “To ensure that AI evolves as a socially responsible technology, our research is directed towards ‘Societal AI.’ This approach involves interdisciplinary collaboration with social sciences, including psychology, sociology, and law, to explore how AI can understand and adhere to the mainstream values of human society. Our goal is to enable AI to make decisions aligned with human expectations and develop more accurate evaluation models to precisely gauge its actual value orientations and level of intelligence.”

To ensure that AI adheres to the principle of benefiting humanity, Xing Xie and his colleagues at Microsoft Research Asia believe it’s imperative to not only develop technologies aligned with this objective but also to establish rules and methodologies that extend beyond the technological realm. Their area of study involves value orientations as well as AI safety, verifiability, copyright, and model evaluation, which are all closely related to social responsibility.

Preparing for Greater Impact

Years ago, Microsoft identified “Responsible AI” as a core principle in AI research and development, encompassing aspects such as privacy, security, fairness, and explainability. This foresight has become increasingly relevant with AI’s explosive growth over the past year, making Societal AI a forward-looking research direction.

As AI’s capabilities increase and its societal impact expands, even a minor misalignment in its values could potentially trigger significant consequences. As Microsoft President Brad Smith suggests in his book Tools and Weapons: The Promise and the Peril of the Digital Age, the more powerful the tool, the greater the benefit or damage it can cause. Therefore, in pursuing more powerful AI, it is crucial to simultaneously focus on AI’s role in social responsibility and prepare for any potential impacts on human society. The aim of Societal AI is to ensure that AI becomes a technology accountable to society.

Setting Value-Based Guardrails for Artificial Intelligence

Xing Xie and his colleagues believe that in building Societal AI, they should consider the following: value alignment, data and model safety, correctness or verifiability, model evaluation, and interdisciplinary collaboration.

Value alignment, a nascent field, has already gained widespread recognition for its importance in both industry and academia. In simple terms, it means ensuring that AI, when cooperating with humans and society, follows the same mainstream values as humans and achieves goals consistent with human expectations. This approach helps avoid unexpected outcomes from AI automation or the misuse of AI in ways that are detrimental to human welfare. Traditional practices such as reinforcement learning from human feedback (RLHF) are being reevaluated. In Societal AI research, the team’s goal is to elevate AI from merely following human instructions and preferences to embracing basic human values, allowing AI to assess its own actions based on these values. To achieve this, the team has initiated the Value Compass Project, which focuses on directly aligning AI models with human values established in sociology, ethics, and other areas.


According to the team, the challenge they are faced with in this endeavor involves three parts: first, translating abstract human values into concrete, measurable, and practical definitions for AI; second, technically regulating AI behavior with these value definitions; and third, effectively evaluating AI to demonstrate its alignment with genuine human values.

Ensuring AI Remains within Human Oversight

As AI’s intelligence leaps ahead, its evaluation faces new challenges. Traditional task-oriented machine learning allows for quantifiable evaluation standards, but as AI’s work types diversify, new methodologies are needed. To address this, Xing Xie and his team have developed an evaluation route based on the PromptBench architecture, which covers infrastructure, various tasks and scenarios, and evaluation protocols.


In terms of specific evaluation methods, they are exploring two approaches. One is a dynamic and developmental evaluation system. Current static public benchmarks have limitations: they cannot accurately track the improving intelligence of large models, and they run the risk of being fully memorized by those models, akin to a student memorizing a whole exam database. Because developing a dynamic and evolving system is key to achieving fair and accurate AI evaluation, the team has developed the DyVal algorithm for dynamic evaluation of large language models, which generates test samples through a directed acyclic graph and allows for scalable complexity.
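
The idea of drawing fresh test samples from a directed acyclic graph can be sketched in a few lines of Python. This is a minimal illustration and not the DyVal implementation: the function name `generate_dag_sample`, the restriction to integer arithmetic, and the node-naming scheme are all assumptions made for the example. Each new seed produces a new problem, and raising the number of internal nodes produces a harder one.

```python
import operator
import random

# Operations an internal node may apply to two earlier nodes (an assumption
# for this sketch; any deterministic operation would work).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def generate_dag_sample(num_leaves, num_internal, seed):
    """Build a random arithmetic DAG; return (description lines, target, answer)."""
    if num_leaves < 2 or num_internal < 1:
        raise ValueError("need at least 2 leaves and 1 internal node")
    rng = random.Random(seed)
    values = {}   # node name -> computed value
    lines = []    # one textual definition per node, in dependency order
    # Leaf nodes hold small random integers.
    for i in range(num_leaves):
        name = f"n{i}"
        values[name] = rng.randint(1, 9)
        lines.append(f"{name} = {values[name]}")
    # Each internal node combines two nodes created earlier, so edges only
    # point from older to newer nodes and the graph cannot contain a cycle.
    for j in range(num_leaves, num_leaves + num_internal):
        name = f"n{j}"
        left, right = rng.sample(sorted(values), 2)
        symbol = rng.choice(sorted(OPS))
        values[name] = OPS[symbol](values[left], values[right])
        lines.append(f"{name} = {left} {symbol} {right}")
    return lines, name, values[name]


# A small sample; increase num_internal to scale up the difficulty.
lines, target, answer = generate_dag_sample(num_leaves=3, num_internal=4, seed=7)
print("\n".join(lines))
print(f"Question: what is the value of {target}?")
print(f"Reference answer: {answer}")
```

Because the reference answer is computed while the graph is built, every generated question comes with a ground truth, and a model cannot have seen the exact sample during training.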

The other approach views AI as a general intelligence agent similar to humans and uses methodologies from social sciences such as psychology and education for AI evaluation. The team has initiated interdisciplinary collaboration with experts in psychometrics and believes that the methodologies used for evaluating unique human functions can apply to general AI, offering abilities that traditional benchmarks lack. Their latest paper details the feasibility and potential of psychometrics in AI evaluation.

Cross-Industry and Cross-Disciplinary Collaboration

Just as methodologies from psychology are essential for AI testing, blending Societal AI with other disciplines, especially social sciences, is critical. Key areas such as value alignment, safety, and model evaluation in AI require integration with social sciences, since computer science alone cannot fully address many of the challenges.

Unlike previous interdisciplinary collaborations in computer science, Societal AI presents unique challenges, such as bridging significant disciplinary divides, and requires new approaches. It not only needs to integrate the arts and sciences but also needs to reposition computer technology as an entity that is being empowered rather than one that empowers. Social sciences provide fresh perspectives and tools, necessitating the construction of new theoretical frameworks and methodologies from scratch.

While researchers in engineering, biology, physics, chemistry, and mathematics have begun integrating AI into their studies, there is a significant dearth of talent capable of supporting interdisciplinary research, particularly in social sciences like sociology and law. Balancing and combining the fast-paced, iterative approach of computer science with the long-term research and observational methods of social sciences remains an area of exploration.

In addressing these unresolved and challenging issues, the Microsoft Research Asia StarTrack Scholars Program advocates an open attitude, encouraging dialogue and joint experimentation with researchers from various disciplines to discover viable solutions.

As we delve deeper into the realms of Societal AI, we are increasingly recognizing the need for fresh perspectives and innovative minds to tackle the intricate challenges that lie at the convergence of technology and human society. If you are an aspiring young researcher with a zeal for exploring how AI can be made to align with human societal values and are eager to contribute to groundbreaking work in AI safety, verifiability, and value alignment, we invite you to apply to the Microsoft Research Asia StarTrack Scholars Program. Join us in this exciting journey to shape AI into a responsible, value-driven technology that resonates with and enhances human society. Applications are now open for the 2024 program. Apply now and become a part of this transformative journey. For more details and to submit your registration, visit our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.

References:

1. Yao, J., Yi, X., et al. (2023). “From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models.” arXiv. Access the paper

2. Yi, X., & Xie, X. (2023). “Unpacking the Ethical Value Alignment in Big Models.” Journal of Computer Research and Development, 60(9), 1926-1945. DOI: 10.7544/issn1000-1239.202330553. Access the paper

3. Zhu, K., Wang, J., et al. (2023). “PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts.” arXiv. Access the paper

4. Microsoft. PromptBench. GitHub repository. Access PromptBench

5. Zhu, K., Chen, J., et al. (2023). “DyVal: Graph-informed Dynamic Evaluation of Large Language Models.” arXiv. Access the paper

6. Wang, X., Jiang, L., et al. (2023). “Evaluating General-Purpose AI with Psychometrics.” arXiv. Access the paper

7. Xie, X. “Aligning AI with human values is as important as making AI intelligent (让AI拥有人类的价值观,和让AI拥有人类智能同样重要).” Microsoft Research Asia WeChat Account, October 26, 2023, 5:02 PM, Beijing. Access the article

8. Smith, B. (2023, May 30). Governing AI: A blueprint for our future. In Tools and Weapons Podcast (Season 2, Episode 6). Microsoft News. Access the podcast

9. Smith, B., & Browne, C. (2019). Tools and Weapons: The Promise and the Peril of the Digital Age. Penguin Press. Access Microsoft’s introduction to the book

10. Microsoft Research Asia. “Intellectual Property, Privacy, and Technology Misuse: How to Face the Legal and Ethical Challenges of the Large Model Era? (知识产权、隐私和技术滥用:如何面对大模型时代的法律与伦理挑战?).” Microsoft Research Asia WeChat Account, August 17, 2023, 5:01 PM, Beijing. Access the article

Theme Team:

Xing Xie (Engaging Lead), Senior Principal Research Manager, Microsoft Research Asia

Fangzhao Wu, Principal Researcher, Microsoft Research Asia

Jianxun Lian, Senior Researcher, Microsoft Research Asia

Jindong Wang, Senior Researcher, Microsoft Research Asia

Xiaoyuan Yi, Senior Researcher, Microsoft Research Asia

If you have any questions, please email Ms. Beibei Shi, program manager of the Microsoft Research Asia StarTrack Scholars Program, at besh@microsoft.com


]]>
Charting a Greener Path: 2024 Microsoft Research Asia StarTrack Scholars Program Advances AI for Sustainability http://approjects.co.za/?big=en-us/research/articles/charting-a-greener-path-2024-microsoft-research-asia-startrack-scholars-program-advances-ai-for-sustainability/ Mon, 18 Dec 2023 08:54:47 +0000 http://approjects.co.za/?big=en-us/research/?post_type=msr-blog-post&p=993507 Microsoft Research Asia StarTrack Scholars Program has recently embarked on an exciting new chapter, inviting exceptional young minds from around the world to partake in a transformative three-month research visit. In this second installment of our exclusive series on the 2024 Microsoft Research Asia StarTrack Scholars Program, we turn our focus to a theme that […]

The post Charting a Greener Path: 2024 Microsoft Research Asia StarTrack Scholars Program Advances AI for Sustainability appeared first on Microsoft Research.

]]>
Microsoft Research Asia StarTrack Scholars Program has recently embarked on an exciting new chapter, inviting exceptional young minds from around the world to partake in a transformative three-month research visit. In this second installment of our exclusive series on the 2024 Microsoft Research Asia StarTrack Scholars Program, we turn our focus to a theme that resonates deeply with computer science academia: AI for Sustainability. This groundbreaking initiative underlines our commitment to harnessing the power of artificial intelligence to address one of the most critical challenges of our time: sustainability and achieving carbon negativity. It exemplifies the fusion of advanced computational methods with environmental sustainability goals and presents both a unique challenge and opportunity for scholars in the field. For more details on the initiative, explore our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.

The “AI for Sustainability” team within the 2024 StarTrack Scholars Program, led by Jiang Bian, Senior Principal Research Manager at Microsoft Research Asia, seeks to address critical environmental issues through a computational lens, focusing on the development of AI algorithms for carbon emission tracking, renewable energy optimization, and smart energy management. These areas represent significant challenges that require innovative approaches in machine learning, data analytics, and system design – key areas of interest in computer science research. The team also underscores the importance of collaborative research, bringing academic insights and industry expertise to tackle sustainability challenges. This synergy is crucial for developing scalable AI solutions that are not only theoretically sound but also practically viable.

Our Vision: AI-Driven Sustainability for a Carbon-Negative Future

At Microsoft Research Asia, we recognize the pivotal role AI can play in combating climate change and advancing environmental sustainability. Our goal is to transform how we interact with our planet, using AI to create more efficient, sustainable systems that can significantly reduce our carbon footprint.

We envision a future where AI-driven technologies enable and support:

  • Smart Carbon Tracking: AI algorithms that can monitor, report, and analyze carbon emissions with unprecedented accuracy, providing actionable insights for carbon reduction.
  • Intelligent Energy Systems: Leveraging AI to create smarter grids and more efficient renewable energy systems, optimizing energy production and consumption patterns.
  • Sustainable Urban Environments: AI models that help design greener buildings, optimize traffic flow, and reduce urban carbon emissions.
  • Eco-Friendly Industrial Processes: AI solutions that transform industrial operations, minimizing waste and maximizing efficiency for a lower environmental impact.

Research Insights and Challenges: Tackling Key Environmental Issues

Our research agenda is focused on addressing specific, high-impact challenges in the realm of sustainability:

  • Emission Tracking and Management: Developing AI algorithms that can not only track carbon emissions in real-time but also predict future emission trends, enabling proactive management.
  • Renewable Energy Optimization: AI tools to enhance the predictability and reliability of renewable energy sources, ensuring they can effectively meet demand.
  • Smart Energy Storage and Distribution: Innovating in AI-driven battery management and smart grid technologies to ensure energy is stored efficiently and distributed effectively, minimizing losses.
  • Reducing Operational Energy Demand: AI applications designed to optimize energy use in various sectors, from manufacturing to residential, reducing overall demand and carbon footprint.

Collaborative Exploration: Partnerships Driving Innovation

In our journey towards sustainability, we acknowledge the immense value of collaborative exploration. Our approach involves:

  • Joint Research Initiatives: Working hand-in-hand with academic partners to blend their domain expertise with our cutting-edge AI technologies, leading to innovative solutions.
  • Shared Learning and Development: Creating an environment of mutual learning, where insights from academia and industry merge, fostering a rich ground for breakthroughs.
  • Pilot Projects and Real-World Testing: Collaborating on pilot projects to test and refine AI applications in real-world settings, ensuring they are viable, effective, and scalable.
  • Global Reach and Impact: Aiming to extend the benefits of our collaboration globally, influencing policies, and setting new standards in sustainable practices across industries.

Insights and Innovations: StarTrack Scholars’ Experiences

The true essence and impact of the Microsoft Research Asia StarTrack Scholars Program are vividly captured through the firsthand experiences of StarTrack Scholars:


Chunyan Wang, Associate Professor at Tsinghua University’s School of Environment, reflects on her visit:

“Participating in the Microsoft Research Asia (MSR Asia) StarTrack Scholars Program has been an honor and a thrilling experience for me. In this program, I had the opportunity to collaborate with outstanding researchers at the institute to explore the application of big data methods in residential water and energy consumption prediction.

During the project, I gained a deep understanding of the significant role that big data plays in promoting the transformation of green and sustainable consumption. Big data analysis methods can be used to conduct in-depth exploration of the coupled, nonlinear relationship between water and energy consumption, and significantly improve consumption prediction accuracy. This helps us better understand consumer needs and provide targeted advice for households and policymakers to achieve sustainable resource consumption and supply.

This experience has made me realize the importance of interdisciplinary collaboration in the field of green and sustainable consumption. Through exchanges with other researchers at the institute, I have learned a lot about computer science and have gained new perspectives and methods for my research. More importantly, I met many like-minded partners. This program has not only enhanced my academic capabilities but also strengthened my belief in dedicating myself to the research of green and sustainable consumption.

I sincerely hope that more talented researchers from various fields will have the opportunity to participate in and benefit from this exceptional program.”


Jia Xing, Research Associate Professor at the University of Tennessee Knoxville, shares:

“The six months I spent at MSR Asia were one of the most joyful periods of my career. I was so impressed by the energetic atmosphere at MSR Asia, which was marked by enthusiasm, exceptional talent, and strong work ethics.

I was genuinely surprised to witness the passion that MSR Asia fellows have for their work. They arrive early and leave late, clearly demonstrating their deep love for what they do. It’s something I sorely miss in my current work routine, which often involves tedious tasks and monotonous meetings. Being at MSR Asia felt like a return to my student days, a time when I could wholeheartedly focus on my research without the burden of mundane responsibilities.

Among the highlights of my experience at MSR Asia were the discussion rooms, which stand out as an integral part of its vibrant atmosphere. Engaging in discussions on intriguing topics was genuinely memorable. The abundance of available rooms for discussion streamlined the process; when an idea struck, connecting with a colleague for discussion was effortlessly efficient, contributing to an overall productive work environment.

Lacking an AI background, I initially had some concerns. However, the accomplished fellows at MSR Asia provided me with exceptional training, consistently extending their kindness and mentorship. My learning extended beyond AI knowledge; it embraced the essence of cross-field innovation. The experience not only enriched my understanding of AI but also allowed me to cultivate meaningful friendships.

Overall, MSR Asia not only expanded my AI expertise but also contributed significantly to my professional and personal growth, making it a truly memorable and impactful chapter in my career.”

The 2024 Microsoft Research Asia StarTrack Scholars Program will continue to offer a valuable platform for talented young scholars from around the world to engage in cutting-edge research at the intersection of AI and environmental sustainability. AI for Sustainability is a call to action for brilliant young minds passionate about melding technology with environmental stewardship. It represents the nexus of advanced computational methods and the ambitious goals of environmental sustainability. For scholars in computer science and related fields, this presents not only an opportunity but also a challenge: to be part of something transformative. Join us as we chart a greener path forward, leveraging the power of AI for a sustainable future. For further information on the application procedure and eligibility requirements, please visit our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.

References:

  1. Microsoft Research. (2023). AI Forum 2023: Harnessing AI for a Greener Tomorrow – From Precise Carbon Monitoring to Intelligent Energy Optimization.
  2. Microsoft Research Asia. (2023). “Science Craftsmen | Bian Jiang: Seven Years of ‘Technical Itching’ at Microsoft Research Asia, Exploring the Path for Large Models to Aid in the Integration of AI and Industry (科学匠人 | 边江: 在研究院的七年‘技痒’, 探寻大模型助力AI与产业融合之道).” Microsoft Research Asia WeChat Account, July 31, 2023, 5:01 PM, Beijing.

Theme Team:

Jiang Bian (Engaging Lead), Senior Principal Research Manager, Microsoft Research Asia

Lei Song, Principal Researcher, Microsoft Research Asia

Shun Zheng, Senior Researcher, Microsoft Research Asia

Jinyu Wang, Senior Applied Scientist, Microsoft Research Asia

Xiaofan Gui, Applied Scientist, Microsoft Research Asia

If you have any questions, please email Ms. Beibei Shi, program manager of the Microsoft Research Asia StarTrack Scholars Program, at besh@microsoft.com

Exploring Media Foundation: 2024 Microsoft Research Asia StarTrack Scholars Program Elevates Multimodal AI Research http://approjects.co.za/?big=en-us/research/articles/exploring-media-foundation-2024-microsoft-research-asia-startrack-scholars-program-elevates-multimodal-ai-research/ Tue, 05 Dec 2023 03:46:22 +0000

In a groundbreaking move, Microsoft Research Asia’s prestigious StarTrack Scholars Program has officially taken flight, extending a global invitation to brilliant young minds for an immersive three-month research visit. Picture this: collaboration with elite researchers, a deep dive into the Microsoft Research environment, and a valuable opportunity to transform academic brilliance into real-world impact. But here’s the exclusive: Amidst the program’s three pivotal research themes, the spotlight is now on Media Foundation—a catalyst for innovation in the realms of media and technology. Brace yourself for an in-depth exploration into the heart of the 2024 StarTrack Scholars Program. For a comprehensive experience, visit our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research 

The saga continues! Stay tuned for an exclusive series on the Microsoft Research Asia StarTrack Scholars Program, where we unravel the intricacies of the research themes that are steering the 2024 program towards a future defined by technological excellence. Join us as we set sail towards the stars and reshape the narrative of tech innovation!  

Unlocking the Potential of AI in Observing and Understanding the Real World 

In the pursuit of artificial intelligence mirroring human capabilities in extracting knowledge and intelligence from real-world media, Microsoft Research Asia has set its sights on innovating within the realm of Media Foundation. This groundbreaking research theme, as one of the three core focuses of the 2024 StarTrack Scholars Program, aims to provide fresh insights for the study of multimodal large models. 

“We aspire for artificial intelligence to emulate humans by acquiring knowledge and intelligence from various media sources in the real world,” said Yan Lu, Partner Research Manager, Microsoft Research Asia. “To achieve this goal, we must transform the complex and noisy real world into abstract representations capable of capturing essential information and dynamics. The exploration of Media Foundation serves as a new breakthrough in the synergy of multimedia and artificial intelligence, offering novel perspectives for the research on multimodal large models.”   

Breaking the Barrier Between the Real World and Abstract Semantic Space 

In the nearly 70 years since the term “artificial intelligence” originated at the Dartmouth Conference in 1956, advances in technology and resources have propelled AI to unprecedented heights. With the recent surge in large AI models such as ChatGPT and DALL-E, which show remarkable progress in natural language understanding, speech recognition, and image generation, the potential for AI to observe, learn, understand, reason, and create in the real world has become more apparent than ever.

However, despite these strides, a substantial gap remains between the cognitive abilities of AI and humans. The human brain can interpret a vast array of phenomena from the physical world, such as videos, sounds, language, and text, and abstract them into information that can be preserved and accumulated. Multimodal AI models, by contrast, are still in their early stages of development when it comes to tackling universal tasks.

The ambition is for AI to learn and iterate from real-world data. The challenge lies in bridging the gap between the complex, noisy real world and the abstract semantic world where AI operates. Can we construct another language parallel to natural language for different types of media information? The answer to this question, according to Yan Lu and colleagues at Microsoft Research Asia, lies in researchers’ dedication to constructing a comprehensive Media Foundation framework, starting with the neural codec. This framework aims to extract representations of different media content, enabling AI to understand the semantics of the real world and, in turn, bridging the gap between reality and abstract semantics for multimodal AI research. 

Humans excel as learners due to their ability to observe and interact with the physical world through various senses—sight, sound, touch, and speech. The aspiration is to replicate this human characteristic in AI, enabling it to learn and iterate from rich real-world data. 

While most AI models are built on top of Large Language Models (LLMs) that use abstract, concise text expressions to understand the world, there is a need for efficient methods to transform complex and noisy signals from video and audio sources into abstract representations that capture the essence and dynamics of the real world. 

The Media Foundation framework, as envisioned by Yan Lu and his team, consists of two components: online media tokenization and an offline foundation model. The online tokenization model dynamically converts multimedia information into compact, abstract semantic representations through which AI can observe and interact with the real world. The offline foundation model is built from media tokens extracted from the real world and learns to predict their dynamics through offline learning. Whether learning from text or from audio and video signals, efficient and near-lossless compression of real-world dynamics is an essential source of AI intelligence.
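
As a rough illustration of the two-component split described above, the toy Python sketch below pairs an “online” tokenizer with an “offline” model. Everything in it is an assumption made for illustration: the uniform scalar quantizer merely stands in for a learned neural codec, the bigram counter stands in for a foundation model, and names such as `tokenize`, `BigramModel`, and `NUM_TOKENS` are hypothetical rather than taken from the team’s work.

```python
from collections import Counter, defaultdict

NUM_TOKENS = 8  # hypothetical codebook size, chosen only for this toy example


def tokenize(samples, num_tokens=NUM_TOKENS):
    """Online step (toy): map samples in [-1.0, 1.0] to discrete token ids.

    A uniform scalar quantizer stands in for a learned neural codec encoder.
    """
    tokens = []
    for x in samples:
        x = max(-1.0, min(1.0, x))               # clip to the expected range
        idx = int((x + 1.0) / 2.0 * num_tokens)  # scale to [0, num_tokens]
        tokens.append(min(idx, num_tokens - 1))  # keep x == 1.0 in range
    return tokens


class BigramModel:
    """Offline step (toy): learn token dynamics from stored token sequences."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, token_sequences):
        # Count how often each token is followed by each other token.
        for seq in token_sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def predict_next(self, token):
        """Return the most frequent successor of `token`, or None if unseen."""
        followers = self.counts.get(token)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# A repeating "signal" so that the dynamics are easy to learn.
signal = [-0.9, -0.3, 0.3, 0.9] * 4
tokens = tokenize(signal)

model = BigramModel()
model.fit([tokens])
print(tokens[:4], model.predict_next(tokens[0]))
```

The point of the sketch is only the division of labor: the tokenizer runs on incoming signals, while the model is fitted afterwards on the stored tokens and predicts what comes next.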

Neural Codec: Constructing Abstract Representations for Multimedia 

Media Foundation is envisioned as a comprehensive framework comprising online media tokenization and offline foundation models. These components, driven by the neural codec, aim to efficiently convert different modalities of media signals into compact and semantic representations, constructing an abstract representation of the real world and its dynamics. 

The development plan spans three phases: firstly, training initial encoder and decoder models for each modality; secondly, building foundation models for each modality and further optimizing the encoder and decoder; and lastly, learning cross-modal correlations and ultimately constructing the final multimodal foundation model. This dynamic media representation, in conjunction with the multimodal foundation model, forms Media Foundation, providing a new perspective for the study of multimodal artificial intelligence. 

As previously discussed, the abstract semantic representation is more compact and concise, while video and audio signals are complex and noisy. Can Media Foundation compress the dynamics of the real world efficiently and with minimal loss? This challenge has spurred the team to develop a new neural codec framework dedicated to constructing abstract representations for video, audio, and their dynamics. 

Efficient Neural Audio/Video Codec Development: Paving the Way for Innovative Applications 

Over the past few years, Yan Lu and his colleagues have focused on developing efficient neural audio/video codecs and have made exciting progress. Disrupting traditional codec architectures through deep learning, the team has achieved lower computational costs and superior performance. 

In neural audio codec development, they achieved high-quality speech signal compression at 256 bps, realizing disentangled semantic representation learning through an information bottleneck at this extremely low bitrate. This breakthrough is significant not only for multimedia technology but also for a range of audio and speech tasks, such as voice conversion and speech-to-speech translation.
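
To put the 256 bps figure in perspective, the short Python calculation below compares it with uncompressed speech. The 16 kHz sampling rate and 16-bit sample depth are assumptions, chosen as a common wideband speech format; the article does not state the input format the team used, and the function names are illustrative.

```python
def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(raw_bps, coded_bps):
    """How many times smaller the coded stream is than the raw stream."""
    return raw_bps / coded_bps


CODEC_BPS = 256  # bitrate quoted in the article

# Assumed input format (not from the article): 16 kHz, 16-bit, mono.
raw_bps = pcm_bitrate_bps(sample_rate_hz=16_000, bits_per_sample=16)
ratio = compression_ratio(raw_bps, CODEC_BPS)

# Size of one minute of speech at the codec bitrate, in bytes.
bytes_per_minute = CODEC_BPS * 60 // 8

print(f"raw: {raw_bps} bps, coded: {CODEC_BPS} bps, ratio: {ratio:.0f}x")
print(f"one minute of coded speech: {bytes_per_minute} bytes")
```

Under these assumptions the coded stream is about 1,000 times smaller than raw PCM, and a minute of speech occupies under 2 KB.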

Additionally, the team developed the DCVC-DC (Deep Contextual Video Compression-Diverse Contexts) neural video codec. It replaces the modules and algorithms that traditional codec design combines through rule-based approaches with an automated deep learning process. This innovation significantly improves the video compression ratio, surpassing existing video codecs. Because Media Foundation is comprehensive and collaborative in nature, the team is currently making substantial modifications to the DCVC-DC codec.

Exploring New Possibilities Beyond Implicit Text Language 

The newly developed neural codec fundamentally alters the modeling of different types of information in the latent space, achieving higher compression ratios. For multimodal large models, this approach enables the transformation of visual, language, and audio information into neural representations in the latent space. Unlike the simple sequential descriptions in natural languages, these multimedia representations conform to natural laws and support a wider range of applications. 

The team’s exploration validates the feasibility of constructing a new Media Foundation based on video and audio information, offering a brand-new perspective for AI development. While natural language has proven effective in constructing AI, always having to transform complex multimedia signals into text can be laborious and can limit AI’s development. Constructing a Media Foundation based on neural codecs could provide a more effective approach.

Although the pathways for developing multimodal large models through Media Foundation and natural language models differ, both approaches hold irreplaceable value for the development of artificial intelligence. If AI-learned multimedia representations are considered a parallel “language” to natural language, then large multimodal models can be seen as “large multimedia language models.” The development of the neural codec is expected to play a crucial role in driving the evolution of Media Foundation, with its media foundation models and large language models collaboratively shaping the future of multimodal large models, unlocking the full potential of artificial intelligence. 

The team will continue to explore various modeling approaches in the latent space for multimedia information using neural codecs. Media Foundation will serve as their guiding principle, presenting myriad possibilities at every entry point. As we navigate the intricacies of constructing a comprehensive Media Foundation framework, we extend a call to brilliant young minds passionate about reshaping the future of artificial intelligence. 

We invite talented young scholars from around the world to apply for the 2024 StarTrack Scholars Program, which offers an exciting opportunity to collaborate with our esteemed researchers, led by Yan Lu. Engage in a transformative three-month research visit, immersing yourself in the development of neural codecs and the exploration of new possibilities beyond implicit text language. Work alongside our dedicated team, including Xiulian Peng, Cuiling Lan, and Bin Li, as we bridge the gap between the complex real world and abstract semantic space. The Media Foundation framework is not just a research endeavor; it’s a journey toward unlocking the full potential of artificial intelligence in observing and understanding the real world. Visit our official website for more details on the application process and eligibility criteria: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research 

References: 

  1. Jiang, X., Peng, X., Zhang, Y., & Lu, Y. (2023). “Disentangled Feature Learning for Real-Time Neural Speech Coding.” In ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1-5). Rhodes Island, Greece. DOI: 10.1109/ICASSP49357.2023.10094723.
  2. Li, J., Li, B., & Lu, Y. (2023). “Neural Video Compression with Diverse Contexts.” In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 22616-22626). Vancouver, BC, Canada. DOI: 10.1109/CVPR52729.2023.02166.
  3. Lu, Y. “Media Foundation: Opening New Perspectives on Multimodal Large Models (媒体基础:打开多模态大模型的新思路).” Microsoft Research Asia WeChat Account, October 12, 2023, 5:59 PM, Beijing.

Theme Team:  

Yan Lu (Engaging Lead), Partner Research Manager, Microsoft Research Asia 

Xiulian Peng, Principal Research Manager, Microsoft Research Asia 

Cuiling Lan, Principal Researcher, Microsoft Research Asia 

Bin Li, Principal Researcher, Microsoft Research Asia 

If you have any questions, please email Ms. Beibei Shi, program manager of the Microsoft Research Asia StarTrack Scholars Program, at besh@microsoft.com 
