The System and Engineering group of Microsoft Research Asia (Shanghai) works on deep learning systems and their integration into the ecosystems around large language models (LLMs) and artificial intelligence (AI). Our research spans multiple dimensions, from developing efficient inference engines that exploit sparsity and dynamism to advancing AI infrastructure technologies and exploring emerging applications for LLMs. We are committed to optimizing computing for emerging technologies, reducing hardware inefficiencies, and designing new architectures. Our expertise also extends to real-time video enhancement and cloud gaming systems, with the goal of improving the quality and reliability of multimedia experiences. As the field evolves, our group aims to stay at the forefront, shaping the future of AI systems and infrastructure.
Research topics
Systems for Large Language Models and Their Ecosystem
- Efficient inference engines that leverage sparsity and dynamism in model architecture, values, and inputs
- AI infrastructure technologies, e.g., Kubernetes GPU schedulers and platforms for deep learning workloads
- Emerging technologies for LLM applications, including Copilots and autonomous agents, such as prompt compression and lifelong learning from historical records
- Key components connected to LLMs, such as data services and vector search
Efficient Computing for Emerging Technologies
- Accelerating the training and inference of diverse models in the cloud and at the edge
- Hardware efficiency (latency, energy, and carbon emissions) benchmarking, prediction, and efficient model design for specific devices
- New architectures for vector search and resource disaggregation
Video Streaming and Cloud Gaming Systems
- Real-time video super-resolution and frame prediction
- Systematic optimization of video encoding, transmission, and DNN-based video enhancement
- Server-client cooperation to mitigate the effects of bandwidth-limited and unreliable networks
- Fundamental technologies of cloud gaming systems, such as job and resource scheduling