About
Kaitao Song is a senior researcher at Microsoft Research Asia, Shanghai, China. His research focuses on machine learning and deep learning algorithms for natural language processing and speech processing, including pre-trained language models, neural machine translation, music generation, text summarization, neural architecture search for NLP, automatic speech recognition, and text-to-speech synthesis.
He has published several well-known papers, including:
- HuggingGPT / JARVIS
  - GitHub stars: 23.6K
  - The Most Influential Papers in NeurIPS 2023, Rank 8
  - Selected as one of the Top 100 AI papers in 2023
  - Winner of the 2024 World Artificial Intelligence Conference (WAIC) Youth Outstanding Paper Award
  - Integrated into LangChain and HuggingFace
  - NeurIPS 2023 Highlights
  - Open100: Top 100 Open Source achievements
- PVT
  - The Most Influential Papers in ICCV 2021, Rank 2
- MPNet
- MASS
  - The Most Influential Papers in ICML 2019, Rank 9
Currently, he is focused on the following research topics:
- Large Language Models
- Autonomous Agents, Planning, Tool Use, Memory, etc.
- General NLP methods
- Pre-trained Models, Architecture Design, Efficient Training and Inference
Hiring: We are looking for self-motivated research interns. Please contact me (kaitaosong[AT]microsoft.com) if you are interested in my research topics. (No headcount now)
Research Topics & Projects
- Large Language Models
  - Autonomous Agents: HuggingGPT, MusicAgent, EvoAgent
  - Tool: EasyTool
  - Benchmark: TaskBench
  - Prompt Engineering: DTG, EvoPrompt
- Foundation Models
- NLP & Speech Application
  - Translation: [Paper-1], [Paper-2], [Paper-3]
  - Summarization: [Paper-1]
  - Information Extraction: DiffusionNER
  - Music: SongMASS, DeepRapper
  - MultiModal: [Paper-1], [Paper-2]
  - Clinical Application: [Paper-1], [Paper-2], [Paper-3], [Paper-4]
  - ASR & TTS: [Paper-1], [Paper-2], [Paper-3]
Recent News:
[2024-09-26] Two papers, on LLM reasoning and LLM benchmarking, have been accepted to NeurIPS 2024.
[2024-08-11] I am serving as a Session Chair at ACL 2024.
[2024-06-27] I have been invited to attend the AI Agent Summit, hosted by Andrew Ng and DeepLearning.ai, to discuss the future of AI agents.
[2024-05-15] Our paper (NaturalSpeech 3) has been accepted to ICML 2024 (Oral). Our work on LLM reasoning has been accepted to the ACL 2024 Main Conference (Oral).
[2024-02-29] I gave a talk on HuggingGPT at the AGI Leap Summit 2024.
[2024-01-17] Two of our papers (EvoPrompt, VoiceGen) have been accepted to ICLR 2024, and two more (EasyTool, TaskBench) have been accepted to the ICLR 2024 LLMAgent workshops.
[2023-12-10] I gave a talk, “The Future is Here – A Deep Dive into Autonomous Agents”, at NeurIPS 2023.
[2023-09-22] HuggingGPT has been accepted to NeurIPS 2023.