{"id":1018134,"date":"2024-03-28T14:28:52","date_gmt":"2024-03-28T21:28:52","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&p=1018134"},"modified":"2024-05-31T12:15:39","modified_gmt":"2024-05-31T19:15:39","slug":"covomix","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/covomix\/","title":{"rendered":"CoVoMix"},"content":{"rendered":"
\n\t
\n\t\t
\n\t\t\t\t\t<\/div>\n\t\t\n\t\t
\n\t\t\t\n\t\t\t
\n\t\t\t\t\n\t\t\t\t
\n\t\t\t\t\t\n\t\t\t\t\t
\n\t\t\t\t\t\t
\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n

CoVoMix<\/h1>\n\n\n\n

Advancing Zero-shot Speech Generation for Human-like Multi-talker Conversation<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n

<\/div>\n\n\n\n\n\n

We introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. In addition, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that not only are human-like in naturalness and coherence but also involve multiple speakers engaging in multiple rounds of conversation. These dialogues, generated within a single channel, feature seamless speech transitions, including overlapping speech, and appropriate paralinguistic behaviors such as laughter and coughing.<\/p>\n\n\n\n

\n
Paper’s Link<\/a><\/div>\n<\/div>\n\n\n\n
\"diagram\"<\/figure>\n\n\n\n
<\/div>\n\n\n\n
\n