Speaker: Benjamin Stahl; Host: Hannes Gamper. In this talk, we explore advancements in computational models for speech quality assessment. Self-supervised learning models have emerged as powerful front-ends, outperforming supervised-only models. However, their large size renders them…
Much of the recent progress in natural language generation (NLG) has been in the context of English and, more generally, high-resource languages. However, Indian languages have yet to see similar paradigm shifts despite their speaking…
Million-Tokens Prompt Inference for Long-context LLMs: MInference 1.0 leverages the dynamic sparse nature of LLMs’ attention, which exhibits some static patterns, to speed up pre-filling for long-context LLMs. It first determines offline which sparse pattern…
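To illustrate why a sparse attention pattern can cut pre-filling cost, here is a minimal pure-Python sketch, assuming one simple static pattern: each query attends only to the first few tokens plus a sliding window of recent tokens (sometimes called an "A-shape" pattern). This is not MInference's implementation. Its offline pattern selection, dynamic sparse-index estimation, and GPU kernels are not shown. The names `a_shape_mask` and `masked_attention`, the toy inputs, and the `sink`/`window` sizes are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def a_shape_mask(n: int, sink: int, window: int) -> List[List[bool]]:
    """Hypothetical static pattern: a causal mask that keeps, for each query i,
    the first `sink` keys and the `window` most recent keys (i - window, i]."""
    return [[j <= i and (j < sink or i - j < window) for j in range(n)]
            for i in range(n)]


def masked_attention(q: Matrix, k: Matrix, v: Matrix,
                     mask: List[List[bool]]) -> Matrix:
    """Scaled dot-product attention that only scores (i, j) pairs where
    mask[i][j] is True; masked-out pairs are never computed."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        cols = [j for j, keep in enumerate(mask[i]) if keep]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in cols]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j][c] for w, j in zip(weights, cols)) / z
                    for c in range(len(v[0]))])
    return out


# Toy example: 8 tokens, 4-dimensional head (values are arbitrary).
n = 8
q = [[math.sin(i + c) for c in range(4)] for i in range(n)]
k = [[math.cos(i * c + 1) for c in range(4)] for i in range(n)]
v = [[float(4 * i + c) for c in range(4)] for i in range(n)]

dense_mask = a_shape_mask(n, sink=0, window=n)   # full causal attention
sparse_mask = a_shape_mask(n, sink=1, window=3)  # first token + 3 most recent keys

total = sum(map(sum, dense_mask))
kept = sum(map(sum, sparse_mask))
dense_out = masked_attention(q, k, v, dense_mask)
sparse_out = masked_attention(q, k, v, sparse_mask)
print(f"scored {kept} of {total} query-key pairs")
```

With 8 tokens the sparse pattern scores 26 of the 36 causal query-key pairs. Early rows, where the window still covers every earlier token, match dense attention exactly. The saving grows with sequence length, because the sparse pattern's cost is roughly linear in the number of tokens while dense causal attention is quadratic.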
Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech: EmoCtrl-TTS is an emotion-controllable zero-shot TTS that can generate highly emotional speech with non-verbal vocalizations such as laughter and crying for any speaker. EmoCtrl-TTS is purely a…
In this issue: RENC makes 5G vRAN servers more energy efficient; CoExplorer uses AI to keep video meetings on track; Automatic bug detection in LLM-powered text-based games; MAIRA-2: Grounded radiology report generation.