Table des matières: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Li, Bohan, Wang, Hankun, Zhang, Situo, Guo, Yiwei, Yu, Kai
Format:	Preprint
Publié:	2024
Sujets:	Audio and Speech Processing Artificial Intelligence Sound 68T07
Accès en ligne:	https://arxiv.org/abs/2410.21951
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

Table des matières:

The auto-regressive architecture, like GPTs, is widely used in modern Text-to-Speech (TTS) systems. However, it incurs substantial inference time, particularly due to the challenges in the next-token prediction posed by lengthy sequences of speech tokens. In this work, we introduce VADUSA, one of the first approaches to accelerate auto-regressive TTS through speculative decoding. Our results show that VADUSA not only significantly improves inference speed but also enhances performance by incorporating draft heads to predict future speech content auto-regressively. Furthermore, the inclusion of a tolerance mechanism during sampling accelerates inference without compromising quality. Our approach demonstrates strong generalization across large datasets and various types of speech tokens.

Documents similaires