Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Lu, Xiangyu; Xu, Wang; Wang, Haoyu; Zhou, Hongyun; Zhao, Haiyan; Zhu, Conghui; Zhao, Tiejun; Yang, Muyun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.11123

author	Lu, Xiangyu; Xu, Wang; Wang, Haoyu; Zhou, Hongyun; Zhao, Haiyan; Zhu, Conghui; Zhao, Tiejun; Yang, Muyun
contents	Real-time speech conversation is essential for natural and efficient human-machine interactions, requiring duplex and streaming capabilities. Traditional Transformer-based conversational chatbots operate in a turn-based manner and exhibit quadratic computational complexity that grows as the input size increases. In this paper, we propose DuplexMamba, a Mamba-based end-to-end multimodal duplex model for speech-to-text conversation. DuplexMamba enables simultaneous input processing and output generation, dynamically adjusting to support real-time streaming. Specifically, we develop a Mamba-based speech encoder and adapt it with a Mamba-based language model. Furthermore, we introduce a novel duplex decoding strategy that enables DuplexMamba to process input and generate output simultaneously. Experimental results demonstrate that DuplexMamba successfully implements duplex and streaming capabilities while achieving performance comparable to several recently developed Transformer-based models in automatic speech recognition (ASR) tasks and voice assistant benchmark evaluations. Our code and model are released.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_11123
institution	arXiv
publishDate	2025
record_format	arxiv
title	DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
topic	Computation and Language
url	https://arxiv.org/abs/2502.11123
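The abstract contrasts the quadratic cost of Transformer attention with the streaming-friendly design of Mamba. The following is a minimal, hypothetical sketch of that cost argument and is not code from the paper. It assumes that each new token in an attention model is scored against every earlier token, while a state-space model only updates a fixed-size recurrent state. The helper names `attention_cost`, `ssm_cost`, and `ssm_stream` are illustrative. The scalar recurrence uses fixed parameters, whereas real Mamba layers use input-dependent (selective) parameters.

```python
# Hypothetical illustration only: not the DuplexMamba implementation.
from typing import List


def attention_cost(n: int) -> int:
    """Score computations when each new token attends to all tokens seen so far."""
    return sum(t for t in range(1, n + 1))  # 1 + 2 + ... + n = n(n+1)/2, i.e. quadratic


def ssm_cost(n: int, state_size: int = 1) -> int:
    """State updates when each new token only touches a fixed-size recurrent state."""
    return n * state_size  # linear in sequence length


def ssm_stream(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Scalar linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Assumption: fixed (non-selective) scalar parameters, used only to show
    that streaming needs just the previous state, not the full history.
    """
    h = 0.0
    ys: List[float] = []
    for x in xs:  # each step reads only the previous state h
        h = a * h + b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    print(attention_cost(1000), ssm_cost(1000))   # 500500 1000
    print(ssm_stream([1.0, 0.0, 0.0], a=0.5))     # [1.0, 0.5, 0.25]
```

For 1,000 tokens, the attention count grows to 500,500 pairwise scores, while the recurrent count stays at 1,000 state updates. This constant per-step cost is the property the abstract relies on for real-time streaming.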
