Library Catalog

Bibliographic Details
Main Authors:	Ghazal, Nizar El; Caubrière, Antoine; Vielzeuf, Valentin
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.09424

Similar Items

Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
by: Vendrame, Katia, et al.
Published: (2025)

Retrieval Augmented End-to-End Spoken Dialog Models
by: Wang, Mingqiu, et al.
Published: (2024)

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
by: Wang, Chen, et al.
Published: (2025)

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models
by: Kuzmin, Nikita, et al.
Published: (2026)

Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
by: Yan, Ruiqi, et al.
Published: (2025)

StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction
by: Xu, Qianheng
Published: (2025)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)

Is one brick enough to break the wall of spoken dialogue state tracking?
by: Druart, Lucas, et al.
Published: (2023)

TiCo: Time-Controllable Spoken Dialogue Model
by: Chang, Kai-Wei, et al.
Published: (2026)

Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)

Long-Form End-to-End Speech Translation via Latent Alignment Segmentation
by: Polák, Peter, et al.
Published: (2023)

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)

Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
by: Hono, Yukiya, et al.
Published: (2023)

Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)

Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
by: Li, Yinghao Aaron, et al.
Published: (2024)

Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)

When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
by: Min, Anna, et al.
Published: (2025)

YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction
by: Wang, Qichao, et al.
Published: (2025)

Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation
by: Wang, Qiongqiong, et al.
Published: (2025)

Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)

An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation
by: Gunduz, Ahmet, et al.
Published: (2024)

BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
by: Samin, Md. Nazmus Sadat, et al.
Published: (2024)

Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
by: Rufai, Amina Mardiyyah, et al.
Published: (2020)

Do we really need Self-Attention for Streaming Automatic Speech Recognition?
by: Dkhissi, Youness, et al.
Published: (2026)

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
by: Mitsui, Kentaro, et al.
Published: (2024)

SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
by: Lin, Chyi-Jiunn, et al.
Published: (2024)

Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation
by: Sildam, Tiia, et al.
Published: (2024)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
by: Fang, Qingkai, et al.
Published: (2025)

FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
by: Chen, Tanyu, et al.
Published: (2026)

An End-to-End Approach for Chord-Conditioned Song Generation
by: Gao, Shuochen, et al.
Published: (2024)

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
by: Wang, Xiong, et al.
Published: (2024)

Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs
by: Sedláček, Šimon, et al.
Published: (2025)

MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
by: Deng, Yayue, et al.
Published: (2025)

Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
by: Zink, Oswald, et al.
Published: (2024)

Towards End-to-End Spoken Grammatical Error Correction
by: Bannò, Stefano, et al.
Published: (2023)