| Main Authors: | Ghazal, Nizar El, Caubrière, Antoine, Vielzeuf, Valentin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.09424 |
Similar Items
Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
by: Vendrame, Katia, et al.
Published: (2025)
Retrieval Augmented End-to-End Spoken Dialog Models
by: Wang, Mingqiu, et al.
Published: (2024)
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
by: Wang, Chen, et al.
Published: (2025)
Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models
by: Kuzmin, Nikita, et al.
Published: (2026)
Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
by: Yan, Ruiqi, et al.
Published: (2025)
StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction
by: Xu, Qianheng
Published: (2025)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)
Is one brick enough to break the wall of spoken dialogue state tracking?
by: Druart, Lucas, et al.
Published: (2023)
TiCo: Time-Controllable Spoken Dialogue Model
by: Chang, Kai-Wei, et al.
Published: (2026)
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)
Long-Form End-to-End Speech Translation via Latent Alignment Segmentation
by: Polák, Peter, et al.
Published: (2023)
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
by: Hono, Yukiya, et al.
Published: (2023)
Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
by: Li, Yinghao Aaron, et al.
Published: (2024)
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
When End-to-End is Overkill: Rethinking Cascaded Speech-to-Text Translation
by: Min, Anna, et al.
Published: (2025)
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction
by: Wang, Qichao, et al.
Published: (2025)
Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation
by: Wang, Qiongqiong, et al.
Published: (2025)
Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation
by: Gunduz, Ahmet, et al.
Published: (2024)
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
by: Samin, Md. Nazmus Sadat, et al.
Published: (2024)
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
by: Rufai, Amina Mardiyyah, et al.
Published: (2020)
Do we really need Self-Attention for Streaming Automatic Speech Recognition?
by: Dkhissi, Youness, et al.
Published: (2026)
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems
by: Mitsui, Kentaro, et al.
Published: (2024)
SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering
by: Lin, Chyi-Jiunn, et al.
Published: (2024)
Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation
by: Sildam, Tiia, et al.
Published: (2024)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
by: Fang, Qingkai, et al.
Published: (2025)
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
by: Chen, Tanyu, et al.
Published: (2026)
An End-to-End Approach for Chord-Conditioned Song Generation
by: Gao, Shuochen, et al.
Published: (2024)
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
by: Wang, Xiong, et al.
Published: (2024)
Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs
by: Sedláček, Šimon, et al.
Published: (2025)
MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
by: Deng, Yayue, et al.
Published: (2025)
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
by: Zink, Oswald, et al.
Published: (2024)
Towards End-to-End Spoken Grammatical Error Correction
by: Bannò, Stefano, et al.
Published: (2023)