| Main Authors: | Lin, Guan-Ting, Lian, Jiachen, Li, Tingle, Wang, Qirui, Anumanchipalli, Gopala, Liu, Alexander H., Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.04721 |
Similar Items
Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models
by: Lin, Guan-Ting, et al.
Published: (2025)
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
by: Liu, Jingwen, et al.
Published: (2025)
Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner
by: Lin, Guan-Ting, et al.
Published: (2025)
Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
by: Lin, Guan-Ting, et al.
Published: (2026)
FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems
by: Peng, Yizhou, et al.
Published: (2025)
Towards a Japanese Full-duplex Spoken Dialogue System
by: Ohashi, Atsumoto, et al.
Published: (2025)
LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems
by: Zhang, Hao, et al.
Published: (2025)
Audio Texture Manipulation by Exemplar-Based Analogy
by: Cheng, Kan Jen, et al.
Published: (2025)
From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models
by: Chen, Yuxuan, et al.
Published: (2025)
Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech
by: Zhou, Xuanru, et al.
Published: (2025)
Teaching Machines to Speak Using Articulatory Control
by: Anand, Akshay, et al.
Published: (2025)
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
by: Arora, Siddhant, et al.
Published: (2025)
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
by: Zhang, Haoyang, et al.
Published: (2026)
ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2026)
Scaling Spoken Language Models with Syllabic Speech Tokenization
by: Lee, Nicholas, et al.
Published: (2025)
Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge
by: Wang, Chengyou, et al.
Published: (2026)
TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving
by: Cui, Wenqian, et al.
Published: (2025)
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
by: Lin, Guan-Ting, et al.
Published: (2024)
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
by: Lin, Guan-Ting, et al.
Published: (2023)
DialogueSidon: Recovering Full-Duplex Dialogue Tracks from In-the-Wild Dialogue Audio
by: Nakata, Wataru, et al.
Published: (2026)
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
by: Veluri, Bandhav, et al.
Published: (2024)
How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue
by: Lu, Hui, et al.
Published: (2026)
Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE
by: Lian, Jiachen, et al.
Published: (2022)
TiCo: Time-Controllable Spoken Dialogue Model
by: Chang, Kai-Wei, et al.
Published: (2026)
Self-Supervised Audio-Visual Soundscape Stylization
by: Li, Tingle, et al.
Published: (2024)
Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models
by: Kuzmin, Nikita, et al.
Published: (2026)
HuPER: A Human-Inspired Framework for Phonetic Perception
by: Guo, Chenxu, et al.
Published: (2026)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
LALM-as-a-Judge: Benchmarking Large Audio-Language Models for Safety Evaluation in Multi-Turn Spoken Dialogues
by: Ivry, Amir, et al.
Published: (2026)
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
by: Zhang, He, et al.
Published: (2025)
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
by: Lin, Guan-Ting, et al.
Published: (2024)
Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)
SSDM 2.0: Time-Accurate Speech Rich Transcription with Non-Fluencies
by: Lian, Jiachen, et al.
Published: (2024)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models
by: Huang, Hsiao-Ying, et al.
Published: (2026)
MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
by: Deng, Yayue, et al.
Published: (2025)
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models
by: Yan, Ruiqi, et al.
Published: (2025)
MMedFD: A Real-world Healthcare Benchmark for Multi-turn Full-Duplex Automatic Speech Recognition
by: Chen, Hongzhao, et al.
Published: (2025)
DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset
by: Koudounas, Alkis, et al.
Published: (2025)