:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Han; Kang, Wei; Guo, Liyong; Yao, Zengwei; Kuang, Fangjun; Zhuang, Weiji; Li, Zhaoqing; Han, Zhifeng; Zhang, Dong; Zhang, Xin; Song, Xingchen; Ye, Lingxuan; Lin, Long; Povey, Daniel
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2507.09318

Similar Items

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
by: Zhu, Han, et al.
Published: (2025)

Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
by: Yao, Zengwei, et al.
Published: (2025)

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
by: Kang, Wei, et al.
Published: (2023)

PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
by: Yang, Yifan, et al.
Published: (2024)

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)

SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
by: Wang, Huimeng, et al.
Published: (2026)

Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
by: Lu, Yen-Ju, et al.
Published: (2025)

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)

Retrieval Augmented End-to-End Spoken Dialog Models
by: Wang, Mingqiu, et al.
Published: (2024)

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
by: Lin, Guan-Ting, et al.
Published: (2023)

LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems
by: Zhang, Hao, et al.
Published: (2025)

FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)

CoVoMix2: Advancing Zero-Shot Dialogue Generation with Fully Non-Autoregressive Flow Matching
by: Zhang, Leying, et al.
Published: (2025)

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models
by: Xue, Hongfei, et al.
Published: (2023)

FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
by: Chen, Tanyu, et al.
Published: (2026)

Spoken Language Corpora Augmentation with Domain-Specific Voice-Cloned Speech
by: Czyżnikiewicz, Mateusz, et al.
Published: (2024)

LatentVoiceGrad: Nonparallel Voice Conversion with Latent Diffusion/Flow-Matching Models
by: Kameoka, Hirokazu, et al.
Published: (2025)

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
by: Zuo, Jialong, et al.
Published: (2025)

SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
by: Lu, Haitian, et al.
Published: (2025)

Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
by: Zink, Oswald, et al.
Published: (2024)

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
by: Zhan, Jun, et al.
Published: (2025)

DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)

WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
by: Luo, Tianze, et al.
Published: (2025)

VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
by: Kirdey, Stanislav
Published: (2025)

Towards a Japanese Full-duplex Spoken Dialogue System
by: Ohashi, Atsumoto, et al.
Published: (2025)

Semantic-Aware Interruption Detection in Spoken Dialogue Systems: Benchmark, Metric, and Model
by: Xia, Kangxiang, et al.
Published: (2026)

An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
by: Tulsiani, Hitesh, et al.
Published: (2024)

DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset
by: Koudounas, Alkis, et al.
Published: (2025)

SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue
by: Li, Ruiqi, et al.
Published: (2026)

Audio Dialogues: Dialogues dataset for audio and music understanding
by: Goel, Arushi, et al.
Published: (2024)

Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
by: Udupa, Sathvik, et al.
Published: (2025)

EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
by: Liu, Jingwen, et al.
Published: (2025)

FlowSE: Flow Matching-based Speech Enhancement
by: Lee, Seonggyu, et al.
Published: (2025)