| Main Authors: | Yao, Zengwei, Kang, Wei, Zhu, Han, Guo, Liyong, Ye, Lingxuan, Kuang, Fangjun, Zhuang, Weiji, Li, Zhaoqing, Han, Zhifeng, Lin, Long, Povey, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.23278 |
Similar Items
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
by: Zhu, Han, et al.
Published: (2025)
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching
by: Zhu, Han, et al.
Published: (2025)
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
by: Kang, Wei, et al.
Published: (2023)
PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
by: Yang, Yifan, et al.
Published: (2024)
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)
AudioGAN: A Compact and Efficient Framework for Real-Time High-Fidelity Text-to-Audio Generation
by: Chung, HaeChun
Published: (2025)
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
by: Park, Hyun Joon, et al.
Published: (2025)
DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)
FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
by: Shen, Rubing, et al.
Published: (2024)
Is GAN Necessary for Mel-Spectrogram-based Neural Vocoder?
by: Du, Hui-Peng, et al.
Published: (2025)
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
by: Liu, Huadai, et al.
Published: (2024)
HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis
by: Wu, Zhuojia, et al.
Published: (2024)
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)
FlowSE: Flow Matching-based Speech Enhancement
by: Lee, Seonggyu, et al.
Published: (2025)
UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
by: Du, Jiawei, et al.
Published: (2024)
DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)
Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
by: Han, Seungu, et al.
Published: (2025)
FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)
GAN-Based Multi-Microphone Spatial Target Speaker Extraction
by: Shetu, Shrishti Saha, et al.
Published: (2025)
FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
by: Pia, Nicola, et al.
Published: (2024)
High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)
FlowW2N: Whispered-to-Normal Speech Conversion via Flow-Matching
by: Ritter-Gutierrez, Fabian, et al.
Published: (2026)
Leveraging Discriminative Latent Representations for Conditioning GAN-Based Speech Enhancement
by: Shetu, Shrishti Saha, et al.
Published: (2025)
A Universal Harmonic Discriminator for High-quality GAN-based Vocoder
by: Xu, Nan, et al.
Published: (2025)
HILCodec: High-Fidelity and Lightweight Neural Audio Codec
by: Ahn, Sunghwan, et al.
Published: (2024)
JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
by: Kwon, Mingi, et al.
Published: (2025)
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
by: Ahmad, Zeeshan, et al.
Published: (2025)
TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
by: Ma, Chengyuan, et al.
Published: (2026)
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
by: Cho, Hyunjae, et al.
Published: (2024)
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
by: Kong, Zhifeng, et al.
Published: (2024)
Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios
by: Cheng, Yongkang, et al.
Published: (2024)
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
by: Luo, Tianze, et al.
Published: (2025)
Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
by: Gu, Yicheng, et al.
Published: (2025)