:: Library Catalog

Bibliographic Details
Main Authors:	Yao, Zengwei, Kang, Wei, Zhu, Han, Guo, Liyong, Ye, Lingxuan, Kuang, Fangjun, Zhuang, Weiji, Li, Zhaoqing, Han, Zhifeng, Lin, Long, Povey, Daniel
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2512.23278

Similar Items

ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
by: Zhu, Han, et al.
Published: (2025)

ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching
by: Zhu, Han, et al.
Published: (2025)

OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models
by: Zhu, Han, et al.
Published: (2026)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
by: Kang, Wei, et al.
Published: (2023)

PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
by: Yang, Yifan, et al.
Published: (2024)

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)

AudioGAN: A Compact and Efficient Framework for Real-Time High-Fidelity Text-to-Audio Generation
by: Chung, HaeChun
Published: (2025)

RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
by: Park, Hyun Joon, et al.
Published: (2025)

DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)

FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
by: Shen, Rubing, et al.
Published: (2024)

Is GAN Necessary for Mel-Spectrogram-based Neural Vocoder?
by: Du, Hui-Peng, et al.
Published: (2025)

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
by: Liu, Huadai, et al.
Published: (2024)

HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis
by: Wu, Zhuojia, et al.
Published: (2024)

FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
by: Jung, Chaeyoung, et al.
Published: (2024)

FlowSE: Flow Matching-based Speech Enhancement
by: Lee, Seonggyu, et al.
Published: (2025)

UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
by: Chen, Zhengyang, et al.
Published: (2024)

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
by: Du, Jiawei, et al.
Published: (2024)

DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)

Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement
by: Han, Seungu, et al.
Published: (2025)

FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)

GAN-Based Multi-Microphone Spatial Target Speaker Extraction
by: Shetu, Shrishti Saha, et al.
Published: (2025)

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
by: Pia, Nicola, et al.
Published: (2024)

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)

FlowW2N: Whispered-to-Normal Speech Conversion via Flow-Matching
by: Ritter-Gutierrez, Fabian, et al.
Published: (2026)

Leveraging Discriminative Latent Representations for Conditioning GAN-Based Speech Enhancement
by: Shetu, Shrishti Saha, et al.
Published: (2025)

A Universal Harmonic Discriminator for High-quality GAN-based Vocoder
by: Xu, Nan, et al.
Published: (2025)

HILCodec: High-Fidelity and Lightweight Neural Audio Codec
by: Ahn, Sunghwan, et al.
Published: (2024)

JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching
by: Kwon, Mingi, et al.
Published: (2025)

DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
by: Ahmad, Zeeshan, et al.
Published: (2025)

TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
by: Ma, Chengyuan, et al.
Published: (2026)

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
by: Cho, Hyunjae, et al.
Published: (2024)

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
by: Kong, Zhifeng, et al.
Published: (2024)

Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios
by: Cheng, Yongkang, et al.
Published: (2024)

WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
by: Luo, Tianze, et al.
Published: (2025)

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN
by: Gu, Yicheng, et al.
Published: (2025)