:: Library Catalog

Bibliographic Details
Main Authors:	Li, Xingchen; Xie, Hanke; Wang, Ziqian; Zhang, Zihan; Xiao, Longshuai; Wang, Shuai; Xie, Lei
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.24708

Similar Items

MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025)

FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025)

EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
by: Kang, Boyi, et al.
Published: (2025)

SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)

DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation
by: Wang, Ziqian, et al.
Published: (2024)

EchoFree: Towards Ultra Lightweight and Efficient Neural Acoustic Echo Cancellation
by: Li, Xingchen, et al.
Published: (2025)

MeanSE: Efficient Generative Speech Enhancement with Mean Flows
by: Wang, Jiahe, et al.
Published: (2025)

OmniCodec: Low Frame Rate Universal Audio Codec with Semantic-Acoustic Disentanglement
by: Hu, Jingbin, et al.
Published: (2026)

UniFlow: Unifying Speech Front-End Tasks via Continuous Generative Modeling
by: Wang, Ziqian, et al.
Published: (2025)

Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)

UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations
by: Rong, Xiaobin, et al.
Published: (2026)

S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion
by: Wang, Ziqian, et al.
Published: (2026)

U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding
by: Wang, Ziqian, et al.
Published: (2025)

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement
by: Han, Runduo, et al.
Published: (2024)

AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2025)

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
by: Tian, Wenjie, et al.
Published: (2025)

ARiSE: Auto-Regressive Multi-Channel Speech Enhancement
by: Shen, Pengjie, et al.
Published: (2025)

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)

DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2024)

ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)

DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)

GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement
by: Wang, Chengzhong, et al.
Published: (2024)

REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025)

Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
by: Yao, Jixun, et al.
Published: (2024)

FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning
by: Wang, Haoxu, et al.
Published: (2026)

SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription
by: Dai, Yuhang, et al.
Published: (2026)

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)

Anonymization, Not Elimination: Utility-Preserved Speech Anonymization
by: Xiao, Yunchong, et al.
Published: (2026)

MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)

FlowSE: Flow Matching-based Speech Enhancement
by: Lee, Seonggyu, et al.
Published: (2025)

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)

Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
by: Xie, Yuying, et al.
Published: (2022)

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
by: Li, Hanzhao, et al.
Published: (2025)

Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
by: Ning, Ziqian, et al.
Published: (2024)

BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation
by: Zhang, Zihan, et al.
Published: (2024)

AMDM-SE: Attention-based Multichannel Diffusion Model for Speech Enhancement
by: Opochinsky, Renana, et al.
Published: (2026)

FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)