Saved in:
| Main Authors: | Li, Xingchen, Xie, Hanke, Wang, Ziqian, Zhang, Zihan, Xiao, Longshuai, Wang, Shuai, Xie, Lei |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.24708 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025)
by: Zhu, Yike, et al.
Published: (2025)
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025)
by: Wang, Ziqian, et al.
Published: (2025)
EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)
by: Liu, Zikai, et al.
Published: (2026)
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
by: Kang, Boyi, et al.
Published: (2025)
by: Kang, Boyi, et al.
Published: (2025)
SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)
by: Wang, Ziqian, et al.
Published: (2023)
DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation
by: Wang, Ziqian, et al.
Published: (2024)
by: Wang, Ziqian, et al.
Published: (2024)
EchoFree: Towards Ultra Lightweight and Efficient Neural Acoustic Echo Cancellation
by: Li, Xingchen, et al.
Published: (2025)
by: Li, Xingchen, et al.
Published: (2025)
MeanSE: Efficient Generative Speech Enhancement with Mean Flows
by: Wang, Jiahe, et al.
Published: (2025)
by: Wang, Jiahe, et al.
Published: (2025)
OmniCodec: Low Frame Rate Universal Audio Codec with Semantic-Acoustic Disentanglement
by: Hu, Jingbin, et al.
Published: (2026)
by: Hu, Jingbin, et al.
Published: (2026)
UniFlow: Unifying Speech Front-End Tasks via Continuous Generative Modeling
by: Wang, Ziqian, et al.
Published: (2025)
by: Wang, Ziqian, et al.
Published: (2025)
Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)
by: Mu, Bingshen, et al.
Published: (2025)
UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations
by: Rong, Xiaobin, et al.
Published: (2026)
by: Rong, Xiaobin, et al.
Published: (2026)
S$^2$Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion
by: Wang, Ziqian, et al.
Published: (2026)
by: Wang, Ziqian, et al.
Published: (2026)
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding
by: Wang, Ziqian, et al.
Published: (2025)
by: Wang, Ziqian, et al.
Published: (2025)
Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement
by: Han, Runduo, et al.
Published: (2024)
by: Han, Runduo, et al.
Published: (2024)
AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2025)
by: Dai, Yuhang, et al.
Published: (2025)
Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
by: Tian, Wenjie, et al.
Published: (2025)
by: Tian, Wenjie, et al.
Published: (2025)
ARiSE: Auto-Regressive Multi-Channel Speech Enhancement
by: Shen, Pengjie, et al.
Published: (2025)
by: Shen, Pengjie, et al.
Published: (2025)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
by: Yao, Jixun, et al.
Published: (2025)
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)
by: Xie, Hanke, et al.
Published: (2025)
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2024)
by: Ning, Ziqian, et al.
Published: (2024)
ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)
by: Kumar, Sonal, et al.
Published: (2025)
DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching
by: Jiang, Yuepeng, et al.
Published: (2025)
by: Jiang, Yuepeng, et al.
Published: (2025)
GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement
by: Wang, Chengzhong, et al.
Published: (2024)
by: Wang, Chengzhong, et al.
Published: (2024)
REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers
by: Jiang, Yuepeng, et al.
Published: (2025)
by: Jiang, Yuepeng, et al.
Published: (2025)
Distinctive and Natural Speaker Anonymization via Singular Value Transformation-assisted Matrix
by: Yao, Jixun, et al.
Published: (2024)
by: Yao, Jixun, et al.
Published: (2024)
FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning
by: Wang, Haoxu, et al.
Published: (2026)
by: Wang, Haoxu, et al.
Published: (2026)
SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription
by: Dai, Yuhang, et al.
Published: (2026)
by: Dai, Yuhang, et al.
Published: (2026)
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
by: Ning, Ziqian, et al.
Published: (2025)
by: Ning, Ziqian, et al.
Published: (2025)
Anonymization, Not Elimination: Utility-Preserved Speech Anonymization
by: Xiao, Yunchong, et al.
Published: (2026)
by: Xiao, Yunchong, et al.
Published: (2026)
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)
by: Chen, Huakang, et al.
Published: (2026)
FlowSE: Flow Matching-based Speech Enhancement
by: Lee, Seonggyu, et al.
Published: (2025)
by: Lee, Seonggyu, et al.
Published: (2025)
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)
by: Du, Chenpeng, et al.
Published: (2022)
Complex Recurrent Variational Autoencoder with Application to Speech Enhancement
by: Xie, Yuying, et al.
Published: (2022)
by: Xie, Yuying, et al.
Published: (2022)
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)
by: Ning, Ziqian, et al.
Published: (2023)
FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
by: Li, Hanzhao, et al.
Published: (2025)
by: Li, Hanzhao, et al.
Published: (2025)
Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
by: Ning, Ziqian, et al.
Published: (2024)
by: Ning, Ziqian, et al.
Published: (2024)
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation
by: Zhang, Zihan, et al.
Published: (2024)
by: Zhang, Zihan, et al.
Published: (2024)
AMDM-SE: Attention-based Multichannel Diffusion Model for Speech Enhancement
by: Opochinsky, Renana, et al.
Published: (2026)
by: Opochinsky, Renana, et al.
Published: (2026)
FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech
by: Ma, Linhan, et al.
Published: (2025)
by: Ma, Linhan, et al.
Published: (2025)
Similar Items
-
MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025) -
FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching
by: Wang, Ziqian, et al.
Published: (2025) -
EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026) -
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
by: Kang, Boyi, et al.
Published: (2025) -
SELM: Speech Enhancement Using Discrete Tokens and Language Models
by: Wang, Ziqian, et al.
Published: (2023)