| Main Authors: | Li, Duojia, Lu, Shenghui, Pan, Hongchen, Zhan, Zongyi, Hong, Qingyang, Li, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.14858 |
Similar Items
MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025)
AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow
by: Li, Duojia, et al.
Published: (2026)
Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
by: Luo, Longjie, et al.
Published: (2025)
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
by: Wang, Kaidi, et al.
Published: (2025)
Continual Audio Deepfake Detection via Universal Adversarial Perturbation
by: Li, Wangjie, et al.
Published: (2025)
A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)
Target matching based generative model for speech enhancement
by: Wang, Taihui, et al.
Published: (2025)
Investigating training objective for flow matching-based speech enhancement
by: Yang, Liusha, et al.
Published: (2025)
GDiffuSE: Diffusion-based speech enhancement with noise model guidance
by: Yanir, Efrayim, et al.
Published: (2025)
ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
by: Ren, Pengyu, et al.
Published: (2025)
SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
by: Luo, Longjie, et al.
Published: (2025)
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
by: Li, Xuyuan, et al.
Published: (2023)
ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
by: Guan, Wenhao, et al.
Published: (2023)
IntMeanFlow: Few-step Speech Generation with Integral Velocity Distillation
by: Wang, Wei, et al.
Published: (2025)
Using RLHF to align speech enhancement approaches to mean-opinion quality scores
by: Kumar, Anurag, et al.
Published: (2024)
A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs
by: Bologni, Giovanni, et al.
Published: (2026)
MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
by: Kaneko, Takuhiro, et al.
Published: (2026)
Cross-attention and Self-attention for Audio-visual Speaker Diarization in MISP-Meeting Challenge
by: Li, Zhaoyang, et al.
Published: (2025)
WhisperFlow: speech foundation models in real time
by: Wang, Rongxiang, et al.
Published: (2024)
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)
An adaptive filter bank based neural network approach for time delay estimation and speech enhancement
by: Ma, Lu
Published: (2025)
Monaural speech enhancement on drone via Adapter based transfer learning
by: Chen, Xingyu, et al.
Published: (2024)
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)
Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)
Gen-SER: When the generative model meets speech emotion recognition
by: Wang, Taihui, et al.
Published: (2026)
Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time
by: Li, Yue, et al.
Published: (2024)
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
by: Guo, Wenxiang, et al.
Published: (2025)
Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)
MeanFlow-Accelerated Multimodal Video-to-Audio Synthesis via One-Step Generation
by: Yang, Xiaoran, et al.
Published: (2025)
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)
Unsupervised speech enhancement with spectral kurtosis and double deep priors
by: Ohnaka, Hien, et al.
Published: (2024)
SPGM: Prioritizing Local Features for enhanced speech separation performance
by: Yip, Jia Qi, et al.
Published: (2023)
Inter-channel Conv-TasNet for multichannel speech enhancement
by: Lee, Dongheon, et al.
Published: (2021)
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)
InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Emotion Inconsistency Detection
by: Li, Zongyi, et al.
Published: (2025)
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)
XMUspeech Systems for the ASVspoof 5 Challenge
by: Li, Wangjie, et al.
Published: (2025)
Single-step Controllable Music Bandwidth Extension With Flow Matching
by: Hernandez-Olivan, Carlos, et al.
Published: (2026)
An automatic mixing speech enhancement system for multi-track audio
by: Liu, Xiaojing, et al.
Published: (2024)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)