:: Library Catalog

Bibliographic Details
Main Authors:	Li, Duojia; Lu, Shenghui; Pan, Hongchen; Zhan, Zongyi; Hong, Qingyang; Li, Lin
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.14858

Similar Items

MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025)

AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow
by: Li, Duojia, et al.
Published: (2026)

Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
by: Luo, Longjie, et al.
Published: (2025)

SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
by: Wang, Kaidi, et al.
Published: (2025)

Continual Audio Deepfake Detection via Universal Adversarial Perturbation
by: Li, Wangjie, et al.
Published: (2025)

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)

Target matching based generative model for speech enhancement
by: Wang, Taihui, et al.
Published: (2025)

Investigating training objective for flow matching-based speech enhancement
by: Yang, Liusha, et al.
Published: (2025)

GDiffuSE: Diffusion-based speech enhancement with noise model guidance
by: Yanir, Efrayim, et al.
Published: (2025)

ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization
by: Ren, Pengyu, et al.
Published: (2025)

SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
by: Luo, Longjie, et al.
Published: (2025)

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
by: Li, Xuyuan, et al.
Published: (2023)

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
by: Guan, Wenhao, et al.
Published: (2023)

IntMeanFlow: Few-step Speech Generation with Integral Velocity Distillation
by: Wang, Wei, et al.
Published: (2025)

Using RLHF to align speech enhancement approaches to mean-opinion quality scores
by: Kumar, Anurag, et al.
Published: (2024)

A two-step approach for speech enhancement in low-SNR scenarios using cyclostationary beamforming and DNNs
by: Bologni, Giovanni, et al.
Published: (2026)

MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
by: Kaneko, Takuhiro, et al.
Published: (2026)

Cross-attention and Self-attention for Audio-visual Speaker Diarization in MISP-Meeting Challenge
by: Li, Zhaoyang, et al.
Published: (2025)

WhisperFlow: speech foundation models in real time
by: Wang, Rongxiang, et al.
Published: (2024)

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)

An adaptive filter bank based neural network approach for time delay estimation and speech enhancement
by: Ma, Lu
Published: (2025)

Monaural speech enhancement on drone via Adapter based transfer learning
by: Chen, Xingyu, et al.
Published: (2024)

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)

Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)

Gen-SER: When the generative model meets speech emotion recognition
by: Wang, Taihui, et al.
Published: (2026)

Spectral oversubtraction? An approach for speech enhancement after robot ego speech filtering in semi-real-time
by: Li, Yue, et al.
Published: (2024)

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
by: Guo, Wenxiang, et al.
Published: (2025)

Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)

MeanFlow-Accelerated Multimodal Video-to-Audio Synthesis via One-Step Generation
by: Yang, Xiaoran, et al.
Published: (2025)

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)

Unsupervised speech enhancement with spectral kurtosis and double deep priors
by: Ohnaka, Hien, et al.
Published: (2024)

SPGM: Prioritizing Local Features for enhanced speech separation performance
by: Yip, Jia Qi, et al.
Published: (2023)

Inter-channel Conv-TasNet for multichannel speech enhancement
by: Lee, Dongheon, et al.
Published: (2021)

Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)

InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Emotion Inconsistency Detection
by: Li, Zongyi, et al.
Published: (2025)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

XMUspeech Systems for the ASVspoof 5 Challenge
by: Li, Wangjie, et al.
Published: (2025)

Single-step Controllable Music Bandwidth Extension With Flow Matching
by: Hernandez-Olivan, Carlos, et al.
Published: (2026)

An automatic mixing speech enhancement system for multi-track audio
by: Liu, Xiaojing, et al.
Published: (2024)

Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)