:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Longjie; Lu, Shenghui; Li, Lin; Hong, Qingyang
Format:	Preprint
Published:	2025
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.24446

Similar Items

SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
by: Luo, Longjie, et al.
Published: (2025)

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)

Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)

ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
by: Wang, Zhong-Qiu
Published: (2024)

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
by: Gao, Ming, et al.
Published: (2025)

DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
by: Wang, Xinyu, et al.
Published: (2024)

Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge
by: Huang, Shangkun, et al.
Published: (2025)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)

ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)

Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label
by: Yang, Wenhao, et al.
Published: (2024)

XMUspeech Systems for the ASVspoof 5 Challenge
by: Li, Wangjie, et al.
Published: (2025)

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
by: Guan, Wenhao, et al.
Published: (2023)

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec
by: Chen, Peijie, et al.
Published: (2025)

AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition
by: Xue, Junxiao, et al.
Published: (2025)

Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2026)

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Adaptive Convolution for CNN-based Speech Enhancement Models
by: Wang, Dahan, et al.
Published: (2025)

Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation
by: Lin, Zhaofeng, et al.
Published: (2023)

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)

The TMU System for the XACLE Challenge: Training Large Audio Language Models with CLAP Pseudo-Labels
by: Tsutsumi, Ayuto, et al.
Published: (2026)

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
by: Aldeneh, Zakaria, et al.
Published: (2024)

Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

Neural Directed Speech Enhancement with Dual Microphone Array in High Noise Scenario
by: Wen, Wen, et al.
Published: (2024)

A Lightweight Fourier-based Network for Binaural Speech Enhancement with Spatial Cue Preservation
by: Lu, Xikun, et al.
Published: (2025)

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)

Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
by: Cappellazzo, Umberto, et al.
Published: (2025)

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
by: Hsieh, Tsun-An, et al.
Published: (2024)

Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)

Dense-TSNet: Dense Connected Two-Stage Structure for Ultra-Lightweight Speech Enhancement
by: Lin, Zizhen, et al.
Published: (2024)

Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks
by: Ai, Yang, et al.
Published: (2024)

PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement
by: Lin, Zizhen, et al.
Published: (2025)

Universal Score-based Speech Enhancement with High Content Preservation
by: Scheibler, Robin, et al.
Published: (2024)