:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yuewei; Zou, Huanbin; Zhu, Jie
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2401.10494

Similar Items

Speech Enhancement with Overlapped-Frame Information Fusion and Causal Self-Attention
by: Zhang, Yuewei, et al.
Published: (2025)

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)

LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)

Dense-TSNet: Dense Connected Two-Stage Structure for Ultra-Lightweight Speech Enhancement
by: Lin, Zizhen, et al.
Published: (2024)

A Multi-Stage Framework for Multimodal Controllable Speech Synthesis
by: Niu, Rui, et al.
Published: (2025)

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)

LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2024)

From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
by: Mu, Zhaoxi, et al.
Published: (2025)

Geometry-Constrained EEG Channel Selection for Brain-Assisted Speech Enhancement
by: Zuo, Keying, et al.
Published: (2024)

Improving Speech Enhancement by Cross- and Sub-band Processing with State Space Model
by: Li, Jizhen, et al.
Published: (2025)

RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization
by: Yang, Bing, et al.
Published: (2024)

SaD: A Scenario-Aware Discriminator for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data
by: Tran, Minh, et al.
Published: (2025)

GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement
by: Rong, Xiaobin, et al.
Published: (2026)

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
by: Zhu, Qiushi, et al.
Published: (2024)

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)

Adaptive Convolution for CNN-based Speech Enhancement Models
by: Wang, Dahan, et al.
Published: (2025)

Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement
by: Zhang, Qiquan, et al.
Published: (2024)

Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)

DroFiT: A Lightweight Band-fused Frequency Attention Toward Real-time UAV Speech Enhancement
by: Lee, Jeongmin, et al.
Published: (2025)

Attention-Based Beamformer For Multi-Channel Speech Enhancement
by: Bai, Jinglin, et al.
Published: (2024)

ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)

SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain
by: Wan, Zixiang, et al.
Published: (2025)

Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables
by: Dementyev, Artem, et al.
Published: (2024)

Improving Design of Input Condition Invariant Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement
by: Nasretdinov, Rauf, et al.
Published: (2025)

FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge
by: Goswami, Nabarun, et al.
Published: (2025)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)

Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
by: Cheng, Jiaming, et al.
Published: (2025)

Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)

Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio
by: Li, Li, et al.
Published: (2024)

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)

EDNet: A Versatile Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training
by: Kwak, Doyeop, et al.
Published: (2025)

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
by: Hao, Xiang, et al.
Published: (2020)

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)