:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Pengyu; Fang, Ying; Li, Xiaofei
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2502.07205

Similar Items

Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics
by: Goswami, Mandip
Published: (2026)

Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
by: Fujita, Yoto, et al.
Published: (2024)

Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain
by: Wang, Pengyu, et al.
Published: (2025)

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)

Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
by: Hajal, Karl El, et al.
Published: (2025)

Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR
by: Hajal, Karl El, et al.
Published: (2025)

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
by: Kamahori, Keisuke, et al.
Published: (2025)

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
by: Richter, Julius, et al.
Published: (2024)

Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models
by: Lemercier, Jean-Marie, et al.
Published: (2024)

Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)

A Hybrid Model for Weakly-Supervised Speech Dereverberation
by: Bahrman, Louis, et al.
Published: (2025)

Single Channel Blind Dereverberation of Speech Signals
by: Nigam, Dhruv
Published: (2025)

When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems
by: Chondhekar, Sujal, et al.
Published: (2025)

Can Layer-wise SSL Features Improve Zero-Shot ASR Performance for Children's Speech?
by: Sinha, Abhijit, et al.
Published: (2025)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)

Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space
by: Quintas, Sebastião, et al.
Published: (2024)

On the Joint Minimization of Regularization Loss Functions in Deep Variational Bayesian Methods for Attribute-Controlled Symbolic Music Generation
by: Pettenó, Matteo, et al.
Published: (2025)

Speech Unlearning
by: Cheng, Jiali, et al.
Published: (2025)

Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)

Soft Clustering Anchors for Self-Supervised Speech Representation Learning in Joint Embedding Prediction Architectures
by: Ioannides, Georgios, et al.
Published: (2026)

RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification
by: Bitterman, Jacob, et al.
Published: (2024)

Aligning Generative Speech Enhancement with Perceptual Feedback
by: Li, Haoyang, et al.
Published: (2025)

kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech
by: Hajal, Karl El, et al.
Published: (2024)

Neural Blind Source Separation and Diarization for Distant Speech Recognition
by: Bando, Yoshiaki, et al.
Published: (2024)

SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by: Chang, Kai-Wei, et al.
Published: (2024)

Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
by: Fu, Szu-Wei, et al.
Published: (2024)

MORE: Multi-Objective Adversarial Attacks on Speech Recognition
by: Gao, Xiaoxue, et al.
Published: (2026)

Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition
by: Tripathi, Suraj, et al.
Published: (2019)

RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
by: Shao, Yiwen, et al.
Published: (2023)

JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention
by: Ioannides, Georgios, et al.
Published: (2025)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)

ASR-Synchronized Speaker-Role Diarization
by: Ghosh, Arindam, et al.
Published: (2025)

BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models
by: Moliner, Eloi, et al.
Published: (2024)

Speech Diarization and ASR with GMM
by: Sharma, Aayush Kumar, et al.
Published: (2023)

Collaborative Watermarking for Adversarial Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2023)

Scaling Speech Tokenizers with Diffusion Autoencoders
by: Wang, Yuancheng, et al.
Published: (2026)

From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology
by: Li, Haoyang, et al.
Published: (2024)

Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
by: Nakilcioglu, Emin Cagatay, et al.
Published: (2023)

Mitigating Unauthorized Speech Synthesis for Voice Protection
by: Zhang, Zhisheng, et al.
Published: (2024)

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech
by: Bae, Jaesung, et al.
Published: (2026)