Library Catalog

Bibliographic Details
Main Authors:	Rajagopalan, Rajalaxmi; Giri, Ritwik; Tang, Zhiqiang; Han, Kyu
Format:	Preprint
Published:	2026
Subjects:	Sound; Machine Learning
Online Access:	https://arxiv.org/abs/2602.02413

Similar Items

Sample-Constrained Black Box Optimization for Audio Personalization
by: Rajagopalan, Rajalaxmi, et al.
Published: (2025)

Can Masked Autoencoders Also Listen to Birds?
by: Rauch, Lukas, et al.
Published: (2025)

Scaling Speech Tokenizers with Diffusion Autoencoders
by: Wang, Yuancheng, et al.
Published: (2026)

wav2pos: Sound Source Localization using Masked Autoencoders
by: Berg, Axel, et al.
Published: (2024)

Dependency-Aware Discrete Diffusion for Scene Graph Generation
by: Rajagopalan, Rajalaxmi, et al.
Published: (2026)

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)

Exploratory Evaluation of Speech Content Masking
by: Williams, Jennifer, et al.
Published: (2024)

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
by: Wang, Yuancheng, et al.
Published: (2024)

Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study
by: Liu, Wuao, et al.
Published: (2026)

MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)

Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings
by: Varadhan, Praveen Srinivasa, et al.
Published: (2024)

MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)

Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)

Kernel Learning for Sample Constrained Black-Box Optimization
by: Rajagopalan, Rajalaxmi, et al.
Published: (2025)

SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
by: Zhang, Zhisheng, et al.
Published: (2025)

From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
by: Miccini, Riccardo, et al.
Published: (2026)

Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
by: Chorowski, Jan, et al.
Published: (2021)

Deep Active Speech Cancellation with Mamba-Masking Network
by: Mishaly, Yehuda, et al.
Published: (2025)

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
by: Jiang, Xilin, et al.
Published: (2024)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)

Koopman Regularized Deep Speech Disentanglement for Speaker Verification
by: Chazaridis, Nikos, et al.
Published: (2026)

Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)

Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
by: Zhang, Ziqian, et al.
Published: (2025)

IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS
by: Sankar, Ashwin, et al.
Published: (2024)

A Semi-Supervised Framework for Speech Confidence Detection using Whisper
by: Wynn, Adam, et al.
Published: (2026)

Investigating the Impact of Speech Enhancement on Audio Deepfake Detection in Noisy Environments
by: Anacin, et al.
Published: (2026)

Optimizing Neural Architectures for Hindi Speech Separation and Enhancement in Noisy Environments
by: Ramamoorthy, Arnav
Published: (2025)

Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model
by: Ahn, Chung-Soo, et al.
Published: (2025)

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
by: Feng, Pengchao, et al.
Published: (2025)

Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation
by: Kienegger, Jakob, et al.
Published: (2024)

Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance
by: Li, Victor OK, et al.
Published: (2025)

PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection
by: Pahar, Madhurananda, et al.
Published: (2026)

EmoHRNet: High-Resolution Neural Network Based Speech Emotion Recognition
by: Muppidi, Akshay, et al.
Published: (2025)

AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)

Myna: Masking-Based Contrastive Learning of Musical Representations
by: Yonay, Ori, et al.
Published: (2025)

Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)

A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
by: Adnan, Tariq, et al.
Published: (2024)