| Main Authors: | Rajagopalan, Rajalaxmi, Giri, Ritwik, Tang, Zhiqiang, Han, Kyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.02413 |
Similar Items
Sample-Constrained Black Box Optimization for Audio Personalization
by: Rajagopalan, Rajalaxmi, et al.
Published: (2025)
Can Masked Autoencoders Also Listen to Birds?
by: Rauch, Lukas, et al.
Published: (2025)
Scaling Speech Tokenizers with Diffusion Autoencoders
by: Wang, Yuancheng, et al.
Published: (2026)
wav2pos: Sound Source Localization using Masked Autoencoders
by: Berg, Axel, et al.
Published: (2024)
Dependency-Aware Discrete Diffusion for Scene Graph Generation
by: Rajagopalan, Rajalaxmi, et al.
Published: (2026)
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)
Exploratory Evaluation of Speech Content Masking
by: Williams, Jennifer, et al.
Published: (2024)
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
by: Wang, Yuancheng, et al.
Published: (2024)
Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study
by: Liu, Wuao, et al.
Published: (2026)
MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)
Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings
by: Varadhan, Praveen Srinivasa, et al.
Published: (2024)
MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)
Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)
Kernel Learning for Sample Constrained Black-Box Optimization
by: Rajagopalan, Rajalaxmi, et al.
Published: (2025)
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
by: Zhang, Zhisheng, et al.
Published: (2025)
From Diet to Free Lunch: Estimating Auxiliary Signal Properties using Dynamic Pruning Masks in Speech Enhancement Networks
by: Miccini, Riccardo, et al.
Published: (2026)
Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
by: Chorowski, Jan, et al.
Published: (2021)
Deep Active Speech Cancellation with Mamba-Masking Network
by: Mishaly, Yehuda, et al.
Published: (2025)
Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
by: Jiang, Xilin, et al.
Published: (2024)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)
Koopman Regularized Deep Speech Disentanglement for Speaker Verification
by: Chazaridis, Nikos, et al.
Published: (2026)
Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)
Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)
Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics
by: Zhang, Ziqian, et al.
Published: (2025)
IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS
by: Sankar, Ashwin, et al.
Published: (2024)
A Semi-Supervised Framework for Speech Confidence Detection using Whisper
by: Wynn, Adam, et al.
Published: (2026)
Investigating the Impact of Speech Enhancement on Audio Deepfake Detection in Noisy Environments
by: Anacin, et al.
Published: (2026)
Optimizing Neural Architectures for Hindi Speech Separation and Enhancement in Noisy Environments
by: Ramamoorthy, Arnav
Published: (2025)
Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model
by: Ahn, Chung-Soo, et al.
Published: (2025)
Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis
by: Feng, Pengchao, et al.
Published: (2025)
Mask-Weighted Spatial Likelihood Coding for Speaker-Independent Joint Localization and Mask Estimation
by: Kienegger, Jakob, et al.
Published: (2024)
Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance
by: Li, Victor OK, et al.
Published: (2025)
PROCESS-2: A Benchmark Speech Corpus for Early Cognitive Impairment Detection
by: Pahar, Madhurananda, et al.
Published: (2026)
EmoHRNet: High-Resolution Neural Network Based Speech Emotion Recognition
by: Muppidi, Akshay, et al.
Published: (2025)
AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)
Myna: Masking-Based Contrastive Learning of Musical Representations
by: Yonay, Ori, et al.
Published: (2025)
Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
by: Adnan, Tariq, et al.
Published: (2024)