:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Lin, Weiwei, He, Chenghan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.01084
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
by: Lin, Weiwei, et al.
Published: (2024)

Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
by: Chen, Zehua, et al.
Published: (2023)

Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
by: Agrawal, Prabhav, et al.
Published: (2024)

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
by: Jiang, Xilin, et al.
Published: (2024)

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
by: Luong, Diep, et al.
Published: (2025)

Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)

Safeguarding Privacy in Edge Speech Understanding with Tiny Foundation Models
by: Benazir, Afsara, et al.
Published: (2025)

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
by: Liu, Zhijun, et al.
Published: (2024)

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)

Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)

Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
by: Pasini, Marco, et al.
Published: (2024)

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)

TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
by: Ji, Shengpeng, et al.
Published: (2023)

Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech
by: Reszka, Joanna, et al.
Published: (2024)

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
by: Feng, Chen, et al.
Published: (2025)

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)

Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
by: Neekhara, Paarth, et al.
Published: (2024)

GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
by: Baoueb, Teysir, et al.
Published: (2025)

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
by: Battenberg, Eric, et al.
Published: (2024)

RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations
by: Kim, Seungmin, et al.
Published: (2025)

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
by: Fu, Yonggan, et al.
Published: (2022)

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
by: Kakoulidis, Panos, et al.
Published: (2024)

Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition
by: Zhou, Nanjun, et al.
Published: (2025)

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)

Are Deep Speech Denoising Models Robust to Adversarial Noise?
by: Schwarzer, Will, et al.
Published: (2025)

Investigating the Design Space of Diffusion Models for Speech Enhancement
by: Gonzalez, Philippe, et al.
Published: (2023)

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)

Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)

Impact of Speech Mode in Automatic Pathological Speech Detection
by: Sheikh, Shakeel A., et al.
Published: (2024)

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)

MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction
by: Al-Radhi, Mohammed Salah, et al.
Published: (2025)

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
by: Sadeghi, Mostafa, et al.
Published: (2025)

Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)

On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
by: Sarkar, Eklavya, et al.
Published: (2024)