| Main Authors: | Lin, Weiwei, He, Chenghan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.01084 |
Similar Items
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
by: Lin, Weiwei, et al.
Published: (2024)
Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
by: Chen, Zehua, et al.
Published: (2023)
Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
by: Agrawal, Prabhav, et al.
Published: (2024)
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
by: Jiang, Xilin, et al.
Published: (2024)
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
by: Luong, Diep, et al.
Published: (2025)
Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)
Safeguarding Privacy in Edge Speech Understanding with Tiny Foundation Models
by: Benazir, Afsara, et al.
Published: (2025)
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
by: Liu, Zhijun, et al.
Published: (2024)
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)
Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
by: Pasini, Marco, et al.
Published: (2024)
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
by: Ji, Shengpeng, et al.
Published: (2023)
Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech
by: Reszka, Joanna, et al.
Published: (2024)
Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models
by: Feng, Chen, et al.
Published: (2025)
Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)
Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
by: Neekhara, Paarth, et al.
Published: (2024)
GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
by: Baoueb, Teysir, et al.
Published: (2025)
Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
by: Battenberg, Eric, et al.
Published: (2024)
RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations
by: Kim, Seungmin, et al.
Published: (2025)
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
by: Fu, Yonggan, et al.
Published: (2022)
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
by: Kakoulidis, Panos, et al.
Published: (2024)
Gradient Norm-based Fine-Tuning for Backdoor Defense in Automatic Speech Recognition
by: Zhou, Nanjun, et al.
Published: (2025)
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)
Are Deep Speech Denoising Models Robust to Adversarial Noise?
by: Schwarzer, Will, et al.
Published: (2025)
Investigating the Design Space of Diffusion Models for Speech Enhancement
by: Gonzalez, Philippe, et al.
Published: (2023)
DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)
Impact of Speech Mode in Automatic Pathological Speech Detection
by: Sheikh, Shakeel A., et al.
Published: (2024)
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)
MiSTR: Multi-Modal iEEG-to-Speech Synthesis with Transformer-Based Prosody Prediction and Neural Phase Reconstruction
by: Al-Radhi, Mohammed Salah, et al.
Published: (2025)
Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
by: Sadeghi, Mostafa, et al.
Published: (2025)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
by: Sarkar, Eklavya, et al.
Published: (2024)