| Main Authors: | Tan, Xiaozhou; Zhao, Minghui; Ragni, Anton |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.18470 |
Similar Items
Decoding Order Matters in Autoregressive Speech Synthesis
by: Zhao, Minghui, et al.
Published: (2026)
Beyond Two-stage Diffusion TTS: Joint Structure and Content Refinement via Jump Diffusion
by: Ai, Jiabao, et al.
Published: (2026)
Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)
Self-Train Before You Transcribe
by: Flynn, Robert, et al.
Published: (2024)
Speech Watermarking with Discrete Intermediate Representations
by: Ji, Shengpeng, et al.
Published: (2024)
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
by: Chen, Zehua, et al.
Published: (2023)
Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition
by: Flynn, Robert, et al.
Published: (2026)
Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis
by: Leung, Wing-Zin, et al.
Published: (2024)
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
by: Ji, Shengpeng, et al.
Published: (2023)
Investigating the Design Space of Diffusion Models for Speech Enhancement
by: Gonzalez, Philippe, et al.
Published: (2023)
Emphasis Sensitivity in Speech Representations
by: Cassini, Shaun, et al.
Published: (2025)
Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)
Discrete Optimal Transport and Voice Conversion
by: Selitskiy, Anton, et al.
Published: (2025)
On-device Streaming Discrete Speech Units
by: Choi, Kwanghee, et al.
Published: (2025)
GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
by: Baoueb, Teysir, et al.
Published: (2025)
Drax: Speech Recognition with Discrete Flow Matching
by: Navon, Aviv, et al.
Published: (2025)
Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
by: Lovelace, Justin, et al.
Published: (2025)
Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech
by: Reszka, Joanna, et al.
Published: (2024)
Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency
by: Lay, Bunlong, et al.
Published: (2025)
Text-to-Speech for Unseen Speakers via Low-Complexity Discrete Unit-Based Frame Selection
by: Ulgen, Ismail Rasim, et al.
Published: (2024)
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)
DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
by: Richter, Julius, et al.
Published: (2022)
High-Fidelity Speech Enhancement via Discrete Audio Tokens
by: Lanzendörfer, Luca A., et al.
Published: (2025)
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)
Speech Foundation Models Generalize to Time Series Tasks from Wearable Sensor Data
by: Narain, Jaya, et al.
Published: (2025)
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis
by: Lin, Weiwei, et al.
Published: (2025)
Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement
by: Sadeghi, Mostafa, et al.
Published: (2025)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)
DiffSoundStream: Efficient Speech Tokenization via Diffusion Decoding
by: Yang, Yang, et al.
Published: (2025)
Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler
by: Gonzalez, Philippe, et al.
Published: (2023)
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
by: Ju, Zeqian, et al.
Published: (2024)
Bone-conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models
by: Khanagha, Sina, et al.
Published: (2026)
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis
by: Jiang, Xilin, et al.
Published: (2024)
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
by: Kalkhorani, Vahid Ahmadi, et al.
Published: (2024)
Zero-Shot Mono-to-Binaural Speech Synthesis
by: Levkovitch, Alon, et al.
Published: (2024)