| Main Authors: | Liang, Ziqi, Shi, Haoxiang, Wang, Jiawei, Lu, Keda |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.08164 |
Similar Items
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)
Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)
AlignCap: Aligning Speech Emotion Captioning to Human Preferences
by: Liang, Ziqi, et al.
Published: (2024)
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
by: Kawamura, Masaya, et al.
Published: (2025)
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
by: Manku, Ruskin Raj, et al.
Published: (2025)
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)
LMFCA-Net: A Lightweight Model for Multi-Channel Speech Enhancement with Efficient Narrow-Band and Cross-Band Attention
by: Zhang, Yaokai, et al.
Published: (2025)
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
by: Melechovsky, Jan, et al.
Published: (2024)
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
by: Liu, Jeongmin, et al.
Published: (2024)
Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
by: Fujita, Kenichi, et al.
Published: (2024)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
by: Ji, Shengpeng, et al.
Published: (2023)
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)
Compact Neural TTS Voices for Accessibility
by: Jain, Kunal, et al.
Published: (2025)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
by: Falai, Alessio, et al.
Published: (2025)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
by: Kakoulidis, Panos, et al.
Published: (2024)
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
by: Wang, Yuancheng, et al.
Published: (2025)
ASTRA: Aligning Speech and Text Representations for Asr without Sampling
by: Gaur, Neeraj, et al.
Published: (2024)
Test-Time Training for Speech Enhancement
by: Behera, Avishkar, et al.
Published: (2025)
DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2025)
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)
Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)
PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models
by: Vora, Jayneel, et al.
Published: (2024)
Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition
by: Xie, Yuan, et al.
Published: (2023)
LoRP-TTS: Low-Rank Personalized Text-To-Speech
by: Bondaruk, Łukasz, et al.
Published: (2025)
Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)
Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition
by: Menon, Aditya Srinivas, et al.
Published: (2026)
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
by: Melechovsky, Jan, et al.
Published: (2022)
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
by: Chen, Zehua, et al.
Published: (2023)
RobustSpeechFlow: Learning Robust Text-to-Speech Trajectories via Augmentation-based Contrastive Flow Matching
by: Yang, Jinhyeok, et al.
Published: (2026)