| Main Authors: | Luo, Tianze, Miao, Xingchen, Duan, Wenbo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.16689 |
Similar Items
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)
High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
by: He, Xiangheng, et al.
Published: (2024)
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2024)
FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
by: Yun, Jun-Hak, et al.
Published: (2025)
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
by: Yoneyama, Reo, et al.
Published: (2025)
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
by: Yusuyin, Saierdaer, et al.
Published: (2024)
Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
by: Du, Hui-Peng, et al.
Published: (2024)
Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)
High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
by: Kanda, Naoyuki, et al.
Published: (2024)
Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)
Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection
by: Lee, Beomseok, et al.
Published: (2024)
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)
A Benchmark for Multi-speaker Anonymization
by: Miao, Xiaoxiao, et al.
Published: (2024)
EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting
by: Buyuksolak, Oguzhan, et al.
Published: (2026)
Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement
by: Guimarães, Heitor R., et al.
Published: (2025)
Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
by: Cui, Ziyun, et al.
Published: (2024)
Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features
by: Ohnaka, Hien, et al.
Published: (2026)
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)
Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization
by: Tang, Yun, et al.
Published: (2025)
NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)
UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers
by: Lin, Liang, et al.
Published: (2025)
Borderless Long Speech Synthesis
by: Song, Xingchen, et al.
Published: (2026)
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
by: Zuo, Jialong, et al.
Published: (2025)
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
by: Zhang, Chong, et al.
Published: (2025)
Efficient Interleaved Speech Modeling through Knowledge Distillation
by: Nouriborji, Mohammadmahdi, et al.
Published: (2025)
EMMeTT: Efficient Multimodal Machine Translation Training
by: Żelasko, Piotr, et al.
Published: (2024)
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
by: Hung, Chia-Yu, et al.
Published: (2024)
Efficient Speech Translation through Model Compression and Knowledge Distillation
by: Moslem, Yasmin
Published: (2025)
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
by: Du, Yichao, et al.
Published: (2024)