:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Luo, Tianze, Miao, Xingchen, Duan, Wenbo
Format:	Preprint
Published:	2025
Subjects:	Sound Computation and Language Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2503.16689
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)

MusicHiFi: Fast High-Fidelity Stereo Vocoding
by: Zhu, Ge, et al.
Published: (2024)

An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
by: Gu, Yicheng, et al.
Published: (2024)

High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)

ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis
by: He, Xiangheng, et al.
Published: (2024)

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2024)

FLowHigh: Towards Efficient and High-Quality Audio Super-Resolution with Single-Step Flow Matching
by: Yun, Jun-Hak, et al.
Published: (2025)

Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
by: Yoneyama, Reo, et al.
Published: (2025)

FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)

Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
by: Yusuyin, Saierdaer, et al.
Published: (2024)

Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
by: Du, Hui-Peng, et al.
Published: (2024)

Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
by: Kanda, Naoyuki, et al.
Published: (2024)

Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)

Speech Foundation Models and Crowdsourcing for Efficient, High-Quality Data Collection
by: Lee, Beomseok, et al.
Published: (2024)

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)

A Benchmark for Multi-speaker Anonymization
by: Miao, Xiaoxiao, et al.
Published: (2024)

EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting
by: Buyuksolak, Oguzhan, et al.
Published: (2026)

Improving Resource-Efficient Speech Enhancement via Neural Differentiable DSP Vocoder Refinement
by: Guimarães, Heitor R., et al.
Published: (2025)

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
by: Cui, Ziyun, et al.
Published: (2024)

Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features
by: Ohnaka, Hien, et al.
Published: (2026)

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization
by: Tang, Yun, et al.
Published: (2025)

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
by: Xie, Yuan, et al.
Published: (2026)

UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers
by: Lin, Liang, et al.
Published: (2025)

Borderless Long Speech Synthesis
by: Song, Xingchen, et al.
Published: (2026)

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
by: Zuo, Jialong, et al.
Published: (2025)

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
by: Zhang, Chong, et al.
Published: (2025)

Efficient Interleaved Speech Modeling through Knowledge Distillation
by: Nouriborji, Mohammadmahdi, et al.
Published: (2025)

EMMeTT: Efficient Multimodal Machine Translation Training
by: Żelasko, Piotr, et al.
Published: (2024)

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
by: Hung, Chia-Yu, et al.
Published: (2024)

Efficient Speech Translation through Model Compression and Knowledge Distillation
by: Moslem, Yasmin
Published: (2025)

Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
by: Du, Yichao, et al.
Published: (2024)