:: Library Catalog

Bibliographic Details
Main Authors:	Ai, Jiabao; Zhao, Minghui; Ragni, Anton
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2603.14032
Similar Items

Discrete-Time Diffusion-Like Models for Speech Synthesis
by: Tan, Xiaozhou, et al.
Published: (2025)

Decoding Order Matters in Autoregressive Speech Synthesis
by: Zhao, Minghui, et al.
Published: (2026)

Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)

Beyond the Utterance: An Empirical Study of Very Long Context Speech Recognition
by: Flynn, Robert, et al.
Published: (2026)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

How Much Context Does My Attention-Based ASR System Need?
by: Flynn, Robert, et al.
Published: (2023)

Emphasis Sensitivity in Speech Representations
by: Cassini, Shaun, et al.
Published: (2025)

Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)

MELA-TTS: Joint transformer-diffusion model with representation alignment for speech synthesis
by: An, Keyu, et al.
Published: (2025)

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis
by: Leung, Wing-Zin, et al.
Published: (2024)

Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
by: Zhao, Yuxiang, et al.
Published: (2025)

Self-Train Before You Transcribe
by: Flynn, Robert, et al.
Published: (2024)

Unified Diffusion Refinement for Multi-Channel Speech Enhancement and Separation
by: Xu, Zhongweiyang, et al.
Published: (2026)

Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising
by: Lu, Ye-Xin, et al.
Published: (2025)

Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech
by: Li, Zirui, et al.
Published: (2025)

NDF+: Joint Neural Directional Filtering and Diffuse Sound Extraction
by: Huang, Weilong, et al.
Published: (2026)

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools
by: Răgman, Teodora, et al.
Published: (2026)

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
by: Park, Hyun Joon, et al.
Published: (2024)

Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)

Improving Music Source Separation with Diffusion and Consistency Refinement
by: Karchkhadze, Tornike, et al.
Published: (2024)

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
by: Ma, Linhan, et al.
Published: (2024)

SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
by: Yang, Chenyu, et al.
Published: (2025)

SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)

Scalable Controllable Accented TTS
by: Xinyuan, Henry Li, et al.
Published: (2025)

HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-based TTS
by: Nie, Sihang, et al.
Published: (2025)

StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis
by: Chen, Zhiyong, et al.
Published: (2024)

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)

E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)

Zero-Shot TTS With Enhanced Audio Prompts: BSC Submission For The 2026 WildSpoof Challenge TTS Track
by: Giraldo, Jose, et al.
Published: (2026)

DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2025)

Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
by: Liu, Chenlin, et al.
Published: (2025)

MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)

SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS
by: Nguyen, Tan Dat, et al.
Published: (2025)

T5Gemma-TTS Technical Report
by: Arata, Chihiro, et al.
Published: (2026)

Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS
by: Seo, Deokjin, et al.
Published: (2026)

A Non-autoregressive Model for Joint STT and TTS
by: Sunder, Vishal, et al.
Published: (2025)

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
by: Li, Yinghao Aaron, et al.
Published: (2024)

ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
by: You, Fuming, et al.
Published: (2024)

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
by: Jiang, Ziyue, et al.
Published: (2025)