:: Library Catalog

Bibliographic Details
Main Authors:	Han, Wooseok; Kang, Minki; Kim, Changhun; Yang, Eunho
Type:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2412.20155

Similar Items

Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
by: Kang, Minki, et al.
Published: (2023)

MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
by: Wu, Zhichao, et al.
Published: (2025)

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)

No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
by: Shin, Seungyoun, et al.
Published: (2025)

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)

Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models
by: Lee, Kyowoon, et al.
Published: (2025)

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu, et al.
Published: (2025)

LoRP-TTS: Low-Rank Personalized Text-To-Speech
by: Bondaruk, Łukasz, et al.
Published: (2025)

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)

Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
by: Kim, Taesoo, et al.
Published: (2025)

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis
by: Choi, Youngwon, et al.
Published: (2026)

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)

Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
by: Susladkar, Onkar Kishor, et al.
Published: (2024)

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
by: Wang, Xinsheng, et al.
Published: (2025)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)

Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement
by: Chen, Qianniu, et al.
Published: (2025)

Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech
by: Chu, Yunji, et al.
Published: (2024)

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)

Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
by: Lemerle, Théodor, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
by: Yang, Dong, et al.
Published: (2025)

Adaptive Duration Model for Text Speech Alignment
by: Cao, Junjie
Published: (2025)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability
by: Park, Hyun Joon, et al.
Published: (2024)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025)

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)

Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
by: Liu, Chenlin, et al.
Published: (2025)

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
by: Shi, Hao, et al.
Published: (2024)