:: Library Catalog

Bibliographic Details
Main authors:	Matiyali, Neeraj; Srivastava, Siddharth; Sharma, Gaurav
Type:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language
Online access:	https://arxiv.org/abs/2508.17031
Similar documents

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

Styleclone: Face Stylization with Diffusion Based Data Augmentation
by: Matiyali, Neeraj, et al.
Published: (2025)

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
by: Liu, Jiaxuan, et al.
Published: (2024)

GSA-TTS: Toward Zero-Shot Speech Synthesis based on Gradual Style Adaptor
by: Lee, Seokgi, et al.
Published: (2025)

Preserve Anything: Controllable Image Synthesis with Object Preservation
by: Sharma, Prasen Kumar, et al.
Published: (2025)

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
by: Peng, Yifan, et al.
Published: (2024)

Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model
by: Lehečka, Jan, et al.
Published: (2024)

Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation
by: He, Jiaxu, et al.
Published: (2026)

HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
by: Li, Yingting, et al.
Published: (2024)

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
by: Liu, Sen, et al.
Published: (2024)

Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Factorized Discrete Flow Matching
by: Nguyen, Ngoc-Son, et al.
Published: (2025)

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
by: Li, Xiang, et al.
Published: (2024)

Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)

Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
by: Falai, Alessio, et al.
Published: (2025)

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
by: Wang, Wenbin, et al.
Published: (2024)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)

Style Mixture of Experts for Expressive Text-To-Speech Synthesis
by: Jawaid, Ahad, et al.
Published: (2024)

Lombard Speech Synthesis for Any Voice with Controllable Style Embeddings
by: Akti, Seymanur, et al.
Published: (2026)

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

Are Paralinguistic Representations all that is needed for Speech Emotion Recognition?
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Length-Aware Rotary Position Embedding for Text-Speech Alignment
by: Kim, Hyeongju, et al.
Published: (2025)

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)

EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)

Comparative Evaluation of Expressive Japanese Character Text-to-Speech with VITS and Style-BERT-VITS2
by: Rackauckas, Zackary, et al.
Published: (2025)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
by: Sakuma, Asahi, et al.
Published: (2025)

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
by: Battenberg, Eric, et al.
Published: (2024)

Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)

Scaling Rich Style-Prompted Text-to-Speech Datasets
by: Diwan, Anuj, et al.
Published: (2025)

GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM
by: Song, Yaodong, et al.
Published: (2025)

Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
by: Di, Xinhan, et al.
Published: (2024)

Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
by: Mei, Yuxiang, et al.
Published: (2026)

AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)