| Main Authors: | Ueda, Lucas H., Marques, Leonardo B. de M. M., Simões, Flávio O., Neto, Mário U., Runstein, Fernando, Bó, Bianca Dal, Costa, Paula D. P. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.17364 |
Similar Items
SelfTTS: cross-speaker style transfer through explicit embedding disentanglement and self-refinement using self-augmentation
by: Ueda, Lucas H., et al.
Published: (2026)
SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching
by: Marques, Leonardo B. de M. M., et al.
Published: (2024)
Improving Speech Emotion Recognition Through Cross Modal Attention Alignment and Balanced Stacking Model
by: Ueda, Lucas, et al.
Published: (2025)
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)
Gender-ambiguous voice generation through feminine speaking style transfer in male voices
by: Koutsogiannaki, Maria, et al.
Published: (2024)
Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)
Improved symbolic drum style classification with grammar-based hierarchical representations
by: Géré, Léo, et al.
Published: (2024)
Combining audio control and style transfer using latent diffusion
by: Demerlé, Nils, et al.
Published: (2024)
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS
by: Kunešová, Marie, et al.
Published: (2025)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)
Crab: Multi Layer Contrastive Supervision to Improve Speech Emotion Recognition Under Both Acted and Natural Speech Condition
by: Ueda, Lucas H., et al.
Published: (2026)
Accent-VITS:accent transfer for end-to-end TTS
by: Ma, Linhan, et al.
Published: (2023)
Improving speaker verification robustness with synthetic emotional utterances
by: Koditala, Nikhil Kumar, et al.
Published: (2024)
Complexity of frequency fluctuations and the interpretive style in the bass viola da gamba
by: Lugo, Igor, et al.
Published: (2025)
SynthCloner: Synthesizer-style Audio Transfer via Factorized Codec with ADSR Envelope Control
by: Liu, Jeng-Yue, et al.
Published: (2025)
A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
X-CrossNet: A complex spectral mapping approach to target speaker extraction with cross attention speaker embedding fusion
by: Sun, Chang, et al.
Published: (2024)
On the influence of language similarity in non-target speaker verification trials
by: Reuter, Paul M., et al.
Published: (2025)
E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)
Towards generalisable and calibrated synthetic speech detection with self-supervised representations
by: Pascu, Octavian, et al.
Published: (2023)
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)
ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
by: Qharabagh, Mahta Fetrat, et al.
Published: (2024)
How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)
Word-wise intonation model for cross-language TTS systems
by: A., Tomilov A., et al.
Published: (2024)
The importance of spatial and spectral information in multiple speaker tracking
by: Beit-On, Hanan, et al.
Published: (2024)
Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)
MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)
Audio-visual child-adult speaker classification in dyadic interactions
by: Xu, Anfeng, et al.
Published: (2023)
Spoken language change detection inspired by speaker change detection
by: Mishra, Jagabandhu, et al.
Published: (2023)
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)
Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
by: Gaznepoglu, Ünal Ege, et al.
Published: (2025)
Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
by: Murata, Masato, et al.
Published: (2025)
HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information
by: Kacprzak, Stanisław, et al.
Published: (2024)
Improving fairness in speaker verification via Group-adapted Fusion Network
by: Shen, Hua, et al.
Published: (2022)