:: Library Catalog

Bibliographic Details
Main Authors:	Ueda, Lucas H., Marques, Leonardo B. de M. M., Simões, Flávio O., Neto, Mário U., Runstein, Fernando, Bó, Bianca Dal, Costa, Paula D. P.
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2409.17364

Similar Items

SelfTTS: cross-speaker style transfer through explicit embedding disentanglement and self-refinement using self-augmentation
by: Ueda, Lucas H., et al.
Published: (2026)

SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)

Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching
by: Marques, Leonardo B. de M. M., et al.
Published: (2024)

Improving Speech Emotion Recognition Through Cross Modal Attention Alignment and Balanced Stacking Model
by: Ueda, Lucas, et al.
Published: (2025)

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)

Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)

Gender-ambiguous voice generation through feminine speaking style transfer in male voices
by: Koutsogiannaki, Maria, et al.
Published: (2024)

Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)

Improved symbolic drum style classification with grammar-based hierarchical representations
by: Géré, Léo, et al.
Published: (2024)

Combining audio control and style transfer using latent diffusion
by: Demerlé, Nils, et al.
Published: (2024)

An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS
by: Kunešová, Marie, et al.
Published: (2025)

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)

PromptASR for contextualized ASR with controllable style
by: Yang, Xiaoyu, et al.
Published: (2023)

Crab: Multi Layer Contrastive Supervision to Improve Speech Emotion Recognition Under Both Acted and Natural Speech Condition
by: Ueda, Lucas H., et al.
Published: (2026)

Accent-VITS: accent transfer for end-to-end TTS
by: Ma, Linhan, et al.
Published: (2023)

Improving speaker verification robustness with synthetic emotional utterances
by: Koditala, Nikhil Kumar, et al.
Published: (2024)

Complexity of frequency fluctuations and the interpretive style in the bass viola da gamba
by: Lugo, Igor, et al.
Published: (2025)

SynthCloner: Synthesizer-style Audio Transfer via Factorized Codec with ADSR Envelope Control
by: Liu, Jeng-Yue, et al.
Published: (2025)

A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

X-CrossNet: A complex spectral mapping approach to target speaker extraction with cross attention speaker embedding fusion
by: Sun, Chang, et al.
Published: (2024)

On the influence of language similarity in non-target speaker verification trials
by: Reuter, Paul M., et al.
Published: (2025)

E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)

Towards generalisable and calibrated synthetic speech detection with self-supervised representations
by: Pascu, Octavian, et al.
Published: (2023)

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
by: Eskimez, Sefik Emre, et al.
Published: (2024)

ManaTTS Persian: a recipe for creating TTS datasets for lower resource languages
by: Qharabagh, Mahta Fetrat, et al.
Published: (2024)

How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)

Word-wise intonation model for cross-language TTS systems
by: A., Tomilov A., et al.
Published: (2024)

The importance of spatial and spectral information in multiple speaker tracking
by: Beit-On, Hanan, et al.
Published: (2024)

Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)

MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)

Audio-visual child-adult speaker classification in dyadic interactions
by: Xu, Anfeng, et al.
Published: (2023)

Spoken language change detection inspired by speaker change detection
by: Mishra, Jagabandhu, et al.
Published: (2023)

Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)

Spectral or spatial? Leveraging both for speaker extraction in challenging data conditions
by: Eisenberg, Aviad, et al.
Published: (2025)

Why disentanglement-based speaker anonymization systems fail at preserving emotions?
by: Gaznepoglu, Ünal Ege, et al.
Published: (2025)

Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control
by: Murata, Masato, et al.
Published: (2025)

HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information
by: Kacprzak, Stanisław, et al.
Published: (2024)

Improving fairness in speaker verification via Group-adapted Fusion Network
by: Shen, Hua, et al.
Published: (2022)