| Main Authors: | Wang, Shuai, Zhu, Pengcheng, Li, Haizhou |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.15782 |
Similar Items
PhiNet: Speaker Verification with Phonetic Interpretability
by: Ma, Yi, et al.
Published: (2026)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)
E1 TTS: Simple and Fast Non-Autoregressive TTS
by: Liu, Zhijun, et al.
Published: (2024)
Interpreting the Dimensions of Speaker Embedding Space
by: Huckvale, Mark
Published: (2025)
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
by: Inoue, Sho, et al.
Published: (2024)
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
by: Ma, Yi, et al.
Published: (2025)
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
by: Liu, Tianchi, et al.
Published: (2023)
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)
Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering
by: Lin, Chaohao, et al.
Published: (2025)
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)
Hierarchical Control of Emotion Rendering in Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
Fine-Grained Quantitative Emotion Editing for Speech Generation
by: Inoue, Sho, et al.
Published: (2024)
Guided Speaker Embedding
by: Horiguchi, Shota, et al.
Published: (2024)
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025)
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)
Mitigating Non-Target Speaker Bias in Guided Speaker Embedding
by: Horiguchi, Shota, et al.
Published: (2025)
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)
Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)
Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings
by: Horiguchi, Shota, et al.
Published: (2024)
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
by: Thienpondt, Jenthe, et al.
Published: (2024)
SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms
by: Li, Sirui, et al.
Published: (2025)
Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
by: Jain, Sarthak, et al.
Published: (2024)
Explaining Speaker and Spoof Embeddings via Probing
by: Liu, Xuechen, et al.
Published: (2024)
Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data
by: Bai, Qibing, et al.
Published: (2025)
Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance
by: Shi, Runwu, et al.
Published: (2025)
DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
by: Huang, Wen, et al.
Published: (2024)
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)
$C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction
by: Wu, Wenxuan, et al.
Published: (2025)
Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)