| Main Authors: | Chiu, Aemon Yat Fei, Fung, Kei Ching, Li, Roger Tsz Yeung, Li, Jingyu, Lee, Tan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.05310 |
Similar Items
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
by: Li, Jingyu, et al.
Published: (2024)
CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025
by: Chiu, Aemon Yat Fei, et al.
Published: (2025)
Voice Timbre Attribute Detection with Compact and Interpretable Training-Free Acoustic Parameters
by: Chiu, Aemon Yat Fei, et al.
Published: (2026)
PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation
by: Xiao, Yujia, et al.
Published: (2025)
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2024)
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2023)
Refining Self-Supervised Learnt Speech Representation using Brain Activations
by: Li, Hengyu, et al.
Published: (2024)
Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
by: Tian, Yusheng, et al.
Published: (2024)
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)
Leveraging Self-Supervised Learning for Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2024)
On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
by: Cai, Danwei, et al.
Published: (2024)
In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion
by: Jin, Jiawei, et al.
Published: (2025)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
by: Ge, Zirui, et al.
Published: (2023)
Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations
by: Geng, Haopeng, et al.
Published: (2024)
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
by: Liu, Rui, et al.
Published: (2024)
Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)
Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance
by: Shi, Runwu, et al.
Published: (2025)
Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
by: Tan, Chao, et al.
Published: (2024)
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)
Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment
by: Stahl, Benjamin, et al.
Published: (2025)
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
by: Zhu, Xiaoxu, et al.
Published: (2025)
SLAP: Learning Speaker and Health-Related Representations from Natural Language Supervision
by: Ando, Angelika, et al.
Published: (2025)
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)
Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation
by: Ruggiero, Giuseppe, et al.
Published: (2025)
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)
SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing
by: Zhang, Hanlin, et al.
Published: (2026)
Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
by: Lepage, Theo, et al.
Published: (2024)
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)
The DKU System for Multi-Speaker Automatic Speech Recognition in MLC-SLM Challenge
by: Lin, Yuke, et al.
Published: (2025)
Learning Emotion-Invariant Speaker Representations for Speaker Verification
by: Tian, Jingguang, et al.
Published: (2025)
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
by: Lin, Zhennan, et al.
Published: (2026)