Bibliographic Details
Main Authors:	Chiu, Aemon Yat Fei, Fung, Kei Ching, Li, Roger Tsz Yeung, Li, Jingyu, Lee, Tan
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2501.05310

Similar Items

An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
by: Li, Jingyu, et al.
Published: (2024)

CUHK-EE Systems for the vTAD Challenge at NCMMSC 2025
by: Chiu, Aemon Yat Fei, et al.
Published: (2025)

Voice Timbre Attribute Detection with Compact and Interpretable Training-Free Acoustic Parameters
by: Chiu, Aemon Yat Fei, et al.
Published: (2026)

PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation
by: Xiao, Yujia, et al.
Published: (2025)

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2024)

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2023)

Refining Self-Supervised Learnt Speech Representation using Brain Activations
by: Li, Hengyu, et al.
Published: (2024)

Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
by: Tian, Yusheng, et al.
Published: (2024)

Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)

Leveraging Self-Supervised Learning for Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2024)

On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
by: Cai, Danwei, et al.
Published: (2024)

In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion
by: Jin, Jiawei, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
by: Ge, Zirui, et al.
Published: (2023)

Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations
by: Geng, Haopeng, et al.
Published: (2024)

Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
by: Liu, Rui, et al.
Published: (2024)

Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)

Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance
by: Shi, Runwu, et al.
Published: (2025)

Investigation of Speaker Representation for Target-Speaker Speech Processing
by: Ashihara, Takanori, et al.
Published: (2024)

Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
by: Tan, Chao, et al.
Published: (2024)

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)

Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment
by: Stahl, Benjamin, et al.
Published: (2025)

Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)

Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
by: Zhu, Xiaoxu, et al.
Published: (2025)

SLAP: Learning Speaker and Health-Related Representations from Natural Language Supervision
by: Ando, Angelika, et al.
Published: (2025)

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)

Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation
by: Ruggiero, Giuseppe, et al.
Published: (2025)

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
by: Kong, Jungil, et al.
Published: (2023)

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing
by: Zhang, Hanlin, et al.
Published: (2026)

Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations
by: Lepage, Theo, et al.
Published: (2024)

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)

The DKU System for Multi-Speaker Automatic Speech Recognition in MLC-SLM Challenge
by: Lin, Yuke, et al.
Published: (2025)

Learning Emotion-Invariant Speaker Representations for Speaker Verification
by: Tian, Jingguang, et al.
Published: (2025)

Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
by: Lin, Zhennan, et al.
Published: (2026)