:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Jingru; Ge, Meng; Ao, Junyi; Deng, Liqun; Li, Haizhou
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.02826

Similar Items

Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)

Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
by: Sun, Zhaokai, et al.
Published: (2025)

Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier
by: Guo, Yinlin, et al.
Published: (2023)

WavLM model ensemble for audio deepfake detection
by: Combei, David, et al.
Published: (2024)

Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation
by: Ruggiero, Giuseppe, et al.
Published: (2025)

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks
by: Ma, Duo, et al.
Published: (2024)

PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement
by: Rong, Xiaobin, et al.
Published: (2025)

Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection
by: Stourbe, Theophile, et al.
Published: (2024)

Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech
by: Nakazawa, Kazushi
Published: (2026)

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)

USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)

XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge
by: Zhang, Qishan, et al.
Published: (2024)

Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)

Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing
by: Peng, Junyi, et al.
Published: (2025)

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
by: Wu, Wenxuan, et al.
Published: (2024)

Prompt-driven Target Speech Diarization
by: Jiang, Yidi, et al.
Published: (2023)

Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
by: Zhu, Xiaoxu, et al.
Published: (2025)

Target Speech Extraction with Pre-trained Self-supervised Learning Models
by: Peng, Junyi, et al.
Published: (2024)

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)

MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
by: Deng, Chengxi, et al.
Published: (2025)

AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval
by: Lin, Jingru, et al.
Published: (2026)

Interpolating Speaker Identities in Embedding Space for Data Expansion
by: Liu, Tianchi, et al.
Published: (2025)

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models
by: Whetten, Ryan, et al.
Published: (2026)

Speaker-Conditioned Phrase Break Prediction for Text-to-Speech with Phoneme-Level Pre-trained Language Model
by: Yang, Dong, et al.
Published: (2025)

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)

Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)

Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2025)

Leveraging Language Information for Target Language Extraction
by: Yıldırım, Mehmet Sinan, et al.
Published: (2025)

Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)

Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models
by: Lin, Yuke, et al.
Published: (2025)

Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features
by: van Rensburg, Kyle Janse, et al.
Published: (2026)

Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders
by: Lam, Phat, et al.
Published: (2024)

A Large-Scale Probing Analysis of Speaker-Specific Attributes in Self-Supervised Speech Representations
by: Chiu, Aemon Yat Fei, et al.
Published: (2025)

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
by: Wang, Shuai, et al.
Published: (2024)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
by: Tao, Ruijie, et al.
Published: (2024)

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
by: Tao, Ruijie, et al.
Published: (2024)

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)