| Main Authors: | Lin, Jingru, Ge, Meng, Ao, Junyi, Deng, Liqun, Li, Haizhou |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.02826 |
Similar Items
Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM
by: Sun, Zhaokai, et al.
Published: (2025)
Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier
by: Guo, Yinlin, et al.
Published: (2023)
WavLM model ensemble for audio deepfake detection
by: Combei, David, et al.
Published: (2024)
Eta-WavLM: Efficient Speaker Identity Removal in Self-Supervised Speech Representations Using a Simple Linear Equation
by: Ruggiero, Giuseppe, et al.
Published: (2025)
Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks
by: Ma, Duo, et al.
Published: (2024)
PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement
by: Rong, Xiaobin, et al.
Published: (2025)
Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection
by: Stourbe, Theophile, et al.
Published: (2024)
Frame-Aligned Fusion of Canary and WavLM for Non-Intrusive Intelligibility Prediction of Hearing-Aid-Processed Speech
by: Nakazawa, Kazushi
Published: (2026)
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge
by: Zhang, Qishan, et al.
Published: (2024)
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)
Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing
by: Peng, Junyi, et al.
Published: (2025)
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
by: Wu, Wenxuan, et al.
Published: (2024)
Prompt-driven Target Speech Diarization
by: Jiang, Yidi, et al.
Published: (2023)
Speaker Disentanglement of Speech Pre-trained Model Based on Interpretability
by: Zhu, Xiaoxu, et al.
Published: (2025)
Target Speech Extraction with Pre-trained Self-supervised Learning Models
by: Peng, Junyi, et al.
Published: (2024)
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)
MOPSA: Mixture of Prompt-Experts Based Speaker Adaptation for Elderly Speech Recognition
by: Deng, Chengxi, et al.
Published: (2025)
AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval
by: Lin, Jingru, et al.
Published: (2026)
Interpolating Speaker Identities in Embedding Space for Data Expansion
by: Liu, Tianchi, et al.
Published: (2025)
A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models
by: Whetten, Ryan, et al.
Published: (2026)
Speaker-Conditioned Phrase Break Prediction for Text-to-Speech with Phoneme-Level Pre-trained Language Model
by: Yang, Dong, et al.
Published: (2025)
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis
by: Zhou, Xuehao, et al.
Published: (2024)
Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2025)
Leveraging Language Information for Target Language Extraction
by: Yıldırım, Mehmet Sinan, et al.
Published: (2025)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models
by: Lin, Yuke, et al.
Published: (2025)
Interpreting Speaker Characteristics in the Dimensions of Self-Supervised Speech Features
by: van Rensburg, Kyle Janse, et al.
Published: (2026)
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders
by: Lam, Phat, et al.
Published: (2024)
A Large-Scale Probing Analysis of Speaker-Specific Attributes in Self-Supervised Speech Representations
by: Chiu, Aemon Yat Fei, et al.
Published: (2025)
M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions
by: Wang, Shuai, et al.
Published: (2024)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)
Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
by: Tao, Ruijie, et al.
Published: (2024)
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
by: Tao, Ruijie, et al.
Published: (2024)
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)