| Main Authors: | Bhardwaj, Saurabh, Srivastava, Smriti, Bhandari, Abhishek, Gupta, Krit, Bahl, Hitesh, Gupta, J. R. P. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.18902 |
Similar Items
Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)
SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2025)
Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
by: Goel, Arnav, et al.
Published: (2024)
Wavelet-Based Time-Frequency Fingerprinting for Feature Extraction of Traditional Irish Music
by: Shore, Noah
Published: (2025)
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
by: Agrawal, Saurabh, et al.
Published: (2025)
An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
Brainprint-Modulated Target Speaker Extraction
by: Han, Qiushi, et al.
Published: (2025)
Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)
Training-Free Multi-Step Inference for Target Speaker Extraction
by: You, Zhenghai, et al.
Published: (2026)
Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
Joint Learning Global-Local Speaker Classification to Enhance End-to-End Speaker Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2026)
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)
Self-Tuning Spectral Clustering for Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)
Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT
by: Jafarzadeh, Pourya, et al.
Published: (2024)
Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)
Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction
by: Liu, Yun, et al.
Published: (2026)
Spoofing-Aware Speaker Verification via Wavelet Prompt Tuning and Multi-Model Ensembles
by: Farhadipour, Aref, et al.
Published: (2026)
Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)
RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)
Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions
by: Lin, Wan, et al.
Published: (2024)
Listen to Extract: Onset-Prompted Target Speaker Extraction
by: Shen, Pengjie, et al.
Published: (2025)
Binaural Target Speaker Extraction using Individualized HRTF
by: Ellinson, Yoav, et al.
Published: (2025)
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)
U3-xi: Pushing the Boundaries of Speaker Recognition by Incorporating Uncertainty
by: Li, Junjie, et al.
Published: (2026)
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)
Multi-Target Backdoor Attacks Against Speaker Recognition
by: Fortier, Alexandrine, et al.
Published: (2025)
MK-SGC-SC: Multiple Kernel Guided Sparse Graph Construction in Spectral Clustering for Unsupervised Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2026)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)
THAI Speech Emotion Recognition (THAI-SER) corpus
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)
Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)
Fitting Auditory Filterbanks with Multiresolution Neural Networks
by: Lostanlen, Vincent, et al.
Published: (2023)
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)
On the application of Visibility Graphs in the Spectral Domain for Speaker Recognition
by: Bocaccio, Hernan, et al.
Published: (2025)