:: Library Catalog

Bibliographic Details
Main Authors:	Bhardwaj, Saurabh; Srivastava, Smriti; Bhandari, Abhishek; Gupta, Krit; Bahl, Hitesh; Gupta, J. R. P.
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2512.18902

Similar Items

Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)

SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2025)

Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
by: Goel, Arnav, et al.
Published: (2024)

Wavelet-Based Time-Frequency Fingerprinting for Feature Extraction of Traditional Irish Music
by: Shore, Noah
Published: (2025)

SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
by: Agrawal, Saurabh, et al.
Published: (2025)

An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

Brainprint-Modulated Target Speaker Extraction
by: Han, Qiushi, et al.
Published: (2025)

Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)

Training-Free Multi-Step Inference for Target Speaker Extraction
by: You, Zhenghai, et al.
Published: (2026)

Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)

USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)

Joint Learning Global-Local Speaker Classification to Enhance End-to-End Speaker Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2026)

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)

Self-Tuning Spectral Clustering for Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)

Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT
by: Jafarzadeh, Pourya, et al.
Published: (2024)

Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)

Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)

Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction
by: Liu, Yun, et al.
Published: (2026)

Spoofing-Aware Speaker Verification via Wavelet Prompt Tuning and Multi-Model Ensembles
by: Farhadipour, Aref, et al.
Published: (2026)

Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)

RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)

Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions
by: Lin, Wan, et al.
Published: (2024)

Listen to Extract: Onset-Prompted Target Speaker Extraction
by: Shen, Pengjie, et al.
Published: (2025)

Binaural Target Speaker Extraction using Individualized HRTF
by: Ellinson, Yoav, et al.
Published: (2025)

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)

U3-xi: Pushing the Boundaries of Speaker Recognition by Incorporating Uncertainty
by: Li, Junjie, et al.
Published: (2026)

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)

Multi-Target Backdoor Attacks Against Speaker Recognition
by: Fortier, Alexandrine, et al.
Published: (2025)

MK-SGC-SC: Multiple Kernel Guided Sparse Graph Construction in Spectral Clustering for Unsupervised Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2026)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)

THAI Speech Emotion Recognition (THAI-SER) corpus
by: Wongpithayadisai, Jilamika, et al.
Published: (2025)

Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion
by: Li, Xiao, et al.
Published: (2025)

Fitting Auditory Filterbanks with Multiresolution Neural Networks
by: Lostanlen, Vincent, et al.
Published: (2023)

Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)

On the application of Visibility Graphs in the Spectral Domain for Speaker Recognition
by: Bocaccio, Hernan, et al.
Published: (2025)