| Main Authors: | You, Zhenghai, Shi, Ying, Li, Lantian, Wang, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.10921 |
Similar Items
An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)
SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
by: Wang, Tianhao, et al.
Published: (2024)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech
by: Yuan, Junming, et al.
Published: (2025)
Serialized Output Training by Learned Dominance
by: Shi, Ying, et al.
Published: (2024)
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)
Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction
by: Liu, Yun, et al.
Published: (2026)
Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection
by: Zeng, Bang, et al.
Published: (2025)
AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow
by: Li, Duojia, et al.
Published: (2026)
Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions
by: Lin, Wan, et al.
Published: (2024)
Unmixing the Crowd: Learning Mixture-to-Set Speaker Embeddings for Enrollment-Free Target Speech Extraction
by: Sidharth, FNU, et al.
Published: (2026)
Brainprint-Modulated Target Speaker Extraction
by: Han, Qiushi, et al.
Published: (2025)
On the effectiveness of enrollment speech augmentation for Target Speaker Extraction
by: Li, Junjie, et al.
Published: (2024)
Target Speaker Extraction with Curriculum Learning
by: Liu, Yun, et al.
Published: (2024)
Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)
M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)
Discriminative-Generative Target Speaker Extraction with Decoder-Only Language Models
by: Zeng, Bang, et al.
Published: (2026)
Listen to Extract: Onset-Prompted Target Speaker Extraction
by: Shen, Pengjie, et al.
Published: (2025)
Improved Feature Extraction Network for Neuro-Oriented Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)
Beyond Speaker Identity: Text Guided Target Speech Extraction
by: Huo, Mingyue, et al.
Published: (2025)
Binaural Target Speaker Extraction using Individualized HRTF
by: Ellinson, Yoav, et al.
Published: (2025)
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)
Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2025)
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
by: Wang, Shuai, et al.
Published: (2024)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
Libri2Vox Dataset: Target Speaker Extraction with Diverse Speaker Conditions and Synthetic Data
by: Liu, Yun, et al.
Published: (2024)
Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)
Binaural Selective Attention Model for Target Speaker Extraction
by: Meng, Hanyu, et al.
Published: (2024)
FlowTSE: Target Speaker Extraction with Flow Matching
by: Navon, Aviv, et al.
Published: (2025)
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model
by: Peng, Shuhai, et al.
Published: (2026)
HRTF-guided Binaural Target Speaker Extraction with Real-World Validation
by: Ellinson, Yoav, et al.
Published: (2026)
USED: Universal Speaker Extraction and Diarization
by: Ao, Junyi, et al.
Published: (2023)
Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion
by: Jin, Zhan, et al.
Published: (2025)
How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)
Training-Free Multi-Step Audio Source Separation
by: Zang, Yongyi, et al.
Published: (2025)
Multi-Target Backdoor Attacks Against Speaker Recognition
by: Fortier, Alexandrine, et al.
Published: (2025)