| Main Authors: | Wittek, Nils, Melnikov, Anton, Kaiser, Bert, Zimmermann, André |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.17708 |
Similar Items
Online Audio-Visual Autoregressive Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)
Utilizing Speaker Profiles for Impersonation Audio Detection
by: Gu, Hao, et al.
Published: (2024)
Can Audio Large Language Models Verify Speaker Identity?
by: Ren, Yiming, et al.
Published: (2025)
Speaker Distance Estimation in Enclosures from Single-Channel Audio
by: Neri, Michael, et al.
Published: (2024)
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
by: Tao, Ruijie, et al.
Published: (2024)
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
by: Thakur, Nirmalya Mallick, et al.
Published: (2025)
HearFit+: Personalized Fitness Monitoring via Audio Signals on Smart Speakers
by: Xie, Yadong, et al.
Published: (2025)
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
by: Wang, Shuai, et al.
Published: (2024)
Robust Audio-Visual Target Speaker Extraction with Emotion-Aware Multiple Enrollment Fusion
by: Jin, Zhan, et al.
Published: (2025)
Descriptor: Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR)
by: Vijaykumar, Rahul, et al.
Published: (2025)
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
by: Yang, Xiaoyu, et al.
Published: (2024)
HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)
Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions
by: Zhao, Jinzheng, et al.
Published: (2023)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
by: Li, Zixuan, et al.
Published: (2025)
AVFSNet: Audio-Visual Speech Separation for Flexible Number of Speakers with Multi-Scale and Multi-Task Learning
by: Zhang, Daning, et al.
Published: (2025)
Vibration Sensitivity of one-port and two-port MEMS microphones
by: Doyon-D'Amour, Francis, et al.
Published: (2024)
Interfacing PDM MEMS microphones with PFM spiking systems: Application for Neuromorphic Auditory Sensors
by: Jimenez-Fernandez, Angel, et al.
Published: (2019)
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
by: He, Mao-Kui, et al.
Published: (2024)
Speaker Contrastive Learning for Source Speaker Tracing
by: Wang, Qing, et al.
Published: (2024)
BANC: Towards Efficient Binaural Audio Neural Codec for Overlapping Speech
by: Ratnarajah, Anton, et al.
Published: (2023)
Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering
by: Lin, Chaohao, et al.
Published: (2025)
Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)
Learning Emotion-Invariant Speaker Representations for Speaker Verification
by: Tian, Jingguang, et al.
Published: (2025)
An Investigation on Speaker Augmentation for End-to-End Speaker Extraction
by: You, Zhenghai, et al.
Published: (2025)
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
by: Zhou, Zhenyu, et al.
Published: (2024)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
Plug-and-Steer: Decoupling Separation and Selection in Audio-Visual Target Speaker Extraction
by: Kwak, Doyeop, et al.
Published: (2026)
Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)
Mitigating Non-Target Speaker Bias in Guided Speaker Embedding
by: Horiguchi, Shota, et al.
Published: (2025)
Implementation and Applications of WakeWords Integrated with Speaker Recognition: A Case Study
by: Filho, Alexandre Costa Ferro, et al.
Published: (2024)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
by: Zeng, Bang, et al.
Published: (2024)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)
On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)
Guided Speaker Embedding
by: Horiguchi, Shota, et al.
Published: (2024)
Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)