| Main Authors: | Helwani, Karim, Do, Hoang, Luan, James, Srinivasan, Sriram |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.13379 |
Similar Items
Sound Source Separation Using Latent Variational Block-Wise Disentanglement
by: Helwani, Karim, et al.
Published: (2024)
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)
O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
by: Gruttadauria, Elio, et al.
Published: (2025)
Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
by: Bourdin, Yann, et al.
Published: (2025)
A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model
by: Hu, Xiaolin, et al.
Published: (2026)
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)
StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
by: Kumar, Anurag, et al.
Published: (2025)
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)
Brainprint-Modulated Target Speaker Extraction
by: Han, Qiushi, et al.
Published: (2025)
Koopman Regularized Deep Speech Disentanglement for Speaker Verification
by: Chazaridis, Nikos, et al.
Published: (2026)
Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)
Multi-Target Backdoor Attacks Against Speaker Recognition
by: Fortier, Alexandrine, et al.
Published: (2025)
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
by: Morrone, Giovanni, et al.
Published: (2023)
End-to-end Piano Performance-MIDI to Score Conversion with Transformers
by: Beyer, Tim, et al.
Published: (2024)
FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time
by: Jobayer, Md, et al.
Published: (2024)
End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers
by: Bartoli, Pietro, et al.
Published: (2025)
Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)
HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
by: Shashaank, N, et al.
Published: (2023)
HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
by: Chang, Heng-Jui, et al.
Published: (2024)
Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
by: Ok, Hyunjong, et al.
Published: (2025)
BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings
by: Charlot, Théo, et al.
Published: (2025)
Text-Dependent Speaker Verification (TdSV) Challenge 2024: Team Naive System Report
by: Rostami, Amir Mohammad, et al.
Published: (2026)
An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)
SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning
by: Nam, KiHyun, et al.
Published: (2026)
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024)
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)
AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
by: Biju, Emil, et al.
Published: (2024)
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models
by: Gao, Chenyang, et al.
Published: (2024)
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
by: Sang, Mufan, et al.
Published: (2024)
Whispy: Adapting STT Whisper Models to Real-Time Environments
by: Bevilacqua, Antonio, et al.
Published: (2024)
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
by: Yin, Chun, et al.
Published: (2024)
Self-Supervised Learning for Speaker Recognition: A study and review
by: Lepage, Theo, et al.
Published: (2026)
Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)
Investigating Confidence Estimation Measures for Speaker Diarization
by: Chowdhury, Anurag, et al.
Published: (2024)
Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)
Cosine Scoring with Uncertainty for Neural Speaker Embedding
by: Wang, Qiongqiong, et al.
Published: (2024)
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
by: Agrawal, Saurabh, et al.
Published: (2025)