:: Library Catalog

Bibliographic Details
Main Authors:	Helwani, Karim; Do, Hoang; Luan, James; Srinivasan, Sriram
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2603.13379

Similar Items

Sound Source Separation Using Latent Variational Block-Wise Disentanglement
by: Helwani, Karim, et al.
Published: (2024)

Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
by: Dutta, Soumya, et al.
Published: (2024)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
by: Gruttadauria, Elio, et al.
Published: (2025)

Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
by: Bourdin, Yann, et al.
Published: (2025)

A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model
by: Hu, Xiaolin, et al.
Published: (2026)

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)

StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)

SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
by: Kumar, Anurag, et al.
Published: (2025)

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)

Brainprint-Modulated Target Speaker Extraction
by: Han, Qiushi, et al.
Published: (2025)

Koopman Regularized Deep Speech Disentanglement for Speaker Verification
by: Chazaridis, Nikos, et al.
Published: (2026)

Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)

Multi-Target Backdoor Attacks Against Speaker Recognition
by: Fortier, Alexandrine, et al.
Published: (2025)

End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
by: Morrone, Giovanni, et al.
Published: (2023)

End-to-end Piano Performance-MIDI to Score Conversion with Transformers
by: Beyer, Tim, et al.
Published: (2024)

FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time
by: Jobayer, Md, et al.
Published: (2024)

End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers
by: Bartoli, Pietro, et al.
Published: (2025)

Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)

HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones
by: Shashaank, N, et al.
Published: (2023)

HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
by: Chang, Heng-Jui, et al.
Published: (2024)

Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
by: Ok, Hyunjong, et al.
Published: (2025)

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings
by: Charlot, Théo, et al.
Published: (2025)

Text-Dependent Speaker Verification (TdSV) Challenge 2024: Team Naive System Report
by: Rostami, Amir Mohammad, et al.
Published: (2026)

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)

SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning
by: Nam, KiHyun, et al.
Published: (2026)

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024)

TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)

AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers
by: Biju, Emil, et al.
Published: (2024)

Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models
by: Gao, Chenyang, et al.
Published: (2024)

Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
by: Sang, Mufan, et al.
Published: (2024)

Whispy: Adapting STT Whisper Models to Real-Time Environments
by: Bevilacqua, Antonio, et al.
Published: (2024)

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
by: Yin, Chun, et al.
Published: (2024)

Self-Supervised Learning for Speaker Recognition: A study and review
by: Lepage, Theo, et al.
Published: (2026)

Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)

Investigating Confidence Estimation Measures for Speaker Diarization
by: Chowdhury, Anurag, et al.
Published: (2024)

Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)

Cosine Scoring with Uncertainty for Neural Speaker Embedding
by: Wang, Qiongqiong, et al.
Published: (2024)

SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
by: Agrawal, Saurabh, et al.
Published: (2025)