:: Library Catalog

Bibliographic Details
Main Author:	Bartolo, Matthias
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.06804

Similar Items

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)

Evaluating Speaker Identity Coding in Self-supervised Models and Humans
by: Elbanna, Gasser
Published: (2024)

Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems
by: Melhem, Rawad, et al.
Published: (2024)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan
by: Zeinali, Hossein, et al.
Published: (2024)

Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)

Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)

The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)

Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
by: Salameh, Raghad, et al.
Published: (2024)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)

Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
by: Wang, Rui, et al.
Published: (2024)

Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation
by: Sheng, Zhengyan, et al.
Published: (2025)

Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
by: Nayeem, Md., et al.
Published: (2025)

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)

Who is Authentic Speaker
by: Huang, Qiang
Published: (2024)

Music Genre Classification: A Comparative Analysis of Classical Machine Learning and Deep Learning Approaches
by: Prajuli, Sachin, et al.
Published: (2026)

NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention
by: De Silva, Dashanka, et al.
Published: (2024)

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)

ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
by: Ma, Yi, et al.
Published: (2025)

Multi-Speaker Conversational Audio Deepfake: Taxonomy, Dataset and Pilot Study
by: Ahmed, Alabi, et al.
Published: (2026)

EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
by: Bian, Weizhen, et al.
Published: (2024)

Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network
by: Shahan, Irfan Nafiz, et al.
Published: (2024)

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)

ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations
by: Wang, Kexue, et al.
Published: (2026)

Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)

Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
by: Xu, Shitong, et al.
Published: (2025)

Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
by: Han, Wooseok, et al.
Published: (2024)

Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
by: Feng, Tiantian, et al.
Published: (2024)

The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024
by: Molavi, Mohammadreza, et al.
Published: (2024)

MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers
by: Park, Kyeongman, et al.
Published: (2025)

Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
by: Kim, Taesoo, et al.
Published: (2025)

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
by: Pan, Zexu, et al.
Published: (2025)