:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Shahan, Irfan Nafiz, Auvi, Pulok Ahmed
Format:	Preprint
Published:	2024
Subjects:	Sound Artificial Intelligence Machine Learning Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2411.15082
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Multi-Speaker Conversational Audio Deepfake: Taxonomy, Dataset and Pilot Study
by: Ahmed, Alabi, et al.
Published: (2026)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks
by: Paim, Kayuã Oleques, et al.
Published: (2023)

Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation
by: Bartolo, Matthias
Published: (2024)

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
by: Zhang, Fengrun, et al.
Published: (2024)

Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
by: Salameh, Raghad, et al.
Published: (2024)

Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
by: Haque, Kazi Nazmul, et al.
Published: (2024)

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)

Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems
by: Melhem, Rawad, et al.
Published: (2024)

Planing It by Ear: Convolutional Neural Networks for Acoustic Anomaly Detection in Industrial Wood Planers
by: Deschênes, Anthony, et al.
Published: (2025)

Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)

Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)

Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks
by: Zhao, Xufang, et al.
Published: (2025)

Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks
by: Atif, Youness
Published: (2025)

Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)

ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
by: Ma, Yi, et al.
Published: (2025)

ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations
by: Wang, Kexue, et al.
Published: (2026)

Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)

Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices
by: Grau-Haro, Jordi, et al.
Published: (2025)

A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition
by: Huang, Wei, et al.
Published: (2025)

Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)

Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)

Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)

Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation
by: Ratnarajah, Anton, et al.
Published: (2026)

Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)

The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)

Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation
by: Sheng, Zhengyan, et al.
Published: (2025)

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)

Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
by: Wang, Rui, et al.
Published: (2024)

Evaluating Speaker Identity Coding in Self-supervised Models and Humans
by: Elbanna, Gasser
Published: (2024)

BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset
by: Fahad, Istiaq Ahmed, et al.
Published: (2025)

Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection
by: Lin, Chu-Hsuan Abraham, et al.
Published: (2024)