| Main Authors: | Shahan, Irfan Nafiz, Auvi, Pulok Ahmed |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.15082 |
Similar Items
Multi-Speaker Conversational Audio Deepfake: Taxonomy, Dataset and Pilot Study
by: Ahmed, Alabi, et al.
Published: (2026)
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks
by: Paim, Kayuã Oleques, et al.
Published: (2023)
Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation
by: Bartolo, Matthias
Published: (2024)
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
by: Zhang, Fengrun, et al.
Published: (2024)
Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers
by: Salameh, Raghad, et al.
Published: (2024)
Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
by: Haque, Kazi Nazmul, et al.
Published: (2024)
Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)
Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems
by: Melhem, Rawad, et al.
Published: (2024)
Planing It by Ear: Convolutional Neural Networks for Acoustic Anomaly Detection in Industrial Wood Planers
by: Deschênes, Anthony, et al.
Published: (2025)
Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
by: Iatariene, Taous, et al.
Published: (2025)
Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)
Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks
by: Zhao, Xufang, et al.
Published: (2025)
Audio-to-Image Encoding for Improved Voice Characteristic Detection Using Deep Convolutional Neural Networks
by: Atif, Youness
Published: (2025)
Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)
ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification
by: Ma, Yi, et al.
Published: (2025)
ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations
by: Wang, Kexue, et al.
Published: (2026)
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
by: Hwang, Injune, et al.
Published: (2024)
Comprehensive Evaluation of CNN-Based Audio Tagging Models on Resource-Constrained Devices
by: Grau-Haro, Jordi, et al.
Published: (2025)
A Multi-task Learning Balanced Attention Convolutional Neural Network Model for Few-shot Underwater Acoustic Target Recognition
by: Huang, Wei, et al.
Published: (2025)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)
Explainable Attribute-Based Speaker Verification
by: Wu, Xiaoliang, et al.
Published: (2024)
From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)
Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)
Pretraining Multi-Speaker Identification for Neural Speaker Diarization
by: Horiguchi, Shota, et al.
Published: (2025)
Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation
by: Ratnarajah, Anton, et al.
Published: (2026)
Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)
SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)
The VoxCeleb Speaker Recognition Challenge: A Retrospective
by: Huh, Jaesung, et al.
Published: (2024)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)
Unispeaker: A Unified Approach for Multimodality-driven Speaker Generation
by: Sheng, Zhengyan, et al.
Published: (2025)
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)
Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding
by: Wang, Rui, et al.
Published: (2024)
Evaluating Speaker Identity Coding in Self-supervised Models and Humans
by: Elbanna, Gasser
Published: (2024)
BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset
by: Fahad, Istiaq Ahmed, et al.
Published: (2025)
Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection
by: Lin, Chu-Hsuan Abraham, et al.
Published: (2024)