Bibliographic Details
Main Author: Bartolo, Matthias
Format: Preprint
Published: 2024
Subjects: Sound; Artificial Intelligence; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2408.06804
author Bartolo, Matthias
author_facet Bartolo, Matthias
contents In security systems, forensic investigations, and personalized services, speech is a fundamental human input whose importance outweighs that of text-based interaction. This research examines the essential components of Speaker Identification (SID), emphasising Mel spectrograms and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. It compares the performance of six slightly different model architectures and applies hyperparameter tuning to the best-performing model. It also performs a linguistic analysis to verify accent and gender accuracy, together with a bias evaluation, on the AB-1 Corpus dataset.
format Preprint
id arxiv_https___arxiv_org_abs_2408_06804
institution arXiv
publishDate 2024
record_format arxiv
title Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation
topic Sound
Artificial Intelligence
Audio and Speech Processing
url https://arxiv.org/abs/2408.06804