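Staff View: Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation :: Library Catalog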


Bibliographic Details
Main Author:	Bartolo, Matthias
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.06804

_version_	1866929560956698624
author	Bartolo, Matthias
author_facet	Bartolo, Matthias
contents	In the fields of security systems, forensic investigations, and personalized services, the importance of speech as a fundamental human input outweighs text-based interactions. This research delves deeply into the complex field of Speaker Identification (SID), examining its essential components and emphasising Mel Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. Moreover, this study evaluates six slightly distinct model architectures using extensive analysis to evaluate their performance, with hyperparameter tuning applied to the best-performing model. This work performs a linguistic analysis to verify accent and gender accuracy, in addition to bias evaluation within the AB-1 Corpus dataset.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_06804
institution	arXiv
publishDate	2024
record_format	arxiv
title	Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation
topic	Sound; Artificial Intelligence; Audio and Speech Processing
url	https://arxiv.org/abs/2408.06804
