Bibliographic Details
Main Authors: Akram, Ali, Stanojevic, Marija, Ehghaghi, Malikeh, Novikova, Jekaterina
Format: Preprint
Published: 2024
Subjects: Machine Learning, Sound, Audio and Speech Processing
Online Access: https://arxiv.org/abs/2404.01981
_version_ 1866913302133604352
author Akram, Ali
Stanojevic, Marija
Ehghaghi, Malikeh
Novikova, Jekaterina
contents Clinical trials involve many clinicians, patients, and data collection environments, which makes gathering high-quality data a significant challenge. In clinical trials, patients' speech is assessed to detect and monitor cognitive and mental health disorders. We propose using these speech recordings to verify the identities of enrolled patients and to identify and exclude individuals who try to enroll multiple times in the same trial. Since clinical studies are often conducted across different countries, it is imperative to create a system that can perform speaker verification in diverse languages without additional development effort. We evaluate pre-trained TitaNet, ECAPA-TDNN, and SpeakerNet models by enrolling and testing with speech-impaired patients speaking English, German, Danish, Spanish, and Arabic. Our results demonstrate that the tested models can effectively generalize to clinical speakers, with an equal error rate (EER) of less than 2.7% for European languages and 8.26% for Arabic. This represents a significant step toward more versatile and efficient speaker verification systems for cognitive and mental health clinical trials that can be used across a wide range of languages and dialects, substantially reducing the effort required to develop such systems for multiple languages. We also evaluate how the speech task and the number of speakers in a trial influence performance, and show that the type of speech task affects model performance.
format Preprint
id arxiv_https___arxiv_org_abs_2404_01981
institution arXiv
publishDate 2024
record_format arxiv
title Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials
topic Machine Learning
Sound
Audio and Speech Processing
url https://arxiv.org/abs/2404.01981