Staff View: Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems :: Library Catalog

Bibliographic Details
Main Authors:	Estevez, Mariel; Bonomi, Cyntia; Ribas, Dayana; Ortega, Alfonso; Ferrer, Luciana
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2504.08997

_version_	1866913789577789440
author	Estevez, Mariel; Bonomi, Cyntia; Ribas, Dayana; Ortega, Alfonso; Ferrer, Luciana
author_facet	Estevez, Mariel; Bonomi, Cyntia; Ribas, Dayana; Ortega, Alfonso; Ferrer, Luciana
contents	We conducted a comprehensive analysis of an Automatic Voice Disorders Detection (AVDD) system using existing voice disorder datasets with available demographic metadata. The study involved analysing system performance across various demographic groups, particularly focusing on gender and age-based cohorts. Performance evaluation was based on multiple metrics, including normalised costs and cross-entropy. We employed calibration techniques trained separately on predefined demographic groups to address group-dependent miscalibration. Analysis revealed significant performance disparities across groups despite strong global metrics. The system showed systematic biases, misclassifying healthy speakers over 55 as having a voice disorder and speakers with disorders aged 14-30 as healthy. Group-specific calibration improved posterior probability quality, reducing overconfidence. For young disordered speakers, low severity scores were identified as contributing to poor system performance. For older speakers, age-related voice characteristics and potential limitations in the pretrained HuBERT model used as feature extractor likely affected results. The study demonstrates that global performance metrics are insufficient for evaluating AVDD system performance. Group-specific analysis may unmask problems in system performance which are hidden within global metrics. Further, group-dependent calibration strategies help mitigate biases, resulting in a more reliable indication of system confidence. These findings emphasize the need for demographic-specific evaluation and calibration in voice disorder detection systems, while providing a methodological framework applicable to broader biomedical classification tasks where demographic metadata is available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_08997
institution	arXiv
publishDate	2025
record_format	arxiv
title	Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2504.08997
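
Illustrative note: the abstract above mentions per-group evaluation with cross-entropy and calibration trained separately on each demographic group. The record does not say which calibration method the authors used, so the following is only a minimal sketch under stated assumptions: it assumes the system outputs log-odds scores, uses an affine transform fitted by plain gradient descent as a stand-in calibrator, and runs on invented toy data. The function names, group labels, and scores are all hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_entropy(posteriors, labels):
    """Mean binary cross-entropy (natural log); label 1 = disorder."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(posteriors, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)


def fit_affine(scores, labels, lr=0.1, steps=2000):
    """Fit a, b so that sigmoid(a * score + b) lowers cross-entropy.

    Assumption: affine calibration of log-odds scores, optimised with
    plain gradient descent starting from the identity (a=1, b=0).
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy data invented for illustration: (log-odds score, label, group).
# Scores for "over_55" lean toward "disorder"; scores for "14_30" lean
# toward "healthy", loosely mimicking the biases the abstract describes.
data = [
    (2.0, 0, "over_55"), (1.5, 0, "over_55"), (0.5, 0, "over_55"),
    (3.0, 1, "over_55"), (2.5, 1, "over_55"), (1.0, 1, "over_55"),
    (-3.0, 0, "14_30"), (-2.5, 0, "14_30"), (-1.5, 0, "14_30"),
    (-2.0, 1, "14_30"), (-1.0, 1, "14_30"), (-0.5, 1, "14_30"),
]

by_group = defaultdict(list)
for score, label, group in data:
    by_group[group].append((score, label))

before, after = {}, {}
for group, pairs in by_group.items():
    scores = [s for s, _ in pairs]
    labels = [y for _, y in pairs]
    a, b = fit_affine(scores, labels)  # one calibrator per group
    before[group] = cross_entropy([sigmoid(s) for s in scores], labels)
    after[group] = cross_entropy([sigmoid(a * s + b) for s in scores], labels)
    print(f"{group}: cross-entropy {before[group]:.3f} -> {after[group]:.3f}")
```

For brevity the sketch fits and scores each calibrator on the same samples; a real evaluation would presumably calibrate and test on separate data, for example via cross-validation.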

Similar Items