MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Arasteh, Soroosh Tayebi; Afza, Saba; Nguyen, Tri-Thien; Buess, Lukas; Parvin, Maryam; Arias-Vergara, Tomas; Perez-Toro, Paula Andrea; Hung, Hiu Ching; Lotfinia, Mahshad; Gorges, Thomas; Noeth, Elmar; Schuster, Maria; Yang, Seung Hee; Maier, Andreas
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2505.00409

_version_	1866909050385465344
author	Arasteh, Soroosh Tayebi; Afza, Saba; Nguyen, Tri-Thien; Buess, Lukas; Parvin, Maryam; Arias-Vergara, Tomas; Perez-Toro, Paula Andrea; Hung, Hiu Ching; Lotfinia, Mahshad; Gorges, Thomas; Noeth, Elmar; Schuster, Maria; Yang, Seung Hee; Maier, Andreas
author_facet	Arasteh, Soroosh Tayebi; Afza, Saba; Nguyen, Tri-Thien; Buess, Lukas; Parvin, Maryam; Arias-Vergara, Tomas; Perez-Toro, Paula Andrea; Hung, Hiu Ching; Lotfinia, Mahshad; Gorges, Thomas; Noeth, Elmar; Schuster, Maria; Yang, Seung Hee; Maier, Andreas
contents	Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers from CLP, Dysarthria, Dysglossia, Dysphonia, and adult and child controls. Each original recording and its automatically-anonymized counterpart was evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, 5-point quality rating, and 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization at 91% zero-shot and 93% few-shot accuracy, with significant variation across disorders (p=0.008) that attenuated with familiarization. Perceived quality dropped by 30 ppts on a 0-100 scale (p<0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability, a double dissociation between the two listener attributes; speaker sex and age produced no detectable bias. Clinical severity ratings were preserved at near-perfect agreement in Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), with no recording shifting by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_00409
institution	arXiv
publishDate	2025
record_format	arxiv
title	Perceptual implications of automatic anonymization in pathological speech
topic	Audio and Speech Processing; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2505.00409

Similar Items