Bibliographic Details
Main Authors: Arasteh, Soroosh Tayebi, Afza, Saba, Nguyen, Tri-Thien, Buess, Lukas, Parvin, Maryam, Arias-Vergara, Tomas, Perez-Toro, Paula Andrea, Hung, Hiu Ching, Lotfinia, Mahshad, Gorges, Thomas, Noeth, Elmar, Schuster, Maria, Yang, Seung Hee, Maier, Andreas
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2505.00409
author Arasteh, Soroosh Tayebi
Afza, Saba
Nguyen, Tri-Thien
Buess, Lukas
Parvin, Maryam
Arias-Vergara, Tomas
Perez-Toro, Paula Andrea
Hung, Hiu Ching
Lotfinia, Mahshad
Gorges, Thomas
Noeth, Elmar
Schuster, Maria
Yang, Seung Hee
Maier, Andreas
contents Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers across cleft lip and palate (CLP), Dysarthria, Dysglossia, and Dysphonia groups, plus adult and child controls. Each original recording and its automatically anonymized counterpart were evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, a 5-point quality rating, and a 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization at 91% zero-shot and 93% few-shot accuracy, with significant variation across disorders (p=0.008) that attenuated with familiarization. Perceived quality dropped by 30 points on a 0-100 scale (p<0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability, a double dissociation between the two listener attributes; speaker sex and age produced no detectable bias. Clinical severity ratings were preserved at near-perfect agreement in Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), with no recording shifting by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use.
format Preprint
id arxiv_https___arxiv_org_abs_2505_00409
institution arXiv
publishDate 2025
record_format arxiv
title Perceptual implications of automatic anonymization in pathological speech
topic Audio and Speech Processing
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2505.00409
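
The abstract reports agreement between severity ratings of original and anonymized recordings as quadratic-weighted Cohen's kappa on a 4-point scale. As a reading aid, here is a minimal sketch of how that statistic is commonly computed. It is not the authors' code: the function name `quadratic_weighted_kappa`, the 0-based grade encoding, and the sample ratings are all assumptions made for illustration, and the sample ratings are not data from the study.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, num_grades):
    """Quadratic-weighted Cohen's kappa for two lists of integer grades.

    Grades are assumed to be encoded as 0 .. num_grades - 1 (an
    illustrative convention, not taken from the paper).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    if num_grades < 2:
        raise ValueError("need at least two grades")

    n = len(ratings_a)
    k = num_grades

    # Observed joint distribution of (grade_a, grade_b) pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    # Marginal distributions of each set of ratings.
    marginal_a = [sum(row) for row in observed]
    marginal_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * marginal_a[i] * marginal_b[j]

    if weighted_expected == 0.0:
        # Both sets of ratings use one identical grade; kappa is undefined.
        raise ValueError("kappa is undefined when chance disagreement is zero")

    return 1.0 - weighted_observed / weighted_expected


# Hypothetical ratings (NOT study data): severity of four recordings,
# rated once on the original and once on the anonymized version.
original_grades = [0, 1, 2, 3]
anonymized_grades = [0, 1, 2, 2]
kappa = quadratic_weighted_kappa(original_grades, anonymized_grades, num_grades=4)
print(f"quadratic-weighted kappa: {kappa:.3f}")
```

The quadratic weights penalize a two-grade shift four times as heavily as a one-grade shift, so ratings that differ by at most one grade can still yield a high kappa.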