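| Main authors: | Arasteh, Soroosh Tayebi; Afza, Saba; Nguyen, Tri-Thien; Buess, Lukas; Parvin, Maryam; Arias-Vergara, Tomas; Perez-Toro, Paula Andrea; Hung, Hiu Ching; Lotfinia, Mahshad; Gorges, Thomas; Noeth, Elmar; Schuster, Maria; Yang, Seung Hee; Maier, Andreas |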
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Audio and Speech Processing; Artificial Intelligence; Machine Learning |
| Online access: | https://arxiv.org/abs/2505.00409 |
| Field | Value |
|---|---|
| author | Arasteh, Soroosh Tayebi; Afza, Saba; Nguyen, Tri-Thien; Buess, Lukas; Parvin, Maryam; Arias-Vergara, Tomas; Perez-Toro, Paula Andrea; Hung, Hiu Ching; Lotfinia, Mahshad; Gorges, Thomas; Noeth, Elmar; Schuster, Maria; Yang, Seung Hee; Maier, Andreas |
| contents | Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers with cleft lip and palate (CLP), dysarthria, dysglossia, or dysphonia, plus adult and child controls. Each original recording and its automatically anonymized counterpart was evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, 5-point quality rating, and 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization with 91% accuracy zero-shot and 93% few-shot, with significant variation across disorders (p = 0.008) that attenuated with familiarization. Perceived quality dropped by 30 percentage points on a 0-100 scale (p < 0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability: a double dissociation between the two listener attributes. Speaker sex and age produced no detectable bias. Clinical severity ratings were preserved with near-perfect agreement in dysarthria, dysglossia, and dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), and no recording shifted by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2505_00409 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Perceptual implications of automatic anonymization in pathological speech |
| topic | Audio and Speech Processing Artificial Intelligence Machine Learning |
| url | https://arxiv.org/abs/2505.00409 |
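The abstract reports severity agreement as quadratic-weighted Cohen's kappa. As a minimal sketch of that statistic (not the authors' analysis code), assuming each recording's grade on the original and on its anonymized counterpart is encoded as an integer 0-3 on the 4-point scale, the function below computes kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O is the observed grade-pair table, E the table expected if the two ratings were independent, and w_ij = (i - j)^2 / (k - 1)^2. The function name `quadratic_weighted_kappa` and the example grades are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_grades):
    """Cohen's kappa with quadratic disagreement weights (hypothetical helper).

    rater_a, rater_b: equal-length lists of integer grades in 0..num_grades-1,
    e.g. the grade of each original recording and of its anonymized counterpart.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty rating lists of equal length")
    if num_grades < 2:
        raise ValueError("need at least two grades")
    k = num_grades
    if not all(0 <= g < k for g in (*rater_a, *rater_b)):
        raise ValueError("grades must lie in 0..num_grades-1")
    n = len(rater_a)

    # Observed contingency table: observed[i][j] counts items graded i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal grade histograms of each rating series.
    hist_a = [sum(observed[i]) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: 0 on the diagonal, 1 for the largest possible disagreement.
            w = (i - j) ** 2 / (k - 1) ** 2
            # Expected count if the two rating series were independent.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    if weighted_expected == 0:
        # Both series use one identical grade throughout; kappa is undefined.
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades for four recordings; one recording shifts by one grade.
original = [0, 1, 2, 3]
anonymized = [0, 1, 3, 3]
print(round(quadratic_weighted_kappa(original, anonymized, 4), 3))  # → 0.917
```

Because the weights grow with the square of the grade distance, a one-grade shift is penalized far less than a three-grade one, which is consistent with kappa staying high when no recording moves by more than one grade.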