Bibliographic Details
Main Authors: Alderete, John, Hui, Macarious Kin Fung, Mohan, Aanchan
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2508.13060
_version_ 1866909740933578752
author Alderete, John
Hui, Macarious Kin Fung
Mohan, Aanchan
author_facet Alderete, John
Hui, Macarious Kin Fung
Mohan, Aanchan
contents The Simon Fraser University Speech Error Database (SFUSED) is a public data collection developed for linguistic and psycholinguistic research. Here we demonstrate how its design and annotations can be used to test and evaluate speech recognition models. The database comprises systematically annotated speech errors from spontaneous English speech, with each error tagged for intended and actual error productions. The annotation schema incorporates multiple classificatory dimensions that are of some value to model assessment, including linguistic hierarchical level, contextual sensitivity, degraded words, word corrections, and both word-level and syllable-level error positioning. To assess the value of these classificatory variables, we evaluated the transcription accuracy of WhisperX across 5,300 documented word and phonological errors. This analysis demonstrates the database's effectiveness as a diagnostic tool for ASR system performance.
format Preprint
id arxiv_https___arxiv_org_abs_2508_13060
institution arXiv
publishDate 2025
record_format arxiv
title Evaluating ASR robustness to spontaneous speech errors: A study of WhisperX using a Speech Error Database
topic Computation and Language
url https://arxiv.org/abs/2508.13060