Staff View: Careless Whisper: Speech-to-Text Hallucination Harms :: Library Catalog

Bibliographic Details
Main Authors:	Koenecke, Allison; Choi, Anna Seo Gyeong; Mei, Katelyn X.; Schellmann, Hilke; Sloane, Mona
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2402.08021

_version_	1866914782532075520
author	Koenecke, Allison; Choi, Anna Seo Gyeong; Mei, Katelyn X.; Schellmann, Hilke; Sloane, Mona
author_facet	Koenecke, Allison; Choi, Anna Seo Gyeong; Mei, Katelyn X.; Schellmann, Hilke; Sloane, Mona
contents	Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service that outperformed industry competitors as of 2023. While many of Whisper's transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations, a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_08021
institution	arXiv
publishDate	2024
record_format	arxiv
title	Careless Whisper: Speech-to-Text Hallucination Harms
topic	Computation and Language; Computers and Society
url	https://arxiv.org/abs/2402.08021

Similar Items