Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Ron, Yonathan; Gilboa, Shiri; Dubnov, Tammuz
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.18966

_version_	1866918349674381312
author	Ron, Yonathan; Gilboa, Shiri; Dubnov, Tammuz
author_facet	Ron, Yonathan; Gilboa, Shiri; Dubnov, Tammuz
contents	Domain-specific speech remains a persistent challenge for automatic speech recognition (ASR), even for state-of-the-art systems like OpenAI's Whisper. We introduce Whisper: Courtside Edition, a novel multi-agent large language model (LLM) pipeline that enhances Whisper transcriptions without retraining. The pipeline intercepts Whisper's initial transcript, applies specialized LLM agents for domain context identification, named entity recognition, and jargon detection, and generates compact prompts that guide Whisper's decoder. Evaluated on 421 NBA basketball commentary segments, a domain characterized by dense proper nouns and technical terminology, our best pipeline achieves a statistically significant 17.0% relative reduction in word error rate (WER; from 0.217 to 0.180, p<0.001). WER improves in 40.1% of segments and degrades in only 7.1%, substantially outperforming direct transcript post-editing. These results demonstrate that prompt-based augmentation can deliver scalable domain adaptation for ASR, offering a practical alternative to costly model fine-tuning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_18966
institution	arXiv
publishDate	2026
record_format	arxiv
title	Whisper: Courtside Edition Enhancing ASR Performance Through LLM-Driven Context Generation
topic	Computation and Language
url	https://arxiv.org/abs/2602.18966
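
Note on the reported metric: the abstract gives WER before and after prompting (0.217 and 0.180) and a relative reduction of 17.0%. The sketch below shows how such figures are conventionally computed: word-level edit distance divided by reference length, then the relative change between two WER values. It is an illustration only, not the authors' evaluation code; the function names and the example sentences are hypothetical, and the text normalization used in the paper is not stated in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# Made-up example: one misrecognized proper noun in a four-word segment.
example_wer = word_error_rate("jokic hits the three", "yokich hits the three")

# WER values quoted in the abstract.
reported = relative_reduction(0.217, 0.180)
print(f"example WER: {example_wer:.2f}")
print(f"relative reduction from rounded WERs: {reported:.3f}")
```

Computed from the rounded WER values, the relative reduction comes to roughly 0.17, consistent with the 17.0% in the abstract; any small difference would come from rounding of the underlying WERs.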

Similar Items