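Staff View: ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models :: Library Catalog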

Bibliographic Details
Main Authors:	Liu, Zefang; Zhu, Chenyang; Cho, Sangwoo; Zhang, Shi-Xiong
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.18721

_version_	1866915810961784832
author	Liu, Zefang; Zhu, Chenyang; Cho, Sangwoo; Zhang, Shi-Xiong
author_facet	Liu, Zefang; Zhu, Chenyang; Cho, Sangwoo; Zhang, Shi-Xiong
contents	Semi-supervised learning in automatic speech recognition (ASR) typically relies on pseudo-labeling, which often suffers from confirmation bias and error accumulation due to noisy supervision. To address this limitation, we propose ReHear, a framework for iterative pseudo-label refinement that integrates an instruction-tuned, audio-aware large language model (LLM) into the self-training loop. Unlike conventional text-based correctors, our approach conditions the LLM on both the ASR hypothesis and the source audio, allowing it to recover phonetically accurate transcripts even from severe recognition errors. These refined pseudo-labels serve as high-fidelity targets for fine-tuning the ASR model in an iterative cycle. Experimental results across diverse benchmarks demonstrate that ReHear effectively mitigates error propagation, consistently outperforming both supervised and pseudo-labeling baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_18721
institution	arXiv
publishDate	2026
record_format	arxiv
title	ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models
topic	Computation and Language; Audio and Speech Processing
url	https://arxiv.org/abs/2602.18721
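
Illustrative note: the self-training cycle summarized in the contents field (transcribe unlabeled audio, refine each hypothesis with an audio-aware LLM, fine-tune on the refined transcripts, repeat) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `iterative_refinement`, the three callables `transcribe`, `refine`, and `fine_tune`, and the `rounds` parameter are all hypothetical placeholders, since the record gives no API or hyperparameters.

```python
from typing import Callable, List, Tuple

# Hypothetical callable shapes; none of these names come from the paper.
Transcribe = Callable[[object, str], str]                    # (asr_model, clip) -> hypothesis
Refine = Callable[[str, str], str]                           # (clip, hypothesis) -> refined text
FineTune = Callable[[object, List[Tuple[str, str]]], object]  # (asr_model, pairs) -> new model


def iterative_refinement(
    asr_model: object,
    unlabeled_audio: List[str],
    transcribe: Transcribe,
    refine: Refine,
    fine_tune: FineTune,
    rounds: int = 3,  # assumed value; the abstract does not state an iteration count
) -> object:
    """Run `rounds` cycles of pseudo-labeling, refinement, and fine-tuning."""
    for _ in range(rounds):
        # 1. The current ASR model produces a hypothesis for each unlabeled clip.
        hypotheses = [transcribe(asr_model, clip) for clip in unlabeled_audio]
        # 2. The refiner is given both the source audio and the ASR hypothesis.
        pseudo_labels = [
            refine(clip, hyp) for clip, hyp in zip(unlabeled_audio, hypotheses)
        ]
        # 3. The refined transcripts serve as targets for fine-tuning.
        asr_model = fine_tune(asr_model, list(zip(unlabeled_audio, pseudo_labels)))
    return asr_model
```

Passing the three stages in as callables keeps the sketch independent of any particular ASR toolkit or LLM; the only point it illustrates is that the refiner receives the audio as well as the hypothesis, and that the updated model generates the next round of hypotheses.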

Similar Items