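Staff View: Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization :: Library Catalog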

Bibliographic Details
Main Authors:	Huang, Wanting; Wang, Weiran
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.23171

_version_	1866911471887187968
author	Huang, Wanting; Wang, Weiran
author_facet	Huang, Wanting; Wang, Weiran
contents	Consistency regularization (CR) improves the robustness and accuracy of Connectionist Temporal Classification (CTC) by ensuring predictions remain stable across input perturbations. In this work, we propose Align-Consistency, an extension of CR designed for Align-Refine -- a non-autoregressive (non-AR) model that performs iterative refinement of frame-level hypotheses. This method leverages the speed of parallel inference while significantly boosting recognition performance. The effectiveness of Align-Consistency is demonstrated in two settings. First, in the fully supervised setting, our results indicate that applying CR to both the base CTC model and the subsequent refinement steps is critical, and the accuracy improvements from non-AR decoding and CR are mutually additive. Second, for semi-supervised ASR, we employ fast non-AR decoding to generate online pseudo-labels on unlabeled data, which are used to further refine the supervised model and lead to substantial gains.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_23171
institution	arXiv
publishDate	2026
record_format	arxiv
title	Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2602.23171

Similar Items