Bibliographic Details
Main Authors: Huang, Wanting, Wang, Weiran
Format: Preprint
Published: 2026
Subjects: Audio and Speech Processing
Online Access: https://arxiv.org/abs/2602.23171
author Huang, Wanting
Wang, Weiran
author_facet Huang, Wanting
Wang, Weiran
contents Consistency regularization (CR) improves the robustness and accuracy of Connectionist Temporal Classification (CTC) by ensuring predictions remain stable across input perturbations. In this work, we propose Align-Consistency, an extension of CR designed for Align-Refine -- a non-autoregressive (non-AR) model that performs iterative refinement of frame-level hypotheses. This method leverages the speed of parallel inference while significantly boosting recognition performance. The effectiveness of Align-Consistency is demonstrated in two settings. First, in the fully supervised setting, our results indicate that applying CR to both the base CTC model and the subsequent refinement steps is critical, and the accuracy improvements from non-AR decoding and CR are mutually additive. Second, for semi-supervised ASR, we employ fast non-AR decoding to generate online pseudo-labels on unlabeled data, which are used to further refine the supervised model and lead to substantial gains.
format Preprint
id arxiv_https___arxiv_org_abs_2602_23171
institution arXiv
publishDate 2026
record_format arxiv
title Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization
topic Audio and Speech Processing
url https://arxiv.org/abs/2602.23171