

Bibliographic Details
Main Author:	Yang, Xuwen
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Sound; Audio and Speech Processing; I.2.7
Online Access:	https://arxiv.org/abs/2508.15853

Table of Contents:

End-to-end ASR models, despite their success on benchmarks, often produce catastrophic semantic errors in noisy environments. We attribute this fragility to the prevailing 'direct mapping' objective, which solely penalizes final output errors while leaving the model's internal computational process unconstrained. To address this, we introduce the Multi-Granularity Soft Consistency (MGSC) framework, a model-agnostic, plug-and-play module that enforces internal self-consistency by simultaneously regularizing macro-level sentence semantics and micro-level token alignment. Crucially, our work is the first to uncover a powerful synergy between these two consistency granularities: their joint optimization yields robustness gains that significantly surpass the sum of their individual contributions. On a public dataset, MGSC reduces the average Character Error Rate (CER) by a relative 8.7% across diverse noise conditions, primarily by preventing severe meaning-altering mistakes. Our work demonstrates that enforcing internal consistency is a crucial step towards building more robust and trustworthy AI.
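The headline result above is stated as a relative reduction in Character Error Rate. As a reading aid, here is a minimal sketch of the standard CER definition (character-level Levenshtein distance divided by reference length) and of how a relative reduction is computed. It assumes plain, already-normalized strings; the record does not say which scoring tool, text normalization, or averaging scheme the paper used, and the numbers in the usage example are illustrative only, not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, e.g. 0.087 means 8.7% relative."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(cer("abcd", "abxd"))  # one substitution over four characters: 0.25
    # Hypothetical CERs (in %) chosen only to illustrate an 8.7% relative drop.
    print(round(relative_reduction(10.0, 9.13), 3))  # 0.087
```

Note that a relative reduction is measured against the baseline's own error rate, so an 8.7% relative gain corresponds to a much smaller absolute change when the baseline CER is low.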
