Bibliographic Details
Main Authors:	Rehman, Abdul; Zhang, Jian-Jun; Yang, Xiaosong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning; Audio and Speech Processing; I.2.7
Online Access:	https://arxiv.org/abs/2508.15316

Table of Contents:

Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks, however, require pure phoneme representations free from contextual influence. This motivated our development of CUPE, a lightweight model that captures key phoneme features in just 120 milliseconds, about one phoneme's length. CUPE processes short, fixed-width windows independently and, despite having fewer parameters than current approaches, achieves competitive cross-lingual performance by learning fundamental acoustic patterns common to all languages. Our extensive evaluation through supervised and self-supervised training on diverse languages, including zero-shot tests on the UCLA Phonetic Corpus, demonstrates strong cross-lingual generalization and shows that effective universal speech processing is possible by modeling basic acoustic patterns within phoneme-length windows.
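
The abstract describes processing short, fixed-width 120 ms windows independently. As a rough illustration of that windowing step only, the sketch below cuts a waveform into such windows. It is not the authors' implementation: the function name `split_into_windows`, the 16 kHz sample rate, the non-overlapping stride, and the choice to drop a trailing partial window are all assumptions made for this example, and the phoneme model itself is not shown.

```python
from typing import List, Sequence


def split_into_windows(
    samples: Sequence[float],
    sample_rate: int = 16_000,  # assumed; the record does not state a sample rate
    window_ms: int = 120,       # window length taken from the abstract
    stride_ms: int = 120,       # assumed non-overlapping; the record gives no stride
) -> List[List[float]]:
    """Cut a waveform into fixed-width windows, dropping any trailing partial window."""
    if sample_rate <= 0 or window_ms <= 0 or stride_ms <= 0:
        raise ValueError("sample_rate, window_ms and stride_ms must be positive")

    window_len = sample_rate * window_ms // 1000  # 1920 samples at 16 kHz
    stride_len = sample_rate * stride_ms // 1000
    if window_len == 0 or stride_len == 0:
        raise ValueError("window or stride is shorter than one sample")

    windows = []
    # Each window is sliced on its own, so a model could score it
    # without seeing any neighbouring context.
    for start in range(0, len(samples) - window_len + 1, stride_len):
        windows.append(list(samples[start:start + window_len]))
    return windows


if __name__ == "__main__":
    one_second = [0.0] * 16_000  # one second of silence at the assumed 16 kHz
    windows = split_into_windows(one_second)
    print(len(windows), len(windows[0]))  # 8 1920
```

Under these assumptions, one second of audio yields eight independent windows of 1920 samples each. Passing a smaller `stride_ms` would produce overlapping windows instead.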