Bibliographic Details
Main Authors: Attia, Ahmed Adel, Demszky, Dorottya, Liu, Jing, Espy-Wilson, Carol
Format: Preprint
Published: 2025
Subjects: Audio and Speech Processing, Computation and Language, Sound
Online Access: https://arxiv.org/abs/2505.17088
_version_ 1866917284615815168
author Attia, Ahmed Adel
Demszky, Dorottya
Liu, Jing
Espy-Wilson, Carol
contents Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data. In such low-resource settings, high transcription costs make re-transcription impractical. To address this, we ask: what is the best approach when abundant inexpensive weak transcripts coexist with limited gold-standard data, as is the case for classroom speech data? We propose Weakly Supervised Pretraining (WSP), a two-step process where models are first pretrained on weak transcripts in a supervised manner, and then fine-tuned on accurate data. Our results, based on both synthetic and real weak transcripts, show that WSP outperforms alternative methods, establishing it as an effective training methodology for low-resource ASR in real-world scenarios.
format Preprint
id arxiv_https___arxiv_org_abs_2505_17088
institution arXiv
publishDate 2025
record_format arxiv
title From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data
topic Audio and Speech Processing
Computation and Language
Sound
url https://arxiv.org/abs/2505.17088