Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Attia, Ahmed Adel; Demszky, Dorottya; Liu, Jing; Espy-Wilson, Carol
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2505.17088

_version_	1866917284615815168
author	Attia, Ahmed Adel; Demszky, Dorottya; Liu, Jing; Espy-Wilson, Carol
author_facet	Attia, Ahmed Adel; Demszky, Dorottya; Liu, Jing; Espy-Wilson, Carol
contents	Recent progress in speech recognition has relied on models trained on vast amounts of labeled data. However, classroom Automatic Speech Recognition (ASR) faces the real-world challenge of abundant weak transcripts paired with only a small amount of accurate, gold-standard data. In such low-resource settings, high transcription costs make re-transcription impractical. To address this, we ask: what is the best approach when abundant inexpensive weak transcripts coexist with limited gold-standard data, as is the case for classroom speech data? We propose Weakly Supervised Pretraining (WSP), a two-step process where models are first pretrained on weak transcripts in a supervised manner, and then fine-tuned on accurate data. Our results, based on both synthetic and real weak transcripts, show that WSP outperforms alternative methods, establishing it as an effective training methodology for low-resource ASR in real-world scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_17088
institution	arXiv
publishDate	2025
record_format	arxiv
title	From Weak Labels to Strong Results: Utilizing 5,000 Hours of Noisy Classroom Transcripts with Minimal Accurate Data
topic	Audio and Speech Processing; Computation and Language; Sound
url	https://arxiv.org/abs/2505.17088
