Staff View: Revisiting Interpolation Augmentation for Speech-to-Text Generation :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.15846

_version_	1866911929933496320
author	Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu
contents	Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_15846
institution	arXiv
publishDate	2024
record_format	arxiv
title	Revisiting Interpolation Augmentation for Speech-to-Text Generation
topic	Computation and Language; Audio and Speech Processing
url	https://arxiv.org/abs/2406.15846