Bibliographic Details
Main Authors: Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu
Format: Preprint
Published: 2024
Subjects: Computation and Language; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2406.15846
author Xu, Chen
Wang, Jie
Liu, Xiaoqian
Dong, Qianqian
Zhang, Chunliang
Xiao, Tong
Zhu, Jingbo
Man, Dapeng
Yang, Wu
contents Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
format Preprint
id arxiv_https___arxiv_org_abs_2406_15846
institution arXiv
publishDate 2024
record_format arxiv
title Revisiting Interpolation Augmentation for Speech-to-Text Generation
topic Computation and Language
Audio and Speech Processing
url https://arxiv.org/abs/2406.15846