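| Main Authors: | Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu |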
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language; Audio and Speech Processing |
| Online Access: | https://arxiv.org/abs/2406.15846 |
| _version_ | 1866911929933496320 |
|---|---|
| author | Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu |
| author_facet | Xu, Chen; Wang, Jie; Liu, Xiaoqian; Dong, Qianqian; Zhang, Chunliang; Xiao, Tong; Zhu, Jingbo; Man, Dapeng; Yang, Wu |
| contents | Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_15846 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Revisiting Interpolation Augmentation for Speech-to-Text Generation |
| topic | Computation and Language; Audio and Speech Processing |
| url | https://arxiv.org/abs/2406.15846 |