Staff View: SynTTS-Commands: A Public Dataset for On-Device KWS via TTS-Synthesized Multilingual Speech :: Library Catalog

Bibliographic Details
Main Authors:	Gan, Lu; Li, Xi
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2511.07821

_version_	1866914168034033664
author	Gan, Lu; Li, Xi
author_facet	Gan, Lu; Li, Xi
contents	The development of high-performance, on-device keyword spotting (KWS) systems for ultra-low-power hardware is critically constrained by the scarcity of specialized, multi-command training datasets. Traditional data collection through human recording is costly, slow, and lacks scalability. This paper introduces SYNTTS-COMMANDS, a novel, multilingual voice command dataset entirely generated using state-of-the-art Text-to-Speech (TTS) synthesis. By leveraging the CosyVoice 2 model and speaker embeddings from public corpora, we created a scalable collection of English and Chinese commands. Extensive benchmarking across a range of efficient acoustic models demonstrates that our synthetic dataset enables exceptional accuracy, achieving up to 99.5% on English and 98% on Chinese command recognition. These results robustly validate that synthetic speech can effectively replace human-recorded audio for training KWS classifiers. Our work directly addresses the data bottleneck in TinyML, providing a practical, scalable foundation for building private, low-latency, and energy-efficient voice interfaces on resource-constrained edge devices. The dataset and source code are publicly available at https://github.com/lugan113/SynTTS-Commands-Official.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_07821
institution	arXiv
publishDate	2025
record_format	arxiv
title	SynTTS-Commands: A Public Dataset for On-Device KWS via TTS-Synthesized Multilingual Speech
topic	Sound
url	https://arxiv.org/abs/2511.07821
