Staff View: DDFAD: Dataset Distillation Framework for Audio Data :: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Wenbo; Zhang, Rui; Li, Hongwei; Liu, Xiaoyuan; Yang, Haomiao; Yu, Shui
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Databases; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.10446

_version_	1866910537725509632
author	Jiang, Wenbo; Zhang, Rui; Li, Hongwei; Liu, Xiaoyuan; Yang, Haomiao; Yu, Shui
author_facet	Jiang, Wenbo; Zhang, Rui; Li, Hongwei; Liu, Xiaoyuan; Yang, Haomiao; Yu, Shui
contents	Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem: it compresses a large dataset into a much smaller distilled dataset, such that a model trained on the distilled dataset achieves performance comparable to a model trained on the whole dataset. While dataset distillation has been demonstrated on image data, no prior work has explored dataset distillation for audio data. In this work, for the first time, we propose a Dataset Distillation Framework for Audio Data (DDFAD). Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as the extracted features for audio data. The FD-MFCC is then distilled using the matching training trajectory distillation method. Finally, we propose an audio signal reconstruction algorithm based on the Griffin-Lim Algorithm to reconstruct the audio signal from the distilled FD-MFCC. Extensive experiments demonstrate the effectiveness of DDFAD on various audio datasets. In addition, we show that DDFAD has promising prospects in applications such as continual learning and neural architecture search.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_10446
institution	arXiv
publishDate	2024
record_format	arxiv
title	DDFAD: Dataset Distillation Framework for Audio Data
topic	Sound; Artificial Intelligence; Databases; Audio and Speech Processing
url	https://arxiv.org/abs/2407.10446

Similar Items