

Bibliographic Details
Main authors:	Wang, Tingna; Zhang, Sikai; Song, Mingming; Sun, Limin
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Systems and Control
Online access:	https://arxiv.org/abs/2502.11484

_version_	1866908518731218944
author	Wang, Tingna; Zhang, Sikai; Song, Mingming; Sun, Limin
author_facet	Wang, Tingna; Zhang, Sikai; Song, Mingming; Sun, Limin
contents	System identification typically augments time series data by time shifting and nonlinearisation (e.g., a polynomial basis), both of which introduce redundancy in features and samples. Much research focuses on reducing feature-wise redundancy, while sample-wise redundancy receives less attention. This paper proposes a novel data pruning method, called mini-batch FastCan, that reduces sample-wise redundancy using dictionary learning. The time series data is represented by a small set of representative samples, called atoms, obtained via dictionary learning, and useful samples are selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. Pruning performance is evaluated by the R-squared between the coefficients of models trained on the full datasets and those of models trained on the pruned datasets. The proposed method is found to significantly outperform random pruning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_11484
institution	arXiv
publishDate	2025
record_format	arxiv
title	Dictionary-Learning-Based Data Pruning for System Identification
topic	Machine Learning; Systems and Control
url	https://arxiv.org/abs/2502.11484

Similar items