Bibliographic Details
Main Authors: Wang, Tingna, Zhang, Sikai, Song, Mingming, Sun, Limin
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2502.11484
_version_ 1866908518731218944
author Wang, Tingna
Zhang, Sikai
Song, Mingming
Sun, Limin
author_facet Wang, Tingna
Zhang, Sikai
Song, Mingming
Sun, Limin
contents System identification normally involves augmenting time series data by time shifting and nonlinearisation (e.g., with a polynomial basis), both of which introduce redundancy in features and samples. Much research focuses on reducing feature-wise redundancy, while sample-wise redundancy receives less attention. This paper proposes a novel data pruning method, called mini-batch FastCan, that reduces sample-wise redundancy using dictionary learning. The time series data are represented by a small set of representative samples, called atoms, obtained via dictionary learning, and useful samples are selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. Data pruning performance is evaluated by the R-squared between the coefficients of models trained on the full datasets and those of models trained on the pruned datasets. The proposed method is found to significantly outperform random pruning.
format Preprint
id arxiv_https___arxiv_org_abs_2502_11484
institution arXiv
publishDate 2025
record_format arxiv
title Dictionary-Learning-Based Data Pruning for System Identification
topic Machine Learning
Systems and Control
url https://arxiv.org/abs/2502.11484
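
The pipeline summarised in the contents field (represent the data by atoms, keep the samples most correlated with them, then compare model coefficients by R-squared) can be illustrated with a rough sketch. This is not the authors' mini-batch FastCan implementation. Everything here is an assumption made for illustration: the toy system, the helper names (kmeans_atoms, select_by_atoms, r2_between_coefs), the use of plain k-means centroids as a stand-in for dictionary learning, and cosine similarity as the correlation measure.

```python
import numpy as np

def kmeans_atoms(Z, n_atoms, n_iter, rng):
    """Stand-in for dictionary learning: k-means centroids used as 'atoms'."""
    atoms = Z[rng.choice(len(Z), n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_atoms):
            members = Z[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                atoms[j] = members.mean(axis=0)
    return atoms

def select_by_atoms(Z, atoms, n_per_atom):
    """For each atom, keep the not-yet-chosen samples most correlated with it."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    An = atoms / np.linalg.norm(atoms, axis=1, keepdims=True)
    corr = np.abs(Zn @ An.T)  # shape (n_samples, n_atoms)
    chosen, seen = [], set()
    for j in range(atoms.shape[0]):
        taken = 0
        for i in np.argsort(-corr[:, j]):
            if taken == n_per_atom:
                break
            if int(i) not in seen:
                seen.add(int(i))
                chosen.append(int(i))
                taken += 1
    return np.array(chosen)

def r2_between_coefs(coef_full, coef_pruned):
    """R-squared of pruned-data coefficients against full-data coefficients."""
    ss_res = np.sum((coef_full - coef_pruned) ** 2)
    ss_tot = np.sum((coef_full - coef_full.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Hypothetical toy system: y[t] = 0.5*y[t-1] + 0.4*u[t-1] - 0.2*u[t-1]**2 + noise
n = 500
u = rng.uniform(-1.0, 1.0, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.4 * u[t - 1] - 0.2 * u[t - 1] ** 2 \
        + 0.01 * rng.standard_normal()

# Time shifting plus a degree-2 polynomial basis
y1, u1 = y[:-1], u[:-1]
X = np.column_stack([y1, u1, y1 ** 2, y1 * u1, u1 ** 2])
target = y[1:]
Z = np.column_stack([X, target])  # one row per sample (features and output)

atoms = kmeans_atoms(Z, n_atoms=10, n_iter=10, rng=rng)
idx = select_by_atoms(Z, atoms, n_per_atom=5)
idx_rand = rng.choice(len(Z), size=len(idx), replace=False)

coef_full = np.linalg.lstsq(X, target, rcond=None)[0]
coef_atom = np.linalg.lstsq(X[idx], target[idx], rcond=None)[0]
coef_rand = np.linalg.lstsq(X[idx_rand], target[idx_rand], rcond=None)[0]

r2_atom = r2_between_coefs(coef_full, coef_atom)
r2_rand = r2_between_coefs(coef_full, coef_rand)
print("atom-based pruning R2:", r2_atom)
print("random pruning R2:", r2_rand)
```

The sketch only shows how the evaluation metric is formed. Which pruning strategy scores higher on this toy data says nothing about the results reported in the preprint.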