Bibliographic Details
Main Authors: Kathiriya, Niket, Haeri, Hossein, Chen, Cindy, Jerath, Kshitij
Format: Preprint
Published: 2024
Subjects: Machine Learning, Databases
Online Access: https://arxiv.org/abs/2403.09588
author Kathiriya, Niket
Haeri, Hossein
Chen, Cindy
Jerath, Kshitij
contents Many modern systems, such as financial, transportation, and telecommunications systems, are time-sensitive: they demand low-latency predictions for real-time decision-making. Such systems often have to contend with continuous, unbounded data streams as well as concept drift, requirements that traditional regression techniques cannot meet. Novel data stream regression methods are needed to handle these scenarios. We present a database-inspired data stream regression model that (a) draws on R*-trees to create granules from incoming data streams such that relevant information is retained, (b) iteratively forgets granules whose information is deemed outdated, thus maintaining a list of only recent, relevant granules, and (c) uses the recent data and granules to provide low-latency predictions. The R*-tree-inspired approach also makes the algorithm amenable to integration with database systems. Our experiments demonstrate that the method's ability to discard data produces an order-of-magnitude improvement in latency and training time when evaluated against the most accurate state-of-the-art algorithms, while the R*-tree-inspired granulation technique provides competitively accurate predictions.
format Preprint
id arxiv_https___arxiv_org_abs_2403_09588
institution arXiv
publishDate 2024
record_format arxiv
title Iterative Forgetting: Online Data Stream Regression Using Database-Inspired Adaptive Granulation
topic Machine Learning
Databases
url https://arxiv.org/abs/2403.09588