Bibliographic Details
Main Authors: Hirose, Kei, Matsui, Hidetoshi, Masuda, Hiroki
Format: Preprint
Published: 2025
Subjects: Methodology
Online Access: https://arxiv.org/abs/2508.15567
contents In many practical situations, the main interest lies in forecasting aggregate values rather than individual ones. For instance, electricity companies need to forecast the total electricity demand in a specific region to ensure reliable grid operation and resource allocation. However, to our knowledge, statistical learning methods designed specifically for forecasting aggregate values have not yet been well established. In particular, the relationship between forecast error and the number of clusters has not been well studied, because clustering is usually treated as an unsupervised learning task. This study introduces a novel forecasting method that focuses on aggregate values in the linear regression model. We call it Aggregate Value Regression (AVR); it is constructed by combining all regression models into a single model. When the number of regression models to be combined is large, the AVR requires estimating a very large number of parameters, which results in overparameterization. To address this issue, we introduce a hierarchical clustering technique, referred to as AVR-C (C stands for clustering). In this approach, several clusters of regression models are constructed, and the AVR is performed within each cluster. The AVR-C gives rise to a novel bias-variance trade-off theory under the assumption of a misspecified model, in which the number of clusters characterizes model complexity. Monte Carlo simulations are conducted to investigate the behavior of the training and test errors of the proposed clustering technique. The bias-variance trade-off theory is also demonstrated through an analysis of electricity demand forecasting.
format Preprint
id arxiv_https___arxiv_org_abs_2508_15567
institution arXiv
publishDate 2025
record_format arxiv
title Clustering-based aggregate value regression
topic Methodology
url https://arxiv.org/abs/2508.15567
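
The abstract describes the AVR as a single regression built by combining all individual regression models, and the AVR-C as the same procedure applied within hierarchically constructed clusters. The following is a minimal sketch of one possible reading of that description, not the authors' estimator. It assumes that each unit j has its own design matrix X_j and response y_j, that "combining" means regressing the summed response on the column-wise concatenation of the designs by ordinary least squares, and that units are clustered by Ward linkage on their response series. The function names (`fit_avr`, `predict_avr`, `fit_avr_c`, `predict_avr_c`) and the choice of clustering features are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def fit_avr(X_list, y_list):
    """Assumed AVR: OLS of the summed response on all designs side by side."""
    X = np.hstack(X_list)          # shape (n, total number of predictors)
    y = np.sum(y_list, axis=0)     # aggregate response, shape (n,)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict_avr(beta, X_list):
    """Forecast the aggregate value from the concatenated designs."""
    return np.hstack(X_list) @ beta


def fit_avr_c(X_list, y_list, n_clusters):
    """Assumed AVR-C: cluster the units, then fit one AVR per cluster."""
    Y = np.vstack(y_list)                      # one row per unit
    Z = linkage(Y, method="ward")              # hierarchical clustering (assumed features)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    models = []
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)      # units belonging to cluster k
        beta = fit_avr([X_list[j] for j in idx], [y_list[j] for j in idx])
        models.append((idx, beta))
    return models


def predict_avr_c(models, X_list):
    """Sum the cluster-level aggregate forecasts to get the overall aggregate."""
    return sum(predict_avr(beta, [X_list[j] for j in idx]) for idx, beta in models)
```

In this sketch, each least-squares fit in `fit_avr_c` involves only the predictors of the units in one cluster, and setting `n_clusters=1` reduces it to `fit_avr` on all units. How the paper actually selects clustering features and the number of clusters is not stated in the abstract.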