Bibliographic Details
Main Authors: Transtrum, Mark K., Hart, Gus L. W., Jarvis, Tyler J., Whitehead, Jared P.
Format: Preprint
Published: 2024
Subjects: Statistics Theory, Machine Learning, Mathematical Physics
Online Access: https://arxiv.org/abs/2408.08294
author Transtrum, Mark K.
Hart, Gus L. W.
Jarvis, Tyler J.
Whitehead, Jared P.
contents A central problem in data science is to use potentially noisy samples of an unknown function to predict values for unseen inputs. In classical statistics, predictive error is understood as a trade-off between the bias and the variance that balances model simplicity with its ability to fit complex functions. However, over-parameterized models exhibit counterintuitive behaviors, such as "double descent" in which models of increasing complexity exhibit decreasing generalization error. Others may exhibit more complicated patterns of predictive error with multiple peaks and valleys. Neither double descent nor multiple descent phenomena are well explained by the bias-variance decomposition. We introduce a novel decomposition that we call the generalized aliasing decomposition (GAD) to explain the relationship between predictive performance and model complexity. The GAD decomposes the predictive error into three parts: 1) model insufficiency, which dominates when the number of parameters is much smaller than the number of data points, 2) data insufficiency, which dominates when the number of parameters is much greater than the number of data points, and 3) generalized aliasing, which dominates between these two extremes. We demonstrate the applicability of the GAD to diverse applications, including random feature models from machine learning, Fourier transforms from signal processing, solution methods for differential equations, and predictive formation enthalpy in materials discovery. Because key components of the GAD can be explicitly calculated from the relationship between model class and samples without seeing any data labels, it can answer questions related to experimental design and model selection before collecting data or performing experiments. We further demonstrate this approach on several examples and discuss implications for predictive modeling and data science.
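The abstract names Fourier transforms from signal processing as one application and uses the term "generalized aliasing". As a point of reference only, the following minimal sketch shows classical aliasing, the textbook effect that the name alludes to: two cosines of different frequency that agree exactly on a coarse sample grid. It is not the GAD itself and is not taken from the paper; the function name sample_cosine, the frequencies 1 and 9, and the grid of 10 samples are all illustrative assumptions.

```python
import math


def sample_cosine(freq, n_samples):
    """Sample cos(2*pi*freq*t) at t = k / n_samples for k = 0 .. n_samples - 1.

    Hypothetical helper for illustration; not from the paper.
    """
    return [math.cos(2 * math.pi * freq * k / n_samples) for k in range(n_samples)]


# Assumed setup: 10 equally spaced samples on [0, 1).
n_samples = 10
low = sample_cosine(1, n_samples)   # frequency 1
high = sample_cosine(9, n_samples)  # frequency 9 = n_samples - 1

# On the sample grid the two signals agree up to floating-point error,
# because cos(2*pi*9*k/10) = cos(2*pi*k - 2*pi*k/10) = cos(2*pi*k/10).
max_gap = max(abs(a - b) for a, b in zip(low, high))
aliased = max_gap < 1e-9

# Off the grid the signals differ, so the samples alone cannot reveal
# which of the two functions produced them.
t_mid = 0.05
off_grid_gap = abs(
    math.cos(2 * math.pi * 1 * t_mid) - math.cos(2 * math.pi * 9 * t_mid)
)

print("indistinguishable on the grid:", aliased)
print("gap at t = 0.05:", off_grid_gap)
```

In this toy setting, a model with too few samples relative to the frequencies present cannot separate the two signals, which is the intuition behind the aliasing term in the abstract; how the paper generalizes it is described in the preprint itself.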
format Preprint
id arxiv_https___arxiv_org_abs_2408_08294
institution arXiv
publishDate 2024
record_format arxiv
title eGAD! double descent is explained by Generalized Aliasing Decomposition
topic Statistics Theory
Machine Learning
Mathematical Physics
url https://arxiv.org/abs/2408.08294