Staff View: eGAD! double descent is explained by Generalized Aliasing Decomposition :: Library Catalog

Bibliographic Details
Main Authors:	Transtrum, Mark K.; Hart, Gus L. W.; Jarvis, Tyler J.; Whitehead, Jared P.
Format:	Preprint
Published:	2024
Subjects:	Statistics Theory; Machine Learning; Mathematical Physics
Online Access:	https://arxiv.org/abs/2408.08294

_version_	1866912409841565696
author	Transtrum, Mark K.; Hart, Gus L. W.; Jarvis, Tyler J.; Whitehead, Jared P.
author_facet	Transtrum, Mark K.; Hart, Gus L. W.; Jarvis, Tyler J.; Whitehead, Jared P.
contents	A central problem in data science is to use potentially noisy samples of an unknown function to predict values for unseen inputs. In classical statistics, predictive error is understood as a trade-off between the bias and the variance that balances model simplicity with its ability to fit complex functions. However, over-parameterized models exhibit counterintuitive behaviors, such as "double descent" in which models of increasing complexity exhibit decreasing generalization error. Others may exhibit more complicated patterns of predictive error with multiple peaks and valleys. Neither double descent nor multiple descent phenomena are well explained by the bias-variance decomposition. We introduce a novel decomposition that we call the generalized aliasing decomposition (GAD) to explain the relationship between predictive performance and model complexity. The GAD decomposes the predictive error into three parts: 1) model insufficiency, which dominates when the number of parameters is much smaller than the number of data points, 2) data insufficiency, which dominates when the number of parameters is much greater than the number of data points, and 3) generalized aliasing, which dominates between these two extremes. We demonstrate the applicability of the GAD to diverse applications, including random feature models from machine learning, Fourier transforms from signal processing, solution methods for differential equations, and predictive formation enthalpy in materials discovery. Because key components of the GAD can be explicitly calculated from the relationship between model class and samples without seeing any data labels, it can answer questions related to experimental design and model selection before collecting data or performing experiments. We further demonstrate this approach on several examples and discuss implications for predictive modeling and data science.
format	Preprint
id	arxiv_https___arxiv_org_abs_2408_08294
institution	arXiv
publishDate	2024
record_format	arxiv
title	eGAD! double descent is explained by Generalized Aliasing Decomposition
topic	Statistics Theory; Machine Learning; Mathematical Physics
url	https://arxiv.org/abs/2408.08294

Similar Items