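Staff View: Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation :: Library Catalog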


Bibliographic Details
Main Author:	Goto, Isao
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2409.01025

_version_	1866914932233076736
author	Goto, Isao
author_facet	Goto, Isao
contents	This paper attempts to answer a "simple question" in building predictive models using machine learning algorithms. Although diagnostic and predictive models for various diseases have been proposed using data from large cohort studies and machine learning algorithms, challenges remain in their generalizability. Several causes for this challenge have been pointed out, and partitioning of the dataset with randomness is considered to be one of them. In this study, we constructed 33,600 diabetes diagnosis models with "initial state" dependent randomness using autoML (automatic machine learning framework) and open diabetes data, and evaluated their prediction accuracy. The results showed that the prediction accuracy had an initial state-dependent distribution. Since this distribution could follow a normal distribution, we estimated the expected interval of prediction accuracy using statistical interval estimation in order to fairly compare the accuracy of the prediction models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_01025
institution	arXiv
publishDate	2024
record_format	arxiv
title	Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation
topic	Machine Learning
url	https://arxiv.org/abs/2409.01025
