Enregistré dans:
Détails bibliographiques
Auteurs principaux: Dolgopolyi, Roman, Amaslidou, Ioanna, Margaritou, Agrippina
Format: Preprint
Publié: 2025
Sujets:
Accès en ligne:https://arxiv.org/abs/2510.00542
Tags: Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
_version_ 1866916982360637440
author Dolgopolyi, Roman
Amaslidou, Ioanna
Margaritou, Agrippina
author_facet Dolgopolyi, Roman
Amaslidou, Ioanna
Margaritou, Agrippina
contents Life expectancy is a fundamental indicator of population health and socio-economic well-being, yet accurately forecasting it remains challenging due to the interplay of demographic, environmental, and healthcare factors. This study evaluates three machine learning models -- Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF), using a real-world dataset drawn from World Health Organization (WHO) and United Nations (UN) sources. After extensive preprocessing to address missing values and inconsistencies, each model's performance was assessed with $R^2$, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Results show that RF achieves the highest predictive accuracy ($R^2 = 0.9423$), significantly outperforming LR and RDT. Interpretability was prioritized through p-values for LR and feature importance metrics for the tree-based models, revealing immunization rates (diphtheria, measles) and demographic attributes (HIV/AIDS, adult mortality) as critical drivers of life-expectancy predictions. These insights underscore the synergy between ensemble methods and transparency in addressing public-health challenges. Future research should explore advanced imputation strategies, alternative algorithms (e.g., neural networks), and updated data to further refine predictive accuracy and support evidence-based policymaking in global health contexts.
format Preprint
id arxiv_https___arxiv_org_abs_2510_00542
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest
Dolgopolyi, Roman
Amaslidou, Ioanna
Margaritou, Agrippina
Machine Learning
Life expectancy is a fundamental indicator of population health and socio-economic well-being, yet accurately forecasting it remains challenging due to the interplay of demographic, environmental, and healthcare factors. This study evaluates three machine learning models -- Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF), using a real-world dataset drawn from World Health Organization (WHO) and United Nations (UN) sources. After extensive preprocessing to address missing values and inconsistencies, each model's performance was assessed with $R^2$, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Results show that RF achieves the highest predictive accuracy ($R^2 = 0.9423$), significantly outperforming LR and RDT. Interpretability was prioritized through p-values for LR and feature importance metrics for the tree-based models, revealing immunization rates (diphtheria, measles) and demographic attributes (HIV/AIDS, adult mortality) as critical drivers of life-expectancy predictions. These insights underscore the synergy between ensemble methods and transparency in addressing public-health challenges. Future research should explore advanced imputation strategies, alternative algorithms (e.g., neural networks), and updated data to further refine predictive accuracy and support evidence-based policymaking in global health contexts.
title Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest
topic Machine Learning
url https://arxiv.org/abs/2510.00542