Enregistré dans:
| Auteurs principaux: | , , |
|---|---|
| Format: | Preprint |
| Publié: |
2025
|
| Sujets: | |
| Accès en ligne: | https://arxiv.org/abs/2510.00542 |
| Tags: |
Ajouter un tag
Pas de tags, Soyez le premier à ajouter un tag!
|
| _version_ | 1866916982360637440 |
|---|---|
| author | Dolgopolyi, Roman Amaslidou, Ioanna Margaritou, Agrippina |
| author_facet | Dolgopolyi, Roman Amaslidou, Ioanna Margaritou, Agrippina |
| contents | Life expectancy is a fundamental indicator of population health and socio-economic well-being, yet accurately forecasting it remains challenging due to the interplay of demographic, environmental, and healthcare factors. This study evaluates three machine learning models -- Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF), using a real-world dataset drawn from World Health Organization (WHO) and United Nations (UN) sources. After extensive preprocessing to address missing values and inconsistencies, each model's performance was assessed with $R^2$, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Results show that RF achieves the highest predictive accuracy ($R^2 = 0.9423$), significantly outperforming LR and RDT. Interpretability was prioritized through p-values for LR and feature importance metrics for the tree-based models, revealing immunization rates (diphtheria, measles) and demographic attributes (HIV/AIDS, adult mortality) as critical drivers of life-expectancy predictions. These insights underscore the synergy between ensemble methods and transparency in addressing public-health challenges. Future research should explore advanced imputation strategies, alternative algorithms (e.g., neural networks), and updated data to further refine predictive accuracy and support evidence-based policymaking in global health contexts. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2510_00542 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest Dolgopolyi, Roman Amaslidou, Ioanna Margaritou, Agrippina Machine Learning Life expectancy is a fundamental indicator of population health and socio-economic well-being, yet accurately forecasting it remains challenging due to the interplay of demographic, environmental, and healthcare factors. This study evaluates three machine learning models -- Linear Regression (LR), Regression Decision Tree (RDT), and Random Forest (RF), using a real-world dataset drawn from World Health Organization (WHO) and United Nations (UN) sources. After extensive preprocessing to address missing values and inconsistencies, each model's performance was assessed with $R^2$, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). Results show that RF achieves the highest predictive accuracy ($R^2 = 0.9423$), significantly outperforming LR and RDT. Interpretability was prioritized through p-values for LR and feature importance metrics for the tree-based models, revealing immunization rates (diphtheria, measles) and demographic attributes (HIV/AIDS, adult mortality) as critical drivers of life-expectancy predictions. These insights underscore the synergy between ensemble methods and transparency in addressing public-health challenges. Future research should explore advanced imputation strategies, alternative algorithms (e.g., neural networks), and updated data to further refine predictive accuracy and support evidence-based policymaking in global health contexts. |
| title | Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2510.00542 |