Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Tran, Ngoc Hong, Vien, Lan Kim, Le, Ngoc-Thao Thi
Format:	Preprint
Published:	2025
Subjects:	Machine Learning Emerging Technologies
Online Access:	https://arxiv.org/abs/2504.08651
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910909292609536
author	Tran, Ngoc Hong Vien, Lan Kim Le, Ngoc-Thao Thi
author_facet	Tran, Ngoc Hong Vien, Lan Kim Le, Ngoc-Thao Thi
contents	Lung cancer is one of the major causes of death worldwide, and Vietnam is not an exception. This disease is the second most common type of cancer globally and the second most common cause of death in Vietnam, just after liver cancer, with 23,797 fatal cases and 26,262 new cases, or 14.4% of the disease in 2020. Recently, with rising disease rates in Vietnam causing a huge public health burden, lung cancer continues to hold the top position in attention and care. Especially together with climate change, under a variety of types of pollution, deforestation, and modern lifestyles, lung cancer risks are on red alert, particularly in Vietnam. To understand more about the severe disease sources in Vietnam from a diversity of key factors, including environmental features and the current health state, with a particular emphasis on Vietnam's distinct socioeconomic and ecological context, we utilize large datasets such as patient health records and environmental indicators containing necessary information, such as deforestation rate, green cover rate, air pollution, and lung cancer risks, that is collected from well-known governmental sharing websites. Then, we process and connect them and apply analytical methods (heatmap, information gain, p-value, spearman correlation) to determine causal correlations influencing lung cancer risks. Moreover, we deploy machine learning (ML) models (Decision Tree, Random Forest, Support Vector Machine, K-mean clustering) to discover cancer risk patterns. Our experimental results, leveraged by the aforementioned ML models to identify the disease patterns, are promising, particularly, the models as Random Forest, SVM, and PCA are working well on the datasets and give high accuracy (99%), however, the K means clustering has very low accuracy (10%) and does not fit the datasets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_08651
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Application of machine learning models to predict the relationship between air pollution, ecosystem degradation, and health disparities and lung cancer in Vietnam Tran, Ngoc Hong Vien, Lan Kim Le, Ngoc-Thao Thi Machine Learning Emerging Technologies Lung cancer is one of the major causes of death worldwide, and Vietnam is not an exception. This disease is the second most common type of cancer globally and the second most common cause of death in Vietnam, just after liver cancer, with 23,797 fatal cases and 26,262 new cases, or 14.4% of the disease in 2020. Recently, with rising disease rates in Vietnam causing a huge public health burden, lung cancer continues to hold the top position in attention and care. Especially together with climate change, under a variety of types of pollution, deforestation, and modern lifestyles, lung cancer risks are on red alert, particularly in Vietnam. To understand more about the severe disease sources in Vietnam from a diversity of key factors, including environmental features and the current health state, with a particular emphasis on Vietnam's distinct socioeconomic and ecological context, we utilize large datasets such as patient health records and environmental indicators containing necessary information, such as deforestation rate, green cover rate, air pollution, and lung cancer risks, that is collected from well-known governmental sharing websites. Then, we process and connect them and apply analytical methods (heatmap, information gain, p-value, spearman correlation) to determine causal correlations influencing lung cancer risks. Moreover, we deploy machine learning (ML) models (Decision Tree, Random Forest, Support Vector Machine, K-mean clustering) to discover cancer risk patterns. Our experimental results, leveraged by the aforementioned ML models to identify the disease patterns, are promising, particularly, the models as Random Forest, SVM, and PCA are working well on the datasets and give high accuracy (99%), however, the K means clustering has very low accuracy (10%) and does not fit the datasets.
title	Application of machine learning models to predict the relationship between air pollution, ecosystem degradation, and health disparities and lung cancer in Vietnam
topic	Machine Learning Emerging Technologies
url	https://arxiv.org/abs/2504.08651

Similar Items