Saved in:
Bibliographic Details
Main Authors: Bishal, Mahathir Mohammad, Chowdory, Md. Rakibul Hassan, Das, Anik, Kabir, Muhammad Ashad
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2402.09897
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910719257083904
author Bishal, Mahathir Mohammad
Chowdory, Md. Rakibul Hassan
Das, Anik
Kabir, Muhammad Ashad
author_facet Bishal, Mahathir Mohammad
Chowdory, Md. Rakibul Hassan
Das, Anik
Kabir, Muhammad Ashad
contents The COVID-19 pandemic has had adverse effects on both physical and mental health. During this pandemic, numerous studies have focused on gaining insights into health-related perspectives from social media. In this study, our primary objective is to develop a machine learning-based web application for automatically classifying COVID-19-related discussions on social media. To achieve this, we label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application. We collected data using the Twitter API and labeled a total of 6,667 tweets into five different classes: health risks, prevention, symptoms, transmission, and treatment. We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms, including Decision Tree, Random Forest, Stochastic Gradient Descent, Adaboost, K-Nearest Neighbour, Logistic Regression, and Linear SVC. Additionally, we used four deep learning algorithms: LSTM, CNN, RNN, and BERT, for classification. Overall, we achieved a maximum F1 score of 90.43% with the CNN algorithm in deep learning. The Linear SVC algorithm exhibited the highest F1 score at 86.13%, surpassing other traditional machine learning approaches. Our study not only contributes to the field of health-related data analysis but also provides a valuable resource in the form of a web-based tool for efficient data classification, which can aid in addressing public health challenges and increasing awareness during pandemics. We made the dataset and application publicly available, which can be downloaded from this link https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
format Preprint
id arxiv_https___arxiv_org_abs_2402_09897
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions
Bishal, Mahathir Mohammad
Chowdory, Md. Rakibul Hassan
Das, Anik
Kabir, Muhammad Ashad
Machine Learning
Social and Information Networks
The COVID-19 pandemic has had adverse effects on both physical and mental health. During this pandemic, numerous studies have focused on gaining insights into health-related perspectives from social media. In this study, our primary objective is to develop a machine learning-based web application for automatically classifying COVID-19-related discussions on social media. To achieve this, we label COVID-19-related Twitter data, provide benchmark classification results, and develop a web application. We collected data using the Twitter API and labeled a total of 6,667 tweets into five different classes: health risks, prevention, symptoms, transmission, and treatment. We extracted features using various feature extraction methods and applied them to seven different traditional machine learning algorithms, including Decision Tree, Random Forest, Stochastic Gradient Descent, Adaboost, K-Nearest Neighbour, Logistic Regression, and Linear SVC. Additionally, we used four deep learning algorithms: LSTM, CNN, RNN, and BERT, for classification. Overall, we achieved a maximum F1 score of 90.43% with the CNN algorithm in deep learning. The Linear SVC algorithm exhibited the highest F1 score at 86.13%, surpassing other traditional machine learning approaches. Our study not only contributes to the field of health-related data analysis but also provides a valuable resource in the form of a web-based tool for efficient data classification, which can aid in addressing public health challenges and increasing awareness during pandemics. We made the dataset and application publicly available, which can be downloaded from this link https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
title COVIDHealth: A Benchmark Twitter Dataset and Machine Learning based Web Application for Classifying COVID-19 Discussions
topic Machine Learning
Social and Information Networks
url https://arxiv.org/abs/2402.09897