Bibliographic details
Main author: Chandolu, Abhinav
Format: Digital resource
Language: English
Published: Zenodo, 2025
Online access: https://doi.org/10.5281/zenodo.16955346
Abstract:
  • <p><strong>This study applies supervised machine learning to the U.S. Accident Dataset (7.7M records, 2016–2023) to predict crash severity. The dataset contains detailed accident records from across the United States, including weather, location, time, and infrastructure conditions. Feature groups were constructed to isolate temporal, environmental, and spatial factors, and each group was used to train four models: KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, and RandomForestRegressor. The classifiers were evaluated by overall accuracy and the regressor by mean squared error. Three sampling conditions were tested: raw imbalanced data, SMOTE-resampled data, and SMOTEENN-resampled data. The RandomForestClassifier trained on the imbalanced, infrastructure-based feature set (X2) achieved the highest accuracy, 87.97%, exceeding all models trained on resampled data. While most models lost accuracy after resampling, the regression models improved, suggesting a tolerance to synthetic noise. These results reinforce the view that ensemble models perform strongly in class-imbalanced, real-world settings and highlight the predictive value of infrastructure-related data for accident severity and emergency planning.</strong></p>
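The abstract contrasts raw data with SMOTE- and SMOTEENN-resampled data but does not say which implementation was used. As a rough illustration of what "SMOTE-resampled" means, the sketch below shows the core SMOTE idea in dependency-free Python: new minority-class rows are created by interpolating between a minority sample and one of its k nearest minority neighbours. It assumes purely numeric features and Euclidean distance; the `smote` function and the toy data are hypothetical and are not the study's code. SMOTEENN additionally removes noisy samples with Edited Nearest Neighbours after oversampling, which is not shown here.

```python
import math
import random
from collections import Counter


def smote(X, y, minority_label, k=5, seed=0):
    """Hypothetical minimal SMOTE sketch (not the study's implementation).

    Oversamples `minority_label` until it matches the largest class by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours. Assumes numeric features, Euclidean distance.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    counts = Counter(y)
    n_needed = max(counts.values()) - counts[minority_label]
    if n_needed <= 0 or len(minority) < 2:
        return list(X), list(y)  # nothing to balance, or no neighbours to use
    k = min(k, len(minority) - 1)

    def neighbours(i):
        # Indices of the k nearest *other* minority samples to minority[i].
        dists = sorted(
            (math.dist(minority[i], minority[j]), j)
            for j in range(len(minority))
            if j != i
        )
        return [j for _, j in dists[:k]]

    neighbour_cache = [neighbours(i) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbour_cache[i])
        gap = rng.random()  # random position on the segment from point i to point j
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )

    # Original rows are kept unchanged; synthetic minority rows are appended.
    return list(X) + synthetic, list(y) + [minority_label] * n_needed


# Toy usage: 8 majority rows (label 0) and 3 minority rows (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [0.1, 0.4], [0.3, 0.0],
     [5.0, 5.0], [5.5, 5.2], [5.2, 5.6]]
y = [0] * 8 + [1] * 3
X_res, y_res = smote(X, y, minority_label=1, k=2)
print(Counter(y_res))
```

Because every synthetic row lies on a line segment between two real minority rows, SMOTE adds plausible but artificial points. This is one way to read the abstract's observation that resampling changed model accuracy: classifiers are now evaluated after training on a mix of real and interpolated rows. In practice, a maintained library implementation would normally be preferred over a hand-written version like this one.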