Table of Contents :: Library Catalog

Bibliographic Details
Main Author:	Khan Gulrez Shagufa Fazal Ahmed
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	Diabetes prediction; XGBoost; Ensemble learning; SHAP explainability; Feature engineering; Machine learning; Healthcare AI; Binary classification; Pima Indians dataset; Clinical decision support; Pharmacoinformatics; Precision Medicine; Artificial intelligence; Latent Autoimmune Diabetes in Adults/diagnosis
Online Access:	https://doi.org/10.5281/zenodo.20336854

Table of Contents:

<p>Diabetes mellitus is a chronic metabolic disorder affecting over 537 million adults worldwide. This study presents an end-to-end machine learning pipeline for binary classification of diabetes status using the Pima Indians Diabetes Dataset (n=768). The pipeline integrates systematic data cleaning, group-median imputation, IQR-based outlier clipping, and six engineered interaction features. An XGBoost classifier was trained with 300 estimators, class-weighted loss, and L1/L2 regularization. Cross-validation was performed using a scikit-learn Pipeline to prevent data leakage. The model achieved an accuracy of 87.0%, recall of 85.2%, precision of 79.3%, F1 score of 82.1%, and ROC-AUC of 0.947. Five-fold CV AUC was 0.944 (SD=0.014). SHAP analysis identified Glucose, the Glucose × BMI interaction, and BMI as the three most impactful predictors. Source code, trained model artifacts, and figures are publicly available on GitHub (https://github.com/randomthingsonlineatsk-cloud/diabetes-xgboost-prediction) and archived on Zenodo (DOI: 10.5281/zenodo.20332710).</p>
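
The preprocessing steps named in the abstract (group-median imputation, IQR-based clipping, and the Glucose × BMI interaction) can be illustrated with a minimal standard-library sketch. This is not the repository's code. The function names, the column names `Glucose`, `BMI` and `Outcome`, the choice of `Outcome` as the grouping key, the use of `None` for missing values, and the fence multiplier `k=1.5` are all assumptions made for illustration. The abstract does not say which grouping variable or which quantile method the study used.

```python
from statistics import median, quantiles


def impute_group_median(rows, column, group_key="Outcome"):
    """Fill None in `column` with the median of that column within the row's group.

    Grouping by `Outcome` is an assumption; the abstract does not name the group.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    overall = median(observed)  # fallback for a group with no observed values
    by_group = {}
    for r in rows:
        if r[column] is not None:
            by_group.setdefault(r[group_key], []).append(r[column])
    medians = {g: median(v) for g, v in by_group.items()}
    # Build new dicts so the input rows are left unmodified.
    return [
        {**r, column: medians.get(r[group_key], overall)} if r[column] is None else dict(r)
        for r in rows
    ]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is the conventional default."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def add_glucose_bmi(rows):
    """Add the Glucose x BMI interaction, the only engineered feature the abstract names."""
    return [{**r, "Glucose_x_BMI": r["Glucose"] * r["BMI"]} for r in rows]


# Hypothetical toy rows, not taken from the dataset.
rows = [
    {"Glucose": 100, "BMI": 30.0, "Outcome": 0},
    {"Glucose": None, "BMI": 25.0, "Outcome": 0},
    {"Glucose": 120, "BMI": 28.0, "Outcome": 0},
    {"Glucose": 150, "BMI": 35.0, "Outcome": 1},
]
filled = impute_group_median(rows, "Glucose")
features = add_glucose_bmi(filled)
```

To avoid the leakage the abstract mentions, the medians and clipping bounds should be estimated on the training portion of each fold only. Wrapping such steps in a scikit-learn Pipeline handles that automatically.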
