Saved in:
Bibliographic Details
Main Authors: Odonga, Timothy, Esper, Christine D., Factor, Stewart A., McKay, J. Lucas, Kwon, Hyeokhyen
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2502.09626
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910005695873024
author Odonga, Timothy
Esper, Christine D.
Factor, Stewart A.
McKay, J. Lucas
Kwon, Hyeokhyen
author_facet Odonga, Timothy
Esper, Christine D.
Factor, Stewart A.
McKay, J. Lucas
Kwon, Hyeokhyen
contents Freezing of gait (FOG) is a debilitating symptom of Parkinson's disease (PD) and a common cause of injurious falls. Recent advances in wearable-based human activity recognition (HAR) enable FOG detection, but bias and fairness in these models remain understudied. Bias refers to systematic errors leading to unequal outcomes, while fairness refers to consistent performance across subject groups. Biased models could systematically underserve patients with specific FOG phenotypes or demographics, potentially widening care disparities. We systematically evaluated bias and fairness of state-of-the-art HAR models for FOG detection across phenotypes and demographics using multi-site datasets. We assessed four mitigation approaches: conventional methods (threshold optimization and adversarial debiasing) and transfer learning approaches (multi-site transfer and fine-tuning large pretrained models). Fairness was quantified using demographic parity ratio (DPR) and equalized odds ratio (EOR). HAR models exhibited substantial bias (DPR & EOR < 0.8) across age, sex, disease duration, and critically, FOG phenotype. Phenotype-specific bias is particularly concerning as tremulous and akinetic FOG require different clinical management. Conventional bias mitigation methods failed: threshold optimization (DPR=-0.126, EOR=+0.063) and adversarial debiasing (DPR=-0.008, EOR=-0.001) showed minimal improvement. In contrast, transfer learning from multi-site datasets significantly improved fairness (DPR=+0.037, p<0.01; EOR=+0.045, p<0.01) and performance (F1-score=+0.020, p<0.05). Transfer learning across diverse datasets is essential for developing equitable HAR models that reliably detect FOG across all patient phenotypes, ensuring wearable-based monitoring benefits all individuals with PD.
format Preprint
id arxiv_https___arxiv_org_abs_2502_09626
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Evidence for Phenotype-Driven Disparities in Freezing of Gait Detection and Approaches to Bias Mitigation
Odonga, Timothy
Esper, Christine D.
Factor, Stewart A.
McKay, J. Lucas
Kwon, Hyeokhyen
Signal Processing
Machine Learning
Freezing of gait (FOG) is a debilitating symptom of Parkinson's disease (PD) and a common cause of injurious falls. Recent advances in wearable-based human activity recognition (HAR) enable FOG detection, but bias and fairness in these models remain understudied. Bias refers to systematic errors leading to unequal outcomes, while fairness refers to consistent performance across subject groups. Biased models could systematically underserve patients with specific FOG phenotypes or demographics, potentially widening care disparities. We systematically evaluated bias and fairness of state-of-the-art HAR models for FOG detection across phenotypes and demographics using multi-site datasets. We assessed four mitigation approaches: conventional methods (threshold optimization and adversarial debiasing) and transfer learning approaches (multi-site transfer and fine-tuning large pretrained models). Fairness was quantified using demographic parity ratio (DPR) and equalized odds ratio (EOR). HAR models exhibited substantial bias (DPR & EOR < 0.8) across age, sex, disease duration, and critically, FOG phenotype. Phenotype-specific bias is particularly concerning as tremulous and akinetic FOG require different clinical management. Conventional bias mitigation methods failed: threshold optimization (DPR=-0.126, EOR=+0.063) and adversarial debiasing (DPR=-0.008, EOR=-0.001) showed minimal improvement. In contrast, transfer learning from multi-site datasets significantly improved fairness (DPR=+0.037, p<0.01; EOR=+0.045, p<0.01) and performance (F1-score=+0.020, p<0.05). Transfer learning across diverse datasets is essential for developing equitable HAR models that reliably detect FOG across all patient phenotypes, ensuring wearable-based monitoring benefits all individuals with PD.
title Evidence for Phenotype-Driven Disparities in Freezing of Gait Detection and Approaches to Bias Mitigation
topic Signal Processing
Machine Learning
url https://arxiv.org/abs/2502.09626