Saved in:
| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.02700 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866911644526837760 |
|---|---|
| author | Cheema, Ahsan Jamal |
| author_facet | Cheema, Ahsan Jamal |
| contents | Vocal hyperfunction (VH) is a prevalent voice disorder whose ambulatory detection remains challenging despite extensive daily voice data. Prior approaches capture week-long neck-surface accelerometer recordings but collapse them into fixed-length subject-level feature vectors, discarding within-day temporal dynamics encoding nuanced voicing feature interactions. We introduce a novel hybrid architecture combining gradient-boosted trees on day-level distributional features with a CNN-based multiple instance learning (MIL) framework that preserves and learns from from temporal dynamics throughout each day. On the held-out test set, our model exceeds the challenge baselines (AUC: 0.82 PVH, 0.77 NPVH), achieving AUCs of 0.879 for PVH (Rank 5) and 0.848 for NPVH (Rank 3), while also providing insights into clinically relevant information about both pathologies. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2605_02700 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Neck-Learn: Attention-Based Multiple Instance Learning and Ensemble Framework for Ecological Momentary Assessment Cheema, Ahsan Jamal Audio and Speech Processing Vocal hyperfunction (VH) is a prevalent voice disorder whose ambulatory detection remains challenging despite extensive daily voice data. Prior approaches capture week-long neck-surface accelerometer recordings but collapse them into fixed-length subject-level feature vectors, discarding within-day temporal dynamics encoding nuanced voicing feature interactions. We introduce a novel hybrid architecture combining gradient-boosted trees on day-level distributional features with a CNN-based multiple instance learning (MIL) framework that preserves and learns from from temporal dynamics throughout each day. On the held-out test set, our model exceeds the challenge baselines (AUC: 0.82 PVH, 0.77 NPVH), achieving AUCs of 0.879 for PVH (Rank 5) and 0.848 for NPVH (Rank 3), while also providing insights into clinically relevant information about both pathologies. |
| title | Neck-Learn: Attention-Based Multiple Instance Learning and Ensemble Framework for Ecological Momentary Assessment |
| topic | Audio and Speech Processing |
| url | https://arxiv.org/abs/2605.02700 |