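Staff View: Neck-Learn: Attention-Based Multiple Instance Learning and Ensemble Framework for Ecological Momentary Assessment :: Library Catalog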

Bibliographic Details
Main Author:	Cheema, Ahsan Jamal
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2605.02700

_version_	1866911644526837760
author	Cheema, Ahsan Jamal
author_facet	Cheema, Ahsan Jamal
contents	Vocal hyperfunction (VH) is a prevalent voice disorder whose ambulatory detection remains challenging despite extensive daily voice data. Prior approaches capture week-long neck-surface accelerometer recordings but collapse them into fixed-length subject-level feature vectors, discarding within-day temporal dynamics encoding nuanced voicing feature interactions. We introduce a novel hybrid architecture combining gradient-boosted trees on day-level distributional features with a CNN-based multiple instance learning (MIL) framework that preserves and learns from temporal dynamics throughout each day. On the held-out test set, our model exceeds the challenge baselines (AUC: 0.82 PVH, 0.77 NPVH), achieving AUCs of 0.879 for PVH (Rank 5) and 0.848 for NPVH (Rank 3), while also providing insights into clinically relevant information about both pathologies.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_02700
institution	arXiv
publishDate	2026
record_format	arxiv
title	Neck-Learn: Attention-Based Multiple Instance Learning and Ensemble Framework for Ecological Momentary Assessment
topic	Audio and Speech Processing
url	https://arxiv.org/abs/2605.02700
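
Note on the abstract: the title refers to attention-based multiple instance learning, in which a bag of instances (here, presumably segments of one recording day) is pooled into a single embedding using learned attention weights. The sketch below illustrates that general pooling step only. It is not the author's implementation: the function name `attention_mil_pool`, the parameters `V` and `w`, and the toy values are all hypothetical, and the CNN feature extractor and the gradient-boosted tree branch are omitted.

```python
import math


def attention_mil_pool(instances, V, w):
    """Pool a bag of instance embeddings into one bag embedding.

    instances: list of K embeddings, each a list of D floats
    V: projection matrix given as a list of L rows of D floats (hypothetical)
    w: scoring vector of L floats (hypothetical)
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        # Hidden representation tanh(V h), then a scalar score w . tanh(V h).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * ul for wl, ul in zip(w, hidden)))

    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The bag embedding is the attention-weighted sum of instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag of three 2-dimensional instance embeddings (made-up values).
toy_bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_V = [[1.0, 0.0], [0.0, 1.0]]
toy_w = [1.0, 1.0]
embedding, weights = attention_mil_pool(toy_bag, toy_V, toy_w)
```

The attention weights sum to one, so they can be read as the relative contribution of each instance to the bag-level representation, which is one way such a model can point to the parts of a day that drive a prediction.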
