Bibliographic Details
Main Authors: Ren, Wenze; Lu, Ke-Han; Chang, Kai-Wei; Feng, Tiantian; Fang, Ching; Liao, Zhi-Chi; Yen, Dao Thi Hai; Wang, Syu-Siang; Tsao, Yu; Wang, Chi-Te; Fang, Shih-Hau
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2606.01639
Abstract:
  • Deep learning has advanced pathological voice detection rapidly, yet rare laryngeal diseases remain underexplored due to data scarcity. Recurrent Respiratory Papillomatosis (RRP) exemplifies this gap: it is an HPV-induced disease of the larynx in which patients oscillate between recurrence and post-surgical remission over many years. RRP therefore demands continuous voice monitoring that existing cross-sectional corpora cannot support. We introduce the first longitudinal voice dataset for RRP, comprising recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances; recordings are annotated by otolaryngologists and confirmed against concurrent laryngoscopy. Building on this resource, we establish a systematic benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and recent audio large language models, all evaluated under session-level cross-validation with a patient-level audit. Per-subject longitudinal analyses further confirm that the cross-sectional discriminative signal reflects laryngoscopic disease state rather than stable speaker attributes. This work lays a foundation for rare longitudinal pathological voice tasks in low-resource clinical settings.