Bibliographic Details
Main Authors:	Ren, Wenze; Lu, Ke-Han; Chang, Kai-Wei; Feng, Tiantian; Fang, Ching; Liao, Zhi-Chi; Yen, Dao Thi Hai; Wang, Syu-Siang; Tsao, Yu; Wang, Chi-Te; Fang, Shih-Hau
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2606.01639

Table of Contents:

Deep learning has advanced pathological voice detection rapidly, yet rare laryngeal diseases remain underexplored due to data scarcity. Recurrent Respiratory Papillomatosis (RRP) exemplifies this gap: an HPV-induced disease of the larynx in which patients oscillate between recurrence and post-surgical remission over the years. RRP demands continuous voice monitoring that existing cross-sectional corpora cannot support. We introduce the first longitudinal voice dataset for RRP, comprising recordings from 26 patients with up to ten years of follow-up. Each session pairs sustained vowels with sentence-level utterances, which are annotated by otolaryngologists and confirmed synchronously with laryngoscopy. Building on this resource, we establish a systematic benchmark spanning handcrafted features, end-to-end deep networks, self-supervised pretrained models, and recent audio large language models, all evaluated under session-level cross-validation with patient-level audit. Per-subject longitudinal analyses further confirm that the cross-sectional discriminative signal reflects laryngoscopic disease state rather than stable speaker attributes. This work lays a foundation for rare longitudinal pathological voice tasks in low-resource clinical settings.
