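Staff View: A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment :: Library Catalog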

Bibliographic Details
Main Authors:	Unnikrishnan, Harikrishnan; Patel, Rita
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2603.02087

_version_	1866910195033047040
author	Unnikrishnan, Harikrishnan; Patel, Rita
author_facet	Unnikrishnan, Harikrishnan; Patel, Rita
contents	We present a fully automated, two-stage modular glottal area segmentation framework for high-speed videoendoscopy (HSV) designed for accuracy, generalizability, and real-time playback. Our detection-gated pipeline combines a YOLOv8n glottis localizer with a U-Net segmenter; the localizer defines a tight crop to ensure a consistent field of view and gates the output to reduce spurious segmentations during glottal closure. The models were trained on the GIRAFE (N=600) and BAGLS (N=55,750) datasets. Cross-dataset portability was evaluated by benchmarking GIRAFE-trained models on the BAGLS test set without fine-tuning. In these evaluations, the pipeline achieved a Dice Similarity Coefficient (DSC) of 0.745 (87% of the in-domain ceiling). On in-distribution test sets, the system achieved DSCs of 0.81 (GIRAFE) and 0.856 (BAGLS), outperforming or competing with state-of-the-art methods. An exploratory clinical study of 40 subjects demonstrated that the glottal area Coefficient of Variation (CV) distinguished healthy from pathological function (p=0.006). The system processes ~35 frames per second on commodity hardware, enabling interactive clinical review. This design supports uniform extraction of laryngeal kinematic measures across varying acquisition settings. Code, weights, and software are available at https://github.com/hari-krishnan/openglottal.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_02087
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment Unnikrishnan, Harikrishnan Patel, Rita Computer Vision and Pattern Recognition Artificial Intelligence Machine Learning We present a fully automated, two-stage modular glottal area segmentation framework for high-speed videoendoscopy (HSV) designed for accuracy, generalizability, and real-time playback. Our detection-gated pipeline combines a YOLOv8n glottis localizer with a U-Net segmenter; the localizer defines a tight crop to ensure a consistent field of view and gates the output to reduce spurious segmentations during glottal closure. The models were trained on the GIRAFE (N=600) and BAGLS (N=55,750) datasets. Cross-dataset portability was evaluated by benchmarking GIRAFE-trained models on the BAGLS test set without fine-tuning. In these evaluations, the pipeline achieved a Dice Similarity Coefficient (DSC) of 0.745 (87% of the in-domain ceiling). On in-distribution test sets, the system achieved DSCs of 0.81 (GIRAFE) and 0.856 (BAGLS), outperforming or competing with state-of-the-art methods. An exploratory clinical study of 40 subjects demonstrated that the glottal area Coefficient of Variation (CV) distinguished healthy from pathological function (p=0.006). The system processes ~35 frames per second on commodity hardware, enabling interactive clinical review. This design supports uniform extraction of laryngeal kinematic measures across varying acquisition settings. Code, weights, and software are available at https://github.com/hari-krishnan/openglottal.
title	A Detection-Gated Pipeline for Robust Glottal Area Waveform Extraction and Clinical Pathology Assessment
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2603.02087

Similar Items