Internal Format :: Library Catalog

Bibliographic Details
Main Authors:	Velayuthan, Menan; Gawesha, Asiri; Velayuthan, Purushoth; Kodagoda, Nuwan; Kasthurirathna, Dharshana; Samarasinghe, Pradeepa
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.15751

_version_	1866917994754473984
author	Velayuthan, Menan; Gawesha, Asiri; Velayuthan, Purushoth; Kodagoda, Nuwan; Kasthurirathna, Dharshana; Samarasinghe, Pradeepa
author_facet	Velayuthan, Menan; Gawesha, Asiri; Velayuthan, Purushoth; Kodagoda, Nuwan; Kasthurirathna, Dharshana; Samarasinghe, Pradeepa
contents	In human-computer interaction, head pose estimation strongly influences application functionality. Although facial landmarks are valuable for this purpose, existing landmark-based methods prioritize precision over simplicity and model size, which limits their deployment on edge devices and in compute-poor environments. To bridge this gap, we propose Grouped Attention Deep Sets (GADS), a novel architecture based on the Deep Set framework. By grouping landmarks into regions and employing small Deep Set layers, we reduce computational complexity. Our multi-head attention mechanism extracts and combines inter-group information, resulting in a model that is 7.5× smaller and runs 25× faster than the current lightest state-of-the-art model. Notably, GADS is 4321× smaller than the best-performing model. We introduce vanilla GADS and Hybrid-GADS (landmarks + RGB) and evaluate our models on three benchmark datasets: AFLW2000, BIWI, and 300W-LP. We envision our architecture as a robust baseline for resource-constrained head pose estimation methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_15751
institution	arXiv
publishDate	2025
record_format	arxiv
title	GADS: A Super Lightweight Model for Head Pose Estimation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.15751
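The abstract describes the architecture only at a high level: landmarks are grouped into regions, each region is encoded by a small Deep Set, and multi-head attention combines the group embeddings before the pose is regressed. The following Python sketch illustrates that idea under explicit assumptions and is not the authors' implementation. The five-region split of the common 68-point iBUG/300-W landmark layout, the layer sizes, the mean pooling before the output layer, and all names (`GROUPS`, `TinyGroupedDeepSets`) are hypothetical, and the weights are random rather than trained.

```python
import math
import random

# HYPOTHETICAL region split over the common 68-point iBUG/300-W layout.
# The grouping GADS actually uses may differ; this is for illustration only.
GROUPS = {
    "jaw": range(0, 17),
    "brows": range(17, 27),
    "nose": range(27, 36),
    "eyes": range(36, 48),
    "mouth": range(48, 68),
}


def init(rng, out_dim, in_dim):
    """Random (untrained) dense-layer weights and zero biases."""
    scale = 1.0 / math.sqrt(in_dim)
    w = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def linear(w, b, x):
    """Dense layer: returns w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


class TinyGroupedDeepSets:
    """Hypothetical sketch: one small Deep Set per landmark group,
    multi-head self-attention across the group embeddings,
    and a linear head regressing (yaw, pitch, roll)."""

    def __init__(self, d=8, heads=2, seed=0):
        assert d % heads == 0
        rng = random.Random(seed)
        self.d, self.heads = d, heads
        self.phi = {g: init(rng, d, 2) for g in GROUPS}  # per-point encoder
        self.rho = {g: init(rng, d, d) for g in GROUPS}  # post-pooling encoder
        self.wq, self.wk, self.wv = (init(rng, d, d) for _ in range(3))
        self.out = init(rng, 3, d)

    def encode_group(self, g, points):
        # Deep Set: rho(sum_i phi(x_i)). Sum pooling ignores point order.
        pooled = [0.0] * self.d
        for p in points:
            pooled = [s + h for s, h in zip(pooled, relu(linear(*self.phi[g], p)))]
        return relu(linear(*self.rho[g], pooled))

    def attend(self, tokens):
        # Scaled dot-product attention per head; head outputs are concatenated
        # (no output projection, for brevity).
        q = [linear(*self.wq, t) for t in tokens]
        k = [linear(*self.wk, t) for t in tokens]
        v = [linear(*self.wv, t) for t in tokens]
        hd = self.d // self.heads
        mixed = [[0.0] * self.d for _ in tokens]
        for h in range(self.heads):
            lo, hi = h * hd, (h + 1) * hd
            for i in range(len(tokens)):
                scores = [sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(hd)
                          for j in range(len(tokens))]
                top = max(scores)  # subtract max for a numerically stable softmax
                weights = [math.exp(s - top) for s in scores]
                total = sum(weights)
                for j, wgt in enumerate(weights):
                    for c in range(lo, hi):
                        mixed[i][c] += wgt / total * v[j][c]
        return mixed

    def forward(self, landmarks):
        """landmarks: 68 (x, y) pairs -> [yaw, pitch, roll] (untrained values)."""
        tokens = [self.encode_group(g, [landmarks[i] for i in idx])
                  for g, idx in GROUPS.items()]
        mixed = self.attend(tokens)
        mean = [sum(col) / len(mixed) for col in zip(*mixed)]
        return linear(*self.out, mean)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(68)]
    print(TinyGroupedDeepSets().forward(pts))
```

Reordering the points inside one region leaves the output unchanged up to floating-point rounding, which is the property the Deep Set sum pooling provides; the attention step is what lets information flow between regions.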

Similar Items