Bibliographic Details
Main Authors: Velayuthan, Menan, Gawesha, Asiri, Velayuthan, Purushoth, Kodagoda, Nuwan, Kasthurirathna, Dharshana, Samarasinghe, Pradeepa
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2504.15751
_version_ 1866917994754473984
author Velayuthan, Menan
Gawesha, Asiri
Velayuthan, Purushoth
Kodagoda, Nuwan
Kasthurirathna, Dharshana
Samarasinghe, Pradeepa
contents In human-computer interaction, head pose estimation profoundly influences application functionality. Although utilizing facial landmarks is valuable for this purpose, existing landmark-based methods prioritize precision over simplicity and model size, limiting their deployment on edge devices and in compute-poor environments. To bridge this gap, we propose Grouped Attention Deep Sets (GADS), a novel architecture based on the Deep Set framework. By grouping landmarks into regions and employing small Deep Set layers, we reduce computational complexity. Our multihead attention mechanism extracts and combines inter-group information, resulting in a model that is 7.5× smaller and executes 25× faster than the current lightest state-of-the-art model. Notably, our method achieves an impressive reduction, being 4321× smaller than the best-performing model. We introduce vanilla GADS and Hybrid-GADS (landmarks + RGB) and evaluate our models on three benchmark datasets: AFLW2000, BIWI, and 300W-LP. We envision our architecture as a robust baseline for resource-constrained head pose estimation methods.
format Preprint
id arxiv_https___arxiv_org_abs_2504_15751
institution arXiv
publishDate 2025
record_format arxiv
title GADS: A Super Lightweight Model for Head Pose Estimation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2504.15751
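
Note on the abstract: the pipeline it describes (group landmarks into regions, encode each region with a small Deep Set, mix the region embeddings with attention, regress the pose) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation. The group sizes, layer widths, random weights, and all function names are assumptions made for the example, and a single attention head with identity projections stands in for the multihead attention named in the abstract.

```python
import math
import random


def init_linear(rng, n_in, n_out):
    """Random weights and zero bias for a layer y = W x + b (hypothetical sizes)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(x, w, b):
    """Apply y = W x + b to one vector stored as a plain list."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def deep_set(points, phi, rho):
    """Deep Set encoder: rho(sum_i phi(x_i)). Sum pooling ignores point order."""
    encoded = [relu(linear(p, *phi)) for p in points]
    pooled = [sum(col) for col in zip(*encoded)]
    return relu(linear(pooled, *rho))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention over the group embeddings.

    Assumption: identity query/key/value projections, to keep the sketch short.
    """
    d = len(tokens[0])
    mixed = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return mixed


def build_params(rng, n_groups, d_in=2, d_hidden=8, d_out=4):
    """One small (phi, rho) pair per landmark group plus a 3-output pose head."""
    groups = [
        (init_linear(rng, d_in, d_hidden), init_linear(rng, d_hidden, d_out))
        for _ in range(n_groups)
    ]
    return {"groups": groups, "head": init_linear(rng, n_groups * d_out, 3)}


def estimate_pose(groups, params):
    """Map grouped 2D landmarks to three numbers standing in for yaw, pitch, roll."""
    group_vecs = [deep_set(g, phi, rho) for g, (phi, rho) in zip(groups, params["groups"])]
    mixed = self_attention(group_vecs)
    flat = [a for vec in mixed for a in vec]
    return linear(flat, *params["head"])
```

With untrained random weights the outputs carry no meaning; the sketch only shows the data flow and the property that reordering landmarks within a group leaves the result unchanged (up to floating-point rounding), since each group is pooled by a sum.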