Staff View: What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation :: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Yihua; Zhu, Yaning; Wang, Zongji; Hao, Hongquan; Liu, Yongwei; Cheng, Shiqing; Wang, Xi; Chang, Hyung Jin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.15664

_version_	1866917620992704512
author	Cheng, Yihua; Zhu, Yaning; Wang, Zongji; Hao, Hongquan; Liu, Yongwei; Cheng, Shiqing; Wang, Xi; Chang, Hyung Jin
author_facet	Cheng, Yihua; Zhu, Yaning; Wang, Zongji; Hao, Hongquan; Liu, Yongwei; Cheng, Shiqing; Wang, Xi; Chang, Hyung Jin
contents	A driver's eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. First, we introduce IVGaze, a pioneering dataset capturing in-vehicle gaze, collected from 125 subjects and covering a large range of gaze and head poses within vehicles. Conventional gaze collection systems are inadequate for in-vehicle use. For this dataset, we propose a new vision-based solution for in-vehicle gaze collection, introducing a refined gaze target calibration method to tackle annotation challenges. Second, our research focuses on in-vehicle gaze estimation leveraging IVGaze. In-vehicle face images often suffer from low resolution, prompting our introduction of a gaze pyramid transformer that leverages transformer-based multilevel feature integration. Expanding upon this, we introduce the dual-stream gaze pyramid transformer (GazeDPTR). Employing perspective transformation, we rotate virtual cameras to normalize images, utilizing camera pose to merge normalized and original images for accurate gaze estimation. GazeDPTR shows state-of-the-art performance on the IVGaze dataset. Third, we explore a novel strategy for gaze zone classification by extending GazeDPTR. We define a foundational tri-plane and project gaze onto its planes. Leveraging both positional features from the projection points and visual attributes from images, we achieve superior performance compared to relying solely on visual features, substantiating the advantage of gaze estimation. Our project is available at https://yihua.zone/work/ivgaze.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_15664
institution	arXiv
publishDate	2024
record_format	arxiv
title	What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.15664
