Internformat: :: Library Catalog

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Wang, Shuguang, Wang, Yuanjing
Format:	Preprint
Veröffentlicht:	2025
Schlagworte:	Computer Vision and Pattern Recognition
Online-Zugang:	https://arxiv.org/abs/2501.01548
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

_version_	1866917883194376192
author	Wang, Shuguang Wang, Yuanjing
author_facet	Wang, Shuguang Wang, Yuanjing
contents	This paper presents a novel neural network architecture featuring automatic fixation point selection, designed to efficiently address complex tasks with reduced network size and computational overhead. The proposed model consists of: a low-resolution channel that captures low-resolution global features from input images; a high-resolution channel that sequentially extracts localized high-resolution features; and a hybrid encoding module that integrates the features from both channels. A defining characteristic of the hybrid encoding module is the inclusion of a fixation point generator, which dynamically produces fixation points, enabling the high-resolution channel to focus on regions of interest. The fixation points are generated in a task-driven manner, enabling the automatic selection of regions of interest. This approach avoids exhaustive high-resolution analysis of the entire image, maintaining task performance and computational efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_01548
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Task-Driven Fixation Network: An Efficient Architecture with Fixation Selection Wang, Shuguang Wang, Yuanjing Computer Vision and Pattern Recognition This paper presents a novel neural network architecture featuring automatic fixation point selection, designed to efficiently address complex tasks with reduced network size and computational overhead. The proposed model consists of: a low-resolution channel that captures low-resolution global features from input images; a high-resolution channel that sequentially extracts localized high-resolution features; and a hybrid encoding module that integrates the features from both channels. A defining characteristic of the hybrid encoding module is the inclusion of a fixation point generator, which dynamically produces fixation points, enabling the high-resolution channel to focus on regions of interest. The fixation points are generated in a task-driven manner, enabling the automatic selection of regions of interest. This approach avoids exhaustive high-resolution analysis of the entire image, maintaining task performance and computational efficiency.
title	Task-Driven Fixation Network: An Efficient Architecture with Fixation Selection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2501.01548

Ähnliche Einträge