Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Müller, Romy; Dürschmidt, Marcel; Ullrich, Julian; Knoll, Carsten; Weber, Sascha; Seitz, Steffen
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2307.13345

_version_	1866929462181888000
author	Müller, Romy; Dürschmidt, Marcel; Ullrich, Julian; Knoll, Carsten; Weber, Sascha; Seitz, Steffen
author_facet	Müller, Romy; Dürschmidt, Marcel; Ullrich, Julian; Knoll, Carsten; Weber, Sascha; Seitz, Steffen
contents	Deep Learning models like Convolutional Neural Networks (CNN) are powerful image classifiers, but what factors determine whether they attend to similar image areas as humans do? While previous studies have focused on technological factors, little is known about the role of factors that affect human attention. In the present study, we investigated how the tasks used to elicit human attention maps interact with image characteristics in modulating the similarity between humans and CNN. We varied the intentionality of human tasks, ranging from spontaneous gaze during categorization, through intentional gaze-pointing, to manual area selection. Moreover, we varied the type of image to be categorized, using either singular, salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category. The human attention maps generated in this way were compared to the CNN attention maps revealed by explainable artificial intelligence (Grad-CAM). The influence of human tasks strongly depended on image type: For objects, human manual selection produced maps that were most similar to CNN, while the specific eye movement task had little impact. For indoor scenes, spontaneous gaze produced the least similarity, while for landscapes, similarity was equally low across all human tasks. To better understand these results, we also compared the different human attention maps to each other. Our results highlight the importance of taking human factors into account when comparing the attention of humans and CNN.
format	Preprint
id	arxiv_https___arxiv_org_abs_2307_13345
institution	arXiv
publishDate	2023
record_format	arxiv
title	Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type
topic	Computer Vision and Pattern Recognition; Artificial Intelligence; Human-Computer Interaction
url	https://arxiv.org/abs/2307.13345
