Bibliographic Details
Main Authors: Müller, Romy; Dürschmidt, Marcel; Ullrich, Julian; Knoll, Carsten; Weber, Sascha; Seitz, Steffen
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2307.13345
author Müller, Romy
Dürschmidt, Marcel
Ullrich, Julian
Knoll, Carsten
Weber, Sascha
Seitz, Steffen
contents Deep Learning models like Convolutional Neural Networks (CNNs) are powerful image classifiers, but what factors determine whether they attend to the same image areas as humans do? While previous studies have focused on technological factors, little is known about the role of factors that affect human attention. In the present study, we investigated how the tasks used to elicit human attention maps interact with image characteristics in modulating the similarity between humans and CNNs. We varied the intentionality of human tasks, ranging from spontaneous gaze during categorization, through intentional gaze-pointing, to manual area selection. Moreover, we varied the type of image to be categorized, using either singular, salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category. The human attention maps generated in this way were compared to the CNN attention maps revealed by explainable artificial intelligence (Grad-CAM). The influence of human tasks strongly depended on image type: for objects, human manual selection produced the maps most similar to those of the CNN, while the specific eye movement task had little impact. For indoor scenes, spontaneous gaze produced the least similarity, while for landscapes, similarity was equally low across all human tasks. To better understand these results, we also compared the different human attention maps to each other. Our results highlight the importance of taking human factors into account when comparing the attention of humans and CNNs.
format Preprint
id arxiv_https___arxiv_org_abs_2307_13345
institution arXiv
publishDate 2023
record_format arxiv
title Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Human-Computer Interaction
url https://arxiv.org/abs/2307.13345