Bibliographic Details
Main Authors: Muttenthaler, Lukas, Greff, Klaus, Born, Frieda, Spitzer, Bernhard, Kornblith, Simon, Mozer, Michael C., Müller, Klaus-Robert, Unterthiner, Thomas, Lampinen, Andrew K.
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2409.06509
_version_ 1866911135190482944
author Muttenthaler, Lukas
Greff, Klaus
Born, Frieda
Spitzer, Bernhard
Kornblith, Simon
Mozer, Michael C.
Müller, Klaus-Robert
Unterthiner, Thomas
Lampinen, Andrew K.
contents Deep neural networks have achieved success across a wide range of applications, including as models of human behavior and neural representations in vision tasks. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do, raising questions regarding the similarity of their underlying representations. What is missing for modern learning systems to exhibit more human-aligned behavior? We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-aligned structure from its representations to refine the representations of pretrained state-of-the-art vision foundation models via finetuning. These human-aligned models more accurately approximate human behavior and uncertainty across a wide range of similarity tasks, including a new dataset of human judgments spanning multiple levels of semantic abstraction. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognitive judgments and more practically useful, paving the way toward more robust, interpretable, and human-aligned artificial intelligence systems.
format Preprint
id arxiv_https___arxiv_org_abs_2409_06509
institution arXiv
publishDate 2024
record_format arxiv
title Aligning Machine and Human Visual Representations across Abstraction Levels
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2409.06509