| Main Authors: | Mehta, Yash; Bonner, Michael F. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.05556 |
Similar Items
Universal dimensions of visual representation
by: Chen, Zirui, et al.
Published: (2024)
Efficient coding along the visual hierarchy
by: Passi, Ananya, et al.
Published: (2026)
Predicting upcoming visual features during eye movements yields scene representations aligned with human visual cortex
by: Thorat, Sushrut, et al.
Published: (2025)
Self-supervised video pretraining yields robust and more human-aligned visual representations
by: Parthasarathy, Nikhil, et al.
Published: (2022)
Evaluating alignment between humans and neural network representations in image-based learning tasks
by: Demircan, Can, et al.
Published: (2023)
Visual representations in the human brain are aligned with large language models
by: Doerig, Adrien, et al.
Published: (2022)
Do text-free diffusion models learn discriminative visual representations?
by: Mukhopadhyay, Soumik, et al.
Published: (2023)
Can multimodal representation learning by alignment preserve modality-specific information?
by: Thoreau, Romain, et al.
Published: (2025)
Vision Transformer attention alignment with human visual perception in aesthetic object evaluation
by: Carrasco, Miguel, et al.
Published: (2025)
Dimensions underlying the representational alignment of deep neural networks with humans
by: Mahner, Florian P., et al.
Published: (2024)
Characterizing the visual representation of objects from the child's view
by: Yang, Jane, et al.
Published: (2026)
Dilated Convolution with Learnable Spacings makes visual models more aligned with humans: a Grad-CAM study
by: Chamas, Rabih, et al.
Published: (2024)
Do computer vision foundation models learn the low-level characteristics of the human visual system?
by: Cai, Yancheng, et al.
Published: (2025)
Closing the gap in multimodal medical representation alignment
by: Grassucci, Eleonora, et al.
Published: (2026)
Towards aligned body representations in vision models
by: Gizdov, Andrey, et al.
Published: (2025)
Self-supervised vision-langage alignment of deep learning representations for bone X-rays analysis
by: Englebert, Alexandre, et al.
Published: (2024)
Assessing the alignment between infants' visual and linguistic experience using multimodal language models
by: Tan, Alvin Wei Ming, et al.
Published: (2025)
A transition towards virtual representations of visual scenes
by: Pereira, Américo, et al.
Published: (2024)
MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization
by: Pappa, Massimiliano, et al.
Published: (2024)
AGA: An adaptive group alignment framework for structured medical cross-modal representation learning
by: Li, Wei, et al.
Published: (2025)
Sparse components distinguish visual pathways & their alignment to neural networks
by: Marvi, Ammar I, et al.
Published: (2025)
Extending global-local view alignment for self-supervised learning with remote sensing imagery
by: Wanyan, Xinye, et al.
Published: (2023)
CoCoG-2: Controllable generation of visual stimuli for understanding human concept representation
by: Wei, Chen, et al.
Published: (2024)
Learning complete and explainable visual representations from itemized text supervision
by: Lyu, Yiwei, et al.
Published: (2025)
A Self supervised learning framework for imbalanced medical imaging datasets
by: Sharma, Yash Kumar, et al.
Published: (2026)
Generating visual explanations from deep networks using implicit neural representations
by: Byra, Michal, et al.
Published: (2025)
On the dynamic evolution of CLIP texture-shape bias and its relationship to human alignment and model robustness
by: Hernández-Cámara, Pablo, et al.
Published: (2025)
Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations
by: Xie, Yudi, et al.
Published: (2024)
Human alignment of neural network representations
by: Muttenthaler, Lukas, et al.
Published: (2022)
Self-supervised structured object representation learning
by: Hadjerci, Oussama, et al.
Published: (2025)
Deep video representation learning: a survey
by: Ravanbakhsh, Elham, et al.
Published: (2024)
Pathological Truth Bias in Vision-Language Models
by: Thube, Yash
Published: (2025)
A deep multiple instance learning approach based on coarse labels for high-resolution land-cover mapping
by: Perantoni, Gianmarco, et al.
Published: (2025)
Quantifying the human visual exposome with vision language models
by: Rominger, Christian, et al.
Published: (2026)
UniAR: A Unified model for predicting human Attention and Responses on visual content
by: Li, Peizhao, et al.
Published: (2023)
Affine transformation estimation improves visual self-supervised learning
by: Torpey, David, et al.
Published: (2024)
Improving generalization by mimicking the human visual diet
by: Madan, Spandan, et al.
Published: (2022)
CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
by: Hein, Dennis, et al.
Published: (2024)
Generalizability analysis of deep learning predictions of human brain responses to augmented and semantically novel visual stimuli
by: Piskovskyi, Valentyn, et al.
Published: (2024)
Noise-aware few-shot learning through bi-directional multi-view prompt alignment
by: Niu, Lu, et al.
Published: (2026)