| Main Authors: | Sundaram, Shobhita, Fu, Stephanie, Muttenthaler, Lukas, Tamir, Netanel Y., Chai, Lucy, Kornblith, Simon, Darrell, Trevor, Isola, Phillip |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.10817 |
Similar Items
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data
by: Fu, Stephanie, et al.
Published: (2023)
Personalized Representation from Personalized Generation
by: Sundaram, Shobhita, et al.
Published: (2024)
What Makes for a Good Stereoscopic Image?
by: Tamir, Netanel Y., et al.
Published: (2024)
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
by: Gupta, Sharut, et al.
Published: (2025)
Human alignment of neural network representations
by: Muttenthaler, Lukas, et al.
Published: (2022)
Objective drives the consistency of representational similarity across datasets
by: Ciernik, Laure, et al.
Published: (2024)
When Does Pruning Benefit Vision Representations?
by: Cassano, Enrico, et al.
Published: (2025)
Aligning Machine and Human Visual Representations across Abstraction Levels
by: Muttenthaler, Lukas, et al.
Published: (2024)
LangNav: Language as a Perceptual Representation for Navigation
by: Pan, Bowen, et al.
Published: (2023)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
When Do We Not Need Larger Vision Models?
by: Shi, Baifeng, et al.
Published: (2024)
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
by: Bahng, Hyojin, et al.
Published: (2025)
Hidden in plain sight: VLMs overlook their visual representations
by: Fu, Stephanie, et al.
Published: (2025)
The Platonic Representation Hypothesis
by: Huh, Minyoung, et al.
Published: (2024)
Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment
by: Hu, Yang, et al.
Published: (2025)
A Vision Check-up for Language Models
by: Sharma, Pratyusha, et al.
Published: (2024)
Learning Vision from Models Rivals Learning Vision from Data
by: Tian, Yonglong, et al.
Published: (2023)
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint
by: Lee, Heekyung, et al.
Published: (2025)
Reconstruction Alignment Improves Unified Multimodal Models
by: Xie, Ji, et al.
Published: (2025)
Words That Make Language Models Perceive
by: Wang, Sophie L., et al.
Published: (2025)
Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings
by: Wang, Tongzhou, et al.
Published: (2022)
Do Vision Transformers See Like Humans? Evaluating their Perceptual Alignment
by: Hernández-Cámara, Pablo, et al.
Published: (2025)
REOrdering Patches Improves Vision Models
by: Kutscher, Declan, et al.
Published: (2025)
Adaptive Length Image Tokenization via Recurrent Allocation
by: Duggal, Shivam, et al.
Published: (2024)
Activation Reward Models for Few-Shot Model Alignment
by: Chai, Tianning, et al.
Published: (2025)
MAPS: Masked Attribution-based Probing of Strategies- A computational framework to align human and model explanations
by: Muzellec, Sabine, et al.
Published: (2025)
Beyond the final layer: Attentive multilayer fusion for vision transformers
by: Ciernik, Laure, et al.
Published: (2026)
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
by: Huang, Brandon, et al.
Published: (2025)
Dimensions underlying the representational alignment of deep neural networks with humans
by: Mahner, Florian P., et al.
Published: (2024)
Context Sensitivity Improves Human-Machine Visual Alignment
by: Born, Frieda, et al.
Published: (2026)
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)
Discovering Divergent Representations between Text-to-Image Models
by: Dunlap, Lisa, et al.
Published: (2025)
Separating Knowledge and Perception with Procedural Data
by: Rodríguez-Muñoz, Adrián, et al.
Published: (2025)
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
by: Mitra, Chancharik, et al.
Published: (2024)
The Indra Representation Hypothesis for Multimodal Alignment
by: Lu, Jianglin, et al.
Published: (2026)
Segment Anything without Supervision
by: Wang, XuDong, et al.
Published: (2024)
Single-pass Adaptive Image Tokenization for Minimum Program Search
by: Duggal, Shivam, et al.
Published: (2025)
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
by: Huh, Minyoung, et al.
Published: (2024)
Set Learning for Accurate and Calibrated Models
by: Muttenthaler, Lukas, et al.
Published: (2023)
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
by: Lian, Long, et al.
Published: (2023)