:: Library Catalog

Bibliographic Details
Main Authors:	Sundaram, Shobhita, Fu, Stephanie, Muttenthaler, Lukas, Tamir, Netanel Y., Chai, Lucy, Kornblith, Simon, Darrell, Trevor, Isola, Phillip
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2410.10817

Similar Items

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data
by: Fu, Stephanie, et al.
Published: (2023)

Personalized Representation from Personalized Generation
by: Sundaram, Shobhita, et al.
Published: (2024)

What Makes for a Good Stereoscopic Image?
by: Tamir, Netanel Y., et al.
Published: (2024)

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
by: Gupta, Sharut, et al.
Published: (2025)

Human alignment of neural network representations
by: Muttenthaler, Lukas, et al.
Published: (2022)

Objective drives the consistency of representational similarity across datasets
by: Ciernik, Laure, et al.
Published: (2024)

When Does Pruning Benefit Vision Representations?
by: Cassano, Enrico, et al.
Published: (2025)

Aligning Machine and Human Visual Representations across Abstraction Levels
by: Muttenthaler, Lukas, et al.
Published: (2024)

LangNav: Language as a Perceptual Representation for Navigation
by: Pan, Bowen, et al.
Published: (2023)

Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)

When Do We Not Need Larger Vision Models?
by: Shi, Baifeng, et al.
Published: (2024)

Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
by: Bahng, Hyojin, et al.
Published: (2025)

Hidden in plain sight: VLMs overlook their visual representations
by: Fu, Stephanie, et al.
Published: (2025)

The Platonic Representation Hypothesis
by: Huh, Minyoung, et al.
Published: (2024)

Beginning with You: Perceptual-Initialization Improves Vision-Language Representation and Alignment
by: Hu, Yang, et al.
Published: (2025)

A Vision Check-up for Language Models
by: Sharma, Pratyusha, et al.
Published: (2024)

Learning Vision from Models Rivals Learning Vision from Data
by: Tian, Yonglong, et al.
Published: (2023)

Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint
by: Lee, Heekyung, et al.
Published: (2025)

Reconstruction Alignment Improves Unified Multimodal Models
by: Xie, Ji, et al.
Published: (2025)

Words That Make Language Models Perceive
by: Wang, Sophie L., et al.
Published: (2025)

Improved Representation of Asymmetrical Distances with Interval Quasimetric Embeddings
by: Wang, Tongzhou, et al.
Published: (2022)

Do Vision Transformers See Like Humans? Evaluating their Perceptual Alignment
by: Hernández-Cámara, Pablo, et al.
Published: (2025)

REOrdering Patches Improves Vision Models
by: Kutscher, Declan, et al.
Published: (2025)

Adaptive Length Image Tokenization via Recurrent Allocation
by: Duggal, Shivam, et al.
Published: (2024)

Activation Reward Models for Few-Shot Model Alignment
by: Chai, Tianning, et al.
Published: (2025)

MAPS: Masked Attribution-based Probing of Strategies- A computational framework to align human and model explanations
by: Muzellec, Sabine, et al.
Published: (2025)

Beyond the final layer: Attentive multilayer fusion for vision transformers
by: Ciernik, Laure, et al.
Published: (2026)

DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
by: Huang, Brandon, et al.
Published: (2025)

Dimensions underlying the representational alignment of deep neural networks with humans
by: Mahner, Florian P., et al.
Published: (2024)

Context Sensitivity Improves Human-Machine Visual Alignment
by: Born, Frieda, et al.
Published: (2026)

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
by: Qin, Yiming, et al.
Published: (2025)

Discovering Divergent Representations between Text-to-Image Models
by: Dunlap, Lisa, et al.
Published: (2025)

Separating Knowledge and Perception with Procedural Data
by: Rodríguez-Muñoz, Adrián, et al.
Published: (2025)

Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
by: Mitra, Chancharik, et al.
Published: (2024)

The Indra Representation Hypothesis for Multimodal Alignment
by: Lu, Jianglin, et al.
Published: (2026)

Segment Anything without Supervision
by: Wang, XuDong, et al.
Published: (2024)

Single-pass Adaptive Image Tokenization for Minimum Program Search
by: Duggal, Shivam, et al.
Published: (2025)

Training Neural Networks from Scratch with Parallel Low-Rank Adapters
by: Huh, Minyoung, et al.
Published: (2024)

Set Learning for Accurate and Calibrated Models
by: Muttenthaler, Lukas, et al.
Published: (2023)

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
by: Lian, Long, et al.
Published: (2023)