| Main Authors: | Tian, Yonglong; Fan, Lijie; Chen, Kaifeng; Katabi, Dina; Krishnan, Dilip; Isola, Phillip |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.17742 |
Similar Items
Personalized Representation from Personalized Generation
by: Sundaram, Shobhita, et al.
Published: (2024)
Denoising Vision Transformers
by: Yang, Jiawei, et al.
Published: (2024)
Reparo: Loss-Resilient Generative Codec for Video Conferencing
by: Li, Tianhong, et al.
Published: (2023)
Return of Unconditional Generation: A Self-supervised Representation Generation Method
by: Li, Tianhong, et al.
Published: (2023)
Latent Denoising Makes Good Tokenizers
by: Yang, Jiawei, et al.
Published: (2025)
Vision-Language Models Do Not Understand Negation
by: Alhamoud, Kumail, et al.
Published: (2025)
STRMs: Spatial Temporal Reasoning Models for Vision-Based Localization Rivaling GPS Precision
by: Lui, Hin Wai, et al.
Published: (2025)
A Vision Check-up for Language Models
by: Sharma, Pratyusha, et al.
Published: (2024)
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
by: Yu, Seonghoon, et al.
Published: (2025)
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
by: Bahng, Hyojin, et al.
Published: (2025)
Infusing fine-grained visual knowledge to Vision-Language Models
by: Ypsilantis, Nikolaos-Antonios, et al.
Published: (2025)
Backdooring Vision-Language Models with Out-Of-Distribution Data
by: Lyu, Weimin, et al.
Published: (2024)
Optimizing Active Learning in Vision-Language Models via Parameter-Efficient Uncertainty Calibration
by: Narayanan, Athmanarayanan Lakshmi, et al.
Published: (2025)
When Does Perceptual Alignment Benefit Vision Representations?
by: Sundaram, Shobhita, et al.
Published: (2024)
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
by: Gupta, Sharut, et al.
Published: (2025)
Words That Make Language Models Perceive
by: Wang, Sophie L., et al.
Published: (2025)
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
by: Fan, Lijie, et al.
Published: (2024)
Cross-Modal Attention Analysis and Optimization in Vision-Language Models: A Study on Visual Reliability
by: Zhou, Lijie
Published: (2026)
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data
by: Fu, Stephanie, et al.
Published: (2023)
DiveUp: Learning Feature Upsampling from Diverse Vision Foundation Models
by: Liu, Xiaoqiong, et al.
Published: (2026)
PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks
by: Cui, Cheng, et al.
Published: (2026)
Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
by: Koo, Juil, et al.
Published: (2025)
Learning Visual Grounding from Generative Vision and Language Model
by: Wang, Shijie, et al.
Published: (2024)
Separating Knowledge and Perception with Procedural Data
by: Rodríguez-Muñoz, Adrián, et al.
Published: (2025)
Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease
by: Megahed, Youssef, et al.
Published: (2025)
Learning Spatial Decay for Vision Transformers
by: Mao, Yuxin, et al.
Published: (2025)
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
by: Howard, Phillip, et al.
Published: (2024)
Active Learning via Vision-Language Model Adaptation with Open Data
by: Wang, Tong, et al.
Published: (2025)
Active Learning for Vision-Language Models
by: Safaei, Bardia, et al.
Published: (2024)
Medical Image Registration Meets Vision Foundation Model: Prototype Learning and Contour Awareness
by: Xu, Hao, et al.
Published: (2025)
Modular Prompt Learning Improves Vision-Language Models
by: Huang, Zhenhan, et al.
Published: (2025)
Can Generalist Vision Language Models (VLMs) Rival Specialist Medical VLMs? Benchmarking and Strategic Insights
by: Zhong, Yuan, et al.
Published: (2025)
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
by: Wen, Xin, et al.
Published: (2025)
Cultural Counterfactuals: Evaluating Cultural Biases in Large Vision-Language Models with Counterfactual Examples
by: Howard, Phillip, et al.
Published: (2026)
Cascade Prompt Learning for Vision-Language Model Adaptation
by: Wu, Ge, et al.
Published: (2024)
BigGait: Learning Gait Representation You Want by Large Vision Models
by: Ye, Dingqiang, et al.
Published: (2024)
Learning Invariant Causal Mechanism from Vision-Language Models
by: Song, Zeen, et al.
Published: (2024)
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning
by: Zhan, Yufei, et al.
Published: (2025)
Learning Where to Edit Vision Transformers
by: Yang, Yunqiao, et al.
Published: (2024)
Active Prompt Learning in Vision Language Models
by: Bang, Jihwan, et al.
Published: (2023)