| Main Authors: | Kartik, Manglam, Shah, Neel Tushar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.24753 |
Similar Items
Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
by: Golovanevsky, Michal, et al.
Published: (2025)
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
by: Liu, Che, et al.
Published: (2023)
From Logits to Hierarchies: Hierarchical Clustering made Simple
by: Palumbo, Emanuele, et al.
Published: (2024)
From Edges to Depth: Probing the Spatial Hierarchy in Vision Transformers
by: Sanghavi, Jainum
Published: (2026)
Improved Alignment of Modalities in Large Vision Language Models
by: Jangra, Kartik, et al.
Published: (2025)
MrSARP: A Hierarchical Deep Generative Prior for SAR Image Super-resolution
by: Agarwal, Tushar, et al.
Published: (2022)
How Visual Representations Map to Language Feature Space in Multimodal LLMs
by: Venhoff, Constantin, et al.
Published: (2025)
Probing Visual Language Priors in VLMs
by: Luo, Tiange, et al.
Published: (2024)
Simple Vision-Language Math Reasoning via Rendered Text
by: Skripkin, Matvey, et al.
Published: (2025)
SimpleOCR: Rendering Visualized Questions to Teach MLLMs to Read
by: Peng, Yibo, et al.
Published: (2026)
PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
by: Yoshikawa, Daiki, et al.
Published: (2025)
Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos
by: Biswas, Dipayan, et al.
Published: (2025)
Morphology-Aware KOA Classification: Integrating Graph Priors with Vision Models
by: Tliba, Marouane, et al.
Published: (2025)
C$^{2}$INet: Realizing Incremental Trajectory Prediction with Prior-Aware Continual Causal Intervention
by: Li, Xiaohe, et al.
Published: (2024)
Evaluating Precise Geolocation Inference Capabilities of Vision Language Models
by: Jay, Neel, et al.
Published: (2025)
Transfer Learning with Point Transformers
by: Gupta, Kartik, et al.
Published: (2024)
DRoP: Distributionally Robust Data Pruning
by: Vysogorets, Artem, et al.
Published: (2024)
On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks
by: Neuhaus, Yannic, et al.
Published: (2026)
ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification
by: Sajedi, Ahmad, et al.
Published: (2024)
Hierarchy-of-Visual-Words: a Learning-based Approach for Trademark Image Retrieval
by: Lourenço, Vítor N., et al.
Published: (2019)
Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions
by: Bao, Wenxuan, et al.
Published: (2025)
Hierarchy-Guided Multimodal Representation Learning for Taxonomic Inference
by: Ahmed, Sk Miraj, et al.
Published: (2026)
Personalized Vision via Visual In-Context Learning
by: Jiang, Yuxin, et al.
Published: (2025)
ReGuidance: A Simple Diffusion Wrapper for Boosting Sample Quality on Hard Inverse Problems
by: Karan, Aayush, et al.
Published: (2025)
Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images
by: Liu, Che, et al.
Published: (2023)
Visualizing the loss landscape of Self-supervised Vision Transformer
by: Lee, Youngwan, et al.
Published: (2024)
Attribute-based Visual Reprogramming for Vision-Language Models
by: Cai, Chengyi, et al.
Published: (2025)
Robust Visual Representation Learning with Multi-modal Prior Knowledge for Image Classification Under Distribution Shift
by: Zhou, Hongkuan, et al.
Published: (2024)
CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data
by: Liu, Disheng, et al.
Published: (2025)
Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing
by: Shu, Yan, et al.
Published: (2024)
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
by: Li, Yanghao, et al.
Published: (2025)
VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs
by: Zheng, Naishan, et al.
Published: (2025)
ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
by: Hsu, Chih-Chung, et al.
Published: (2026)
Vision Verification Enhanced Fusion of VLMs for Efficient Visual Reasoning
by: Tekin, Selim Furkan, et al.
Published: (2026)
Towards Interpreting Visual Information Processing in Vision-Language Models
by: Neo, Clement, et al.
Published: (2024)
HEIR: Learning Graph-Based Motion Hierarchies
by: Zheng, Cheng, et al.
Published: (2025)
Neural Prior Estimation: Learning Class Priors from Latent Representations
by: Yavari, Masoud, et al.
Published: (2026)
Recursive Neural Programs: Variational Learning of Image Grammars and Part-Whole Hierarchies
by: Fisher, Ares, et al.
Published: (2022)
How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
by: Cagnetta, Francesco, et al.
Published: (2023)
Hierarchy-Consistent Learning and Adaptive Loss Balancing for Hierarchical Multi-Label Classification
by: Jiang, Ruobing, et al.
Published: (2025)