| Main Authors: | You, Weiqiu, Goldberg, Cassandra, Madani, Amin, Hashimoto, Daniel A., Wong, Eric |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.22156 |
Similar Items
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
by: Chen, Zining, et al.
Published: (2024)
A Vision Check-up for Language Models
by: Sharma, Pratyusha, et al.
Published: (2024)
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
by: Fares, Samar, et al.
Published: (2024)
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
by: Li, Yue, et al.
Published: (2025)
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)
DenseTRF: Texture-Aware Unsupervised Representation Adaptation for Surgical Scene Dense Prediction
by: Liao, Guiqiu, et al.
Published: (2026)
Toward Autonomous Laboratory Safety Monitoring with Vision Language Models: Learning to See Hazards Through Scene Structure
by: Chakraborty, Trishna, et al.
Published: (2026)
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
by: Guo, Yichen, et al.
Published: (2025)
The Dual Mechanisms of Spatial Reasoning in Vision-Language Models
by: Cui, Kelly, et al.
Published: (2026)
FORLA: Federated Object-centric Representation Learning with Slot Attention
by: Liao, Guiqiu, et al.
Published: (2025)
Slot-BERT: Self-supervised Object Discovery in Surgical Video
by: Liao, Guiqiu, et al.
Published: (2025)
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)
EasyARC: Evaluating Vision Language Models on True Visual Reasoning
by: Unsal, Mert, et al.
Published: (2025)
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
Improved Alignment of Modalities in Large Vision Language Models
by: Jangra, Kartik, et al.
Published: (2025)
Detecting and Preventing Hallucinations in Large Vision Language Models
by: Gunjal, Anisha, et al.
Published: (2023)
Surgical Vision World Model
by: Koju, Saurabh, et al.
Published: (2025)
Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models
by: Tang, Kai, et al.
Published: (2025)
CFM: Language-aligned Concept Foundation Model for Vision
by: Wittenmayer, Kai, et al.
Published: (2026)
Vision-Language Models Encode Clinical Guidelines for Concept-Based Medical Reasoning
by: Harmanani, Mohamed, et al.
Published: (2026)
Tactile Modality Fusion for Vision-Language-Action Models
by: Morissette, Charlotte, et al.
Published: (2026)
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)
Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis
by: Trinci, Tomaso, et al.
Published: (2026)
HoneyBee: Data Recipes for Vision-Language Reasoners
by: Bansal, Hritik, et al.
Published: (2025)
Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger
by: Yang, Qi, et al.
Published: (2025)
SpatiaLQA: A Benchmark for Evaluating Spatial Logical Reasoning in Vision-Language Models
by: Xie, Yuechen, et al.
Published: (2026)
Sherlock: Self-Correcting Reasoning in Vision-Language Models
by: Ding, Yi, et al.
Published: (2025)
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
by: Wang, Sudong, et al.
Published: (2025)
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
by: Hu, Rui, et al.
Published: (2025)
Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models
by: Alam, Md Tanvirul
Published: (2026)
FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models
by: Huang, Chenyu, et al.
Published: (2026)
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
by: Ding, Yi, et al.
Published: (2024)
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
by: Wang, Junchi, et al.
Published: (2024)
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
by: Tao, Keda, et al.
Published: (2024)
Simple Vision-Language Math Reasoning via Rendered Text
by: Skripkin, Matvey, et al.
Published: (2025)
HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models
by: Yu, Le, et al.
Published: (2025)
NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders
by: Dineva, Katarina Trojachanec, et al.
Published: (2026)
OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
by: Meng, Xiang, et al.
Published: (2024)
Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models
by: Huang, Xin, et al.
Published: (2025)
Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information
by: Chu, Xu, et al.
Published: (2025)