Library Catalog

Bibliographic Details
Main Authors:	Liang, Sichu; Zhu, Hongyu; Wang, Wenwen; Zhou, Deyu
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.04355

Similar Items

RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering
by: Liang, Sichu, et al.
Published: (2025)

Revisiting Data Auditing in Large Vision-Language Models
by: Zhu, Hongyu, et al.
Published: (2025)

When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges
by: Liang, Sichu, et al.
Published: (2026)

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)

CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems
by: Ma, Ziyang, et al.
Published: (2026)

Text Prompt Injection of Vision Language Models
by: Zhu, Ruizhe
Published: (2025)

Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?
by: Zhao, Bowen, et al.
Published: (2024)

Vision Language Models Cannot Plan, but Can They Formalize?
by: He, Muyu, et al.
Published: (2025)

Dynamic Token Reweighting for Robust Vision-Language Models
by: Jiang, Tanqiu, et al.
Published: (2025)

Beyond the Vision Encoder: Identifying and Mitigating Spatial Bias in Large Vision-Language Models
by: Zhu, Yingjie, et al.
Published: (2025)

Evading Data Provenance in Deep Neural Networks
by: Zhu, Hongyu, et al.
Published: (2025)

Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'
by: Liang, Shanchao, et al.
Published: (2024)

Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?
by: Wasi, Azmine Toushik, et al.
Published: (2026)

Stable Language Guidance for Vision-Language-Action Models
by: Zhan, Zhihao, et al.
Published: (2026)

Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)

Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
by: Liang, Qiao, et al.
Published: (2025)

Can We Predict Performance of Large Models across Vision-Language Tasks?
by: Zhao, Qinyu, et al.
Published: (2024)

Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models
by: Steinberg, Jonathan, et al.
Published: (2026)

Can Vision-Language Models Solve the Shell Game?
by: Liu, Tiedong, et al.
Published: (2026)

Can Vision-Language Models Evaluate Handwritten Math?
by: Nath, Oikantik, et al.
Published: (2025)

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)

Investigating Spatial Attention Bias in Vision-Language Models
by: Chaudhary, Aryan, et al.
Published: (2025)

Can Vision-Language Models Solve Visual Math Equations?
by: Choudhury, Monjoy Narayan, et al.
Published: (2025)

Exploring Spatial Schema Intuitions in Large Language and Vision Models
by: Wicke, Philipp, et al.
Published: (2024)

ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests?
by: Ling, Zijian, et al.
Published: (2025)

SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
by: Wang, Zekun, et al.
Published: (2023)

Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)

OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)

Efficient and Effective Model Extraction
by: Zhu, Hongyu, et al.
Published: (2024)

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
by: Shao, Zhenwei, et al.
Published: (2025)

Vision-Language Models Can Self-Improve Reasoning via Reflection
by: Cheng, Kanzhi, et al.
Published: (2024)

Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)

Vision Language Models Are Not (Yet) Spelling Correctors
by: Liang, Junhong, et al.
Published: (2025)

Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens
by: Ma, Ziyang, et al.
Published: (2025)

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
by: Li, Chengzu, et al.
Published: (2024)

Can Transformers Learn $n$-gram Language Models?
by: Svete, Anej, et al.
Published: (2024)

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
by: Li, Dingming, et al.
Published: (2025)