| Main Authors: | Liang, Sichu; Zhu, Hongyu; Wang, Wenwen; Zhou, Deyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.04355 |
Similar Items
RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering
by: Liang, Sichu, et al.
Published: (2025)
Revisiting Data Auditing in Large Vision-Language Models
by: Zhu, Hongyu, et al.
Published: (2025)
When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges
by: Liang, Sichu, et al.
Published: (2026)
VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)
CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems
by: Ma, Ziyang, et al.
Published: (2026)
Text Prompt Injection of Vision Language Models
by: Zhu, Ruizhe
Published: (2025)
Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning?
by: Zhao, Bowen, et al.
Published: (2024)
Vision Language Models Cannot Plan, but Can They Formalize?
by: He, Muyu, et al.
Published: (2025)
Dynamic Token Reweighting for Robust Vision-Language Models
by: Jiang, Tanqiu, et al.
Published: (2025)
Beyond the Vision Encoder: Identifying and Mitigating Spatial Bias in Large Vision-Language Models
by: Zhu, Yingjie, et al.
Published: (2025)
Evading Data Provenance in Deep Neural Networks
by: Zhu, Hongyu, et al.
Published: (2025)
Can Language Models Replace Programmers for Coding? REPOCOD Says 'Not Yet'
by: Liang, Shanchao, et al.
Published: (2024)
Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)
SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?
by: Wasi, Azmine Toushik, et al.
Published: (2026)
Stable Language Guidance for Vision-Language-Action Models
by: Zhan, Zhihao, et al.
Published: (2026)
Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
by: Liang, Qiao, et al.
Published: (2025)
Can We Predict Performance of Large Models across Vision-Language Tasks?
by: Zhao, Qinyu, et al.
Published: (2024)
Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models
by: Steinberg, Jonathan, et al.
Published: (2026)
Can Vision-Language Models Solve the Shell Game?
by: Liu, Tiedong, et al.
Published: (2026)
Can Vision-Language Models Evaluate Handwritten Math?
by: Nath, Oikantik, et al.
Published: (2025)
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)
Investigating Spatial Attention Bias in Vision-Language Models
by: Chaudhary, Aryan, et al.
Published: (2025)
Can Vision-Language Models Solve Visual Math Equations?
by: Choudhury, Monjoy Narayan, et al.
Published: (2025)
Exploring Spatial Schema Intuitions in Large Language and Vision Models
by: Wicke, Philipp, et al.
Published: (2024)
ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests?
by: Ling, Zijian, et al.
Published: (2025)
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
by: Wang, Zekun, et al.
Published: (2023)
Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)
Efficient and Effective Model Extraction
by: Zhu, Hongyu, et al.
Published: (2024)
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)
Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
by: Shao, Zhenwei, et al.
Published: (2025)
Vision-Language Models Can Self-Improve Reasoning via Reflection
by: Cheng, Kanzhi, et al.
Published: (2024)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)
Vision Language Models Are Not (Yet) Spelling Correctors
by: Liang, Junhong, et al.
Published: (2025)
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens
by: Ma, Ziyang, et al.
Published: (2025)
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
by: Chen, Boyuan, et al.
Published: (2024)
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
by: Li, Chengzu, et al.
Published: (2024)
Can Transformers Learn $n$-gram Language Models?
by: Svete, Anej, et al.
Published: (2024)
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
by: Li, Dingming, et al.
Published: (2025)