Library Catalog

Bibliographic Details
Main Authors:	Zhang, Jianshu; Li, Yijiang; Chen, Huifeixin; Lu, Haoran; Xue, Letian; Wang, Bingyang; Liu, Han
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.23898

Similar Items

MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
by: Pan, Zhenyu, et al.
Published: (2025)

FairReason: Balancing Reasoning and Social Bias in MLLMs
by: Pan, Zhenyu, et al.
Published: (2025)

Vision Language Models Know Law of Conservation without Understanding More-or-Less
by: Luo, Dezhi, et al.
Published: (2024)

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
by: Yu, Songsong, et al.
Published: (2025)

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study
by: Yang, Zhen, et al.
Published: (2026)

3D Primitives are a Spatial Language for VLMs
by: Liu, Junze, et al.
Published: (2026)

Vision Language Models Cannot Reason About Physical Transformation
by: Luo, Dezhi, et al.
Published: (2026)

Egocentric Bias in Vision-Language Models
by: Wang, Maijunxian, et al.
Published: (2026)

Revisiting Data Augmentation in Deep Reinforcement Learning
by: Hu, Jianshu, et al.
Published: (2024)

Vision Language Models See What You Want but not What You See
by: Gao, Qingying, et al.
Published: (2024)

Unified Multimodal Understanding via Byte-Pair Visual Encoding
by: Zhang, Wanpeng, et al.
Published: (2025)

Core Knowledge Deficits in Multi-Modal Language Models
by: Li, Yijiang, et al.
Published: (2024)

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)

GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
by: Liu, Haoyang, et al.
Published: (2025)

How Do VLAs Effectively Inherit from VLMs?
by: Zhang, Chuheng, et al.
Published: (2025)

Probing Mechanical Reasoning in Large Vision Language Models
by: Sun, Haoran, et al.
Published: (2024)

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration
by: She, Jianshu
Published: (2026)

Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
by: Li, Chenjun
Published: (2026)

See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay
by: Baghel, Ashish, et al.
Published: (2026)

MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
by: Liu, Fuxiao, et al.
Published: (2023)

AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
by: Pan, Zhenyu, et al.
Published: (2025)

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
by: Ma, Jiefeng, et al.
Published: (2024)

The Philosophical Foundations of Growing AI Like A Child
by: Luo, Dezhi, et al.
Published: (2025)

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
by: Lu, Haoran, et al.
Published: (2026)

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning
by: Wang, Yi, et al.
Published: (2026)

Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
by: Chen, Tianyu, et al.
Published: (2025)

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
by: Wu, Xuansheng, et al.
Published: (2023)

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
by: Wang, Dianyi, et al.
Published: (2025)

COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control
by: Xia, Canming, et al.
Published: (2026)

SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
by: Zhang, Yuyou, et al.
Published: (2025)

Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach
by: Liu, Shuqi, et al.
Published: (2025)

ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
by: Yang, Letian, et al.
Published: (2026)

EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning
by: Liu, Jiawei, et al.
Published: (2025)

MindCube: Spatial Mental Modeling from Limited Views
by: Wang, Qineng, et al.
Published: (2025)

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
by: Liu, Yue, et al.
Published: (2025)

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
by: Li, Shuo, et al.
Published: (2024)

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
by: Lu, Shuai, et al.
Published: (2026)

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization
by: Zhang, Yi, et al.
Published: (2025)