| Main Authors: | Zhang, Jianshu, Li, Yijiang, Chen, Huifeixin, Lu, Haoran, Xue, Letian, Wang, Bingyang, Liu, Han |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.23898 |
Similar Items
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
by: Pan, Zhenyu, et al.
Published: (2025)
FairReason: Balancing Reasoning and Social Bias in MLLMs
by: Pan, Zhenyu, et al.
Published: (2025)
Vision Language Models Know Law of Conservation without Understanding More-or-Less
by: Luo, Dezhi, et al.
Published: (2024)
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
by: Yu, Songsong, et al.
Published: (2025)
How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study
by: Yang, Zhen, et al.
Published: (2026)
3D Primitives are a Spatial Language for VLMs
by: Liu, Junze, et al.
Published: (2026)
Vision Language Models Cannot Reason About Physical Transformation
by: Luo, Dezhi, et al.
Published: (2026)
Egocentric Bias in Vision-Language Models
by: Wang, Maijunxian, et al.
Published: (2026)
Revisiting Data Augmentation in Deep Reinforcement Learning
by: Hu, Jianshu, et al.
Published: (2024)
Vision Language Models See What You Want but not What You See
by: Gao, Qingying, et al.
Published: (2024)
Unified Multimodal Understanding via Byte-Pair Visual Encoding
by: Zhang, Wanpeng, et al.
Published: (2025)
Core Knowledge Deficits in Multi-Modal Language Models
by: Li, Yijiang, et al.
Published: (2024)
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?
by: Zhang, Yue, et al.
Published: (2026)
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
by: Liu, Haoyang, et al.
Published: (2025)
How Do VLAs Effectively Inherit from VLMs?
by: Zhang, Chuheng, et al.
Published: (2025)
Probing Mechanical Reasoning in Large Vision Language Models
by: Sun, Haoran, et al.
Published: (2024)
SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration
by: She, Jianshu
Published: (2026)
Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
by: Li, Chenjun
Published: (2026)
See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay
by: Baghel, Ashish, et al.
Published: (2026)
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
by: Liu, Fuxiao, et al.
Published: (2023)
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
by: Pan, Zhenyu, et al.
Published: (2025)
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
by: Ma, Jiefeng, et al.
Published: (2024)
The Philosophical Foundations of Growing AI Like A Child
by: Luo, Dezhi, et al.
Published: (2025)
Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
by: Lu, Haoran, et al.
Published: (2026)
Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving
by: Lian, Weitong, et al.
Published: (2026)
Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning
by: Wang, Yi, et al.
Published: (2026)
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
by: Chen, Tianyu, et al.
Published: (2025)
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
by: Wu, Xuansheng, et al.
Published: (2023)
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
by: Wang, Dianyi, et al.
Published: (2025)
COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control
by: Xia, Canming, et al.
Published: (2026)
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
by: Zhang, Yuyou, et al.
Published: (2025)
Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach
by: Liu, Shuqi, et al.
Published: (2025)
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
by: Yang, Letian, et al.
Published: (2026)
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning
by: Liu, Jiawei, et al.
Published: (2025)
MindCube: Spatial Mental Modeling from Limited Views
by: Wang, Qineng, et al.
Published: (2025)
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
by: Liu, Yue, et al.
Published: (2025)
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
by: Li, Shuo, et al.
Published: (2024)
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)
Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge
by: Lu, Shuai, et al.
Published: (2026)
Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization
by: Zhang, Yi, et al.
Published: (2025)