:: Library Catalog

Bibliographic Details
Main Authors:	Li, Dingming, Zhao, Yingxiu, Cheng, Xinrui, Lin, Kangheng, Peng, Hongbo, Li, Hongxing, Wang, Zixuan, Dai, Yuhong, Li, Haodong, Wang, Jia, Shi, Yukang, Zhao, Liang, Sun, Jianjian, Ge, Zheng, Zhang, Xiangyu, Lu, Weiming, Xiao, Jun, Zhuang, Yueting, Shen, Yongliang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2604.14144

Similar Items

SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
by: Li, Hongxing, et al.
Published: (2025)

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
by: Li, Dingming, et al.
Published: (2025)

Milestone-Guided Policy Learning for Long-Horizon Language Agents
by: Wang, Zixuan, et al.
Published: (2026)

GroundAct: Can LLM Agents Ground Actions in Environmental States?
by: Wang, Zixuan, et al.
Published: (2025)

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning
by: Wang, Aozhe, et al.
Published: (2026)

Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
by: Zhang, Wenqi, et al.
Published: (2024)

SpatialFusion: Endowing Unified Image Generation with Intrinsic 3D Geometric Awareness
by: Qiu, Haiyi, et al.
Published: (2026)

WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
by: Dai, Yuhong, et al.
Published: (2026)

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts
by: Xu, Haolei, et al.
Published: (2026)

Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow
by: Zhang, Wenqi, et al.
Published: (2023)

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
by: Zhang, Wenqi, et al.
Published: (2025)

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
by: Zhang, Wenqi, et al.
Published: (2024)

EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer
by: Zhao, Pukun, et al.
Published: (2025)

Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
by: Lin, Tao, et al.
Published: (2025)

Self-Evolving Spatial Reasoning in Vision Language Models via Geometric Logic Consistency
by: Liu, Junming, et al.
Published: (2026)

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations
by: Li, Jia, et al.
Published: (2024)

Automatic Instruction Evolving for Large Language Models
by: Zeng, Weihao, et al.
Published: (2024)

EvoTSE: Evolving Enrollment for Target Speaker Extraction
by: Liu, Zikai, et al.
Published: (2026)

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
by: Liu, Yuhong, et al.
Published: (2025)

TaskBench: Benchmarking Large Language Models for Task Automation
by: Shen, Yongliang, et al.
Published: (2023)

Slow Perception: Let's Perceive Geometric Figures Step-by-step
by: Wei, Haoran, et al.
Published: (2024)

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
by: Yuan, Yuqian, et al.
Published: (2024)

EvoWiki: Evaluating LLMs on Evolving Knowledge
by: Tang, Wei, et al.
Published: (2024)

RAFT-UP: Robust Alignment for Spatial Transcriptomics with Explicit Control of Spatial Distortion
by: Wu, Yaqi, et al.
Published: (2026)

Unhackable Temporal Rewarding for Scalable Video MLLMs
by: Yu, En, et al.
Published: (2025)

PerPO: Perceptual Preference Optimization via Discriminative Rewarding
by: Zhu, Zining, et al.
Published: (2025)

EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
by: Xu, Haolei, et al.
Published: (2025)

Geometrically-Constrained Agent for Spatial Reasoning
by: Chen, Zeren, et al.
Published: (2025)

Structural-Temporal Coupling Anomaly Detection with Dynamic Graph Transformer
by: Zong, Chang, et al.
Published: (2025)

Let LRMs Break Free from Overthinking via Self-Braking Tuning
by: Zhao, Haoran, et al.
Published: (2025)

Mixed‐Mode Fracturing Characteristics of Asphalt Concrete at Low‐Temperature Considering Random Spatial Combinations of Aggregates and Voids
by: Chen, Mengzhang, et al.
Published: (2025)

Hierarchical Budget Policy Optimization for Adaptive Reasoning
by: Lyu, Shangke, et al.
Published: (2025)

Neural Network-Assisted RIS Weight Optimization for Spatial Nulling in Distorted Reflector Antenna Systems
by: Li, Xinrui, et al.
Published: (2025)

TraceTrans: Translation and Spatial Tracing for Surgical Prediction
by: Luo, Xiyu, et al.
Published: (2025)

Reconstructing 4D Spatial Intelligence: A Survey
by: Cao, Yukang, et al.
Published: (2025)

Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning
by: Gao, Xianqiang, et al.
Published: (2026)

EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control
by: Zhang, Chushan, et al.
Published: (2026)

Spatial Blindness in Whole-Slide Multiple Instance Learning
by: Li, Xiangyu, et al.
Published: (2026)

EvoCodeBench: A Human-Performance Benchmark for Self-Evolving LLM-Driven Coding Systems
by: Zhang, Wentao, et al.
Published: (2026)