| Main Authors: | Chen, Xiaoyu, Dai, Lu, Wang, Hanqing, Li, Zhuoyu, Dai, Wenbin, Zheng, Yanzong, Xia, Zhenggang, Lin, Junyong, Xiong, Hui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03660 |
Similar Items
VideoAfford: Grounding 3D Affordance from Human-Object-Interaction Videos via Multimodal Large Language Model
by: Wang, Hanqing, et al.
Published: (2026)
ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research
by: Lin, Junyong, et al.
Published: (2025)
Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning
by: Wang, Yi, et al.
Published: (2026)
Inner Synchronization of Complex‐Valued Stochastic Coupled Networks Via Intermittent Discrete Observation Control
by: Guang Dai, et al.
Published: (2025)
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models
by: Cheng, An-Chieh, et al.
Published: (2024)
Improve Dense Passage Retrieval with Entailment Tuning
by: Dai, Lu, et al.
Published: (2024)
SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning
by: Jia, Furong, et al.
Published: (2026)
EgoCoT-Bench: Benchmarking Grounded and Verifiable Operation-Centric Chain of Thought Reasoning for MLLMs
by: Dai, Yang, et al.
Published: (2026)
CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning
by: Ma, Wenxin, et al.
Published: (2026)
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
by: Xu, Ming, et al.
Published: (2024)
Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning
by: Zhong, Hui, et al.
Published: (2026)
Data-driven Option Pricing
by: Dai, Min, et al.
Published: (2024)
Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences
by: Wang, Xiyao, et al.
Published: (2024)
Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks
by: Sharma, Arun
Published: (2026)
ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
by: Zhao, Yanpeng, et al.
Published: (2026)
TableVista: Benchmarking Multimodal Table Reasoning under Visual and Structural Complexity
by: Yang, Zheyuan, et al.
Published: (2026)
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
by: Lompo, Boammani Aser, et al.
Published: (2025)
Relative quasi-Gorensteinness in extriangulated categories
by: He, Zhenggang
Published: (2025)
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
by: Wu, Hao, et al.
Published: (2026)
Parameter Estimation of Multi‐Input Multi‐Output Hammerstein Nonlinear System With Deep GRU Networks
by: Feng Li, et al.
Published: (2026)
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
by: Zhang, Zhengshen, et al.
Published: (2025)
MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models
by: Huang, Zimeng, et al.
Published: (2025)
Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs
by: Lu, Yuxuan, et al.
Published: (2026)
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
by: Cao, Fanpu, et al.
Published: (2026)
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
by: Xu, Rui, et al.
Published: (2025)
Connecting the Dots: Training-Free Visual Grounding via Agentic Reasoning
by: Luo, Liqin, et al.
Published: (2025)
OptiSAR-Net++: A Large-Scale Benchmark and Transformer-Free Framework for Cross-Domain Remote Sensing Visual Grounding
by: Tang, Xiaoyu, et al.
Published: (2026)
Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System
by: Feng, Jiawei, et al.
Published: (2024)
RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking
by: Zou, Jiaru, et al.
Published: (2025)
DSI-Bench: A Benchmark for Dynamic Spatial Intelligence
by: Zhang, Ziang, et al.
Published: (2025)
High Energy Storage of Polymer Blend at Elevated Temperature
by: Guanxiang Zhang, et al.
Published: (2025)
TableReasoner: Advancing Table Reasoning Framework with Large Language Models
by: Xiong, Sishi, et al.
Published: (2025)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)
QCDGE database, Quantum Chemistry Database with Ground- and Excited-state Properties of 450 Kilo Molecules
by: Zhu, Yifei, et al.
Published: (2024)
TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering
by: Zhu, Junnan, et al.
Published: (2025)
Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
by: Li, Han, et al.
Published: (2024)
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
by: Yan, Yibo, et al.
Published: (2024)
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
by: Jiang, Lihan, et al.
Published: (2024)
Oh-Trust: Overbooking and Hybrid Trading-empowered Resource Scheduling with Smart Reputation Update over Dynamic Edge Networks
by: Qi, Houyi, et al.
Published: (2025)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models
by: Stogiannidis, Ilias, et al.
Published: (2025)