| Main Authors: | Jia, Furong, Dai, Ling, Deng, Wenjin, Zhang, Fan, Hu, Chen, Jiang, Daxin, Liu, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09463 |
Similar Items
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
by: Li, Ling, et al.
Published: (2024)
Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models
by: Li, Ling, et al.
Published: (2025)
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)
Imagine, Verify, Execute: Memory-guided Agentic Exploration with Vision-Language Models
by: Lee, Seungjae, et al.
Published: (2025)
Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding
by: Xue, Haotian, et al.
Published: (2025)
Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models
by: Liu, Zuyan, et al.
Published: (2024)
REVERSE: Reinforcing Evidence Verification and Search for Agentic Image geo-localization
by: Li, Yong, et al.
Published: (2026)
Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies
by: Hou, Wenjin, et al.
Published: (2026)
Connecting the Dots: Training-Free Visual Grounding via Agentic Reasoning
by: Luo, Liqin, et al.
Published: (2025)
Prompting Large Vision-Language Models for Compositional Reasoning
by: Ossowski, Timothy, et al.
Published: (2024)
GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding
by: Fan, Rong, et al.
Published: (2026)
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
by: Wei, Yana, et al.
Published: (2025)
Towards Interpretable Geo-localization: a Concept-Aware Global Image-GPS Alignment Framework
by: Jia, Furong, et al.
Published: (2025)
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
by: Zeng, Yu, et al.
Published: (2025)
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
by: Prasad, Archiki, et al.
Published: (2023)
Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding
by: Yoon, Hee Suk, et al.
Published: (2026)
AgriChain Visually Grounded Expert Verified Reasoning for Interpretable Agricultural Vision Language Models
by: Mahmood, Hazza, et al.
Published: (2026)
Agentic Reasoning for Large Language Models
by: Wei, Tianxin, et al.
Published: (2026)
DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models
by: Li, Yangfu, et al.
Published: (2026)
Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning
by: Yu, Song, et al.
Published: (2025)
GeoArena: Evaluating Open-World Geographic Reasoning in Large Vision-Language Models
by: Jia, Pengyue, et al.
Published: (2025)
Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models
by: Wang, Junxin, et al.
Published: (2026)
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
by: Rajabi, Navid, et al.
Published: (2023)
Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes
by: Deng, Shiling, et al.
Published: (2025)
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
by: Wang, Yikun, et al.
Published: (2025)
Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models
by: Zheng, Ziwei, et al.
Published: (2025)
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
by: Wang, Xiyao, et al.
Published: (2024)
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)
VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs
by: Berman, Shmuel, et al.
Published: (2025)
Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models
by: Deng, Wei, et al.
Published: (2026)
Refining Skewed Perceptions in Vision-Language Contrastive Models through Visual Representations
by: Dai, Haocheng, et al.
Published: (2024)
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
by: Chen, Jiaxing, et al.
Published: (2024)
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)
Enhancing Large Language Models through Structured Reasoning
by: Dong, Yubo, et al.
Published: (2025)
Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy
by: Dong, Yihong, et al.
Published: (2026)
Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models
by: Sinha, Sanchit, et al.
Published: (2025)
Probing Mechanical Reasoning in Large Vision Language Models
by: Sun, Haoran, et al.
Published: (2024)
AgentVLN: Towards Agentic Vision-and-Language Navigation
by: Xin, Zihao, et al.
Published: (2026)
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
by: Wang, Haibo, et al.
Published: (2026)