| Main Authors: | Wei, Zhaoyang, Ding, Wenchao, Hao, Yanchao, Chen, Xi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.22172 |
Similar Items
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
by: Liu, Zhenyang, et al.
Published: (2025)
EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment
by: Wang, Zhaoyang, et al.
Published: (2025)
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
by: Sun, Wenfang, et al.
Published: (2026)
Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)
Reasoning Matters for 3D Visual Grounding
by: Huang, Hsiang-Wei, et al.
Published: (2026)
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
by: Jiang, Qing, et al.
Published: (2025)
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)
The Mind's Eye: A Multi-Faceted Reward Framework for Guiding Visual Metaphor Generation
by: Koushik, Girish A., et al.
Published: (2025)
Mutual Information guided Visual Contrastive Learning
by: Chen, Hanyang, et al.
Published: (2025)
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
by: Chen, Wei, et al.
Published: (2024)
Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs
by: Zhang, Shan, et al.
Published: (2025)
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
by: Bai, Sule, et al.
Published: (2025)
Language-Guided Diffusion Model for Visual Grounding
by: Chen, Sijia, et al.
Published: (2023)
DeepScan: A Training-Free Framework for Visually Grounded Reasoning in Large Vision-Language Models
by: Li, Yangfu, et al.
Published: (2026)
See It, Say It, Sorted: An Iterative Training-Free Framework for Visually-Grounded Multimodal Reasoning in LVLMs
by: Zhang, Yongchang, et al.
Published: (2026)
Semantically Guided Dynamic Visual Prototype Refinement for Compositional Zero-Shot Learning
by: Peng, Zhong, et al.
Published: (2025)
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
by: Peng, Haosong, et al.
Published: (2025)
MIRG-RL: Multi-Image Reasoning and Grounding with Reinforcement Learning
by: Zheng, Lihao, et al.
Published: (2025)
Grounded Reinforcement Learning for Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2025)
CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering
by: Zeng, Xiyin, et al.
Published: (2026)
RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios
by: Zhao, Tianyi, et al.
Published: (2025)
Learning GUI Grounding with Spatial Reasoning from Visual Feedback
by: Zhao, Yu, et al.
Published: (2025)
From Diffusion to Resolution: Leveraging 2D Diffusion Models for 3D Super-Resolution Task
by: Chen, Bohao, et al.
Published: (2024)
VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
by: Kang, Weitai, et al.
Published: (2025)
EyeSeg: An Uncertainty-Aware Eye Segmentation Framework for AR/VR
by: Peng, Zhengyuan, et al.
Published: (2025)
UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition
by: Nan, Xinyu, et al.
Published: (2025)
VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence
by: Qiang, Chenhui, et al.
Published: (2025)
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
by: Wang, Yaxian, et al.
Published: (2025)
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
by: Ren, Tianhe, et al.
Published: (2024)
TikArt: Stabilizing Aperture-Guided Fine-Grained Visual Reasoning with Reinforcement Learning
by: Ding, Hao, et al.
Published: (2026)
CRAFT: A Neuro-Symbolic Framework for Visual Functional Affordance Grounding
by: Chen, Zhou, et al.
Published: (2025)
Hierarchical Contextual Grounding LVLM: Enhancing Fine-Grained Visual-Language Understanding with Robust Grounding
by: Guo, Leilei, et al.
Published: (2025)
Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
by: Cao, Meng, et al.
Published: (2025)
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding
by: Liu, Zhenyang, et al.
Published: (2025)
Connecting the Dots: Training-Free Visual Grounding via Agentic Reasoning
by: Luo, Liqin, et al.
Published: (2025)
Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
by: Wu, Zhixuan, et al.
Published: (2026)
Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs
by: Zhu, Fangrui, et al.
Published: (2025)
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
by: Xiao, Linhui, et al.
Published: (2024)
Composition-Grounded Data Synthesis for Visual Reasoning
by: Gu, Xinyi, et al.
Published: (2025)
VGR: Visual Grounded Reasoning
by: Wang, Jiacong, et al.
Published: (2025)