:: Library Catalog

Bibliographic Details
Main Authors:	Marsili, Damiano; Agrawal, Rohun; Yue, Yisong; Gkioxari, Georgia
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.06787

Similar Items

No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
by: Marsili, Damiano, et al.
Published: (2025)

Same or Not? Enhancing Visual Perception in Vision-Language Models
by: Marsili, Damiano, et al.
Published: (2025)

Find Any Part in 3D
by: Ma, Ziqi, et al.
Published: (2024)

Feedforward 3D Editing via Text-Steerable Image-to-3D
by: Ma, Ziqi, et al.
Published: (2025)

Is This Tracker On? A Benchmark Protocol for Dynamic Tracking
by: Demler, Ilona, et al.
Published: (2025)

Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision
by: Sahoo, Aadarsh, et al.
Published: (2026)

Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
by: Kang, Raphi, et al.
Published: (2026)

Aligning Text, Images, and 3D Structure Token-by-Token
by: Sahoo, Aadarsh, et al.
Published: (2025)

Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models
by: Ma, Ziqi, et al.
Published: (2026)

Is CLIP ideal? No. Can we fix it? Yes!
by: Kang, Raphi, et al.
Published: (2025)

Reconstructing Hand-Held Objects in 3D from Images and Videos
by: Wu, Jane, et al.
Published: (2024)

MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation
by: Zuo, Xingxing, et al.
Published: (2025)

STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
by: Agrawal, Palaash, et al.
Published: (2023)

How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning
by: Yang, Qian, et al.
Published: (2026)

Caltech Aerial RGB-Thermal Dataset in the Wild
by: Lee, Connor, et al.
Published: (2024)

NitroGen: An Open Foundation Model for Generalist Gaming Agents
by: Magne, Loïc, et al.
Published: (2026)

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization
by: Wang, Yikun, et al.
Published: (2025)

SpatialImaginer: Towards Adaptive Visual Imagination for Spatial Reasoning
by: Li, Yian, et al.
Published: (2026)

Connecting the Dots: Training-Free Visual Grounding via Agentic Reasoning
by: Luo, Liqin, et al.
Published: (2025)

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
by: Sun, Peiwen, et al.
Published: (2025)

SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
by: Jain, Jitesh, et al.
Published: (2025)

Active Exploring like a Pigeon: Reinforcing Spatial Reasoning via Agentic Vision-Language Models
by: Deng, Wei, et al.
Published: (2026)

Self-Evolving Visual Concept Library using Vision-Language Critics
by: Sehgal, Atharva, et al.
Published: (2025)

RadFabric: Agentic AI System with Reasoning Capability for Radiology
by: Chen, Wenting, et al.
Published: (2025)

From Web to Pixels: Bringing Agentic Search into Visual Perception
by: Yang, Bokang, et al.
Published: (2026)

Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models
by: Yoon, Lauren Hyoseo, et al.
Published: (2025)

pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
by: Luo, Zhanpeng, et al.
Published: (2026)

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
by: Ranasinghe, Kanchana, et al.
Published: (2024)

VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
by: Wang, Zhaozhi, et al.
Published: (2025)

Visual-Semantic Graph Matching Net for Zero-Shot Learning
by: Duan, Bowen, et al.
Published: (2024)

Act2See: Emergent Active Visual Perception for Video Reasoning
by: Ma, Martin Q., et al.
Published: (2026)

Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
by: Wang, Austin, et al.
Published: (2026)

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)

ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning
by: Liu, Shifeng, et al.
Published: (2026)

SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning
by: Jeon, Byungwoo, et al.
Published: (2026)

Learning GUI Grounding with Spatial Reasoning from Visual Feedback
by: Zhao, Yu, et al.
Published: (2025)

InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity
by: Wang, Haoming, et al.
Published: (2025)

Unsupervised Representation Learning from Sparse Transformation Analysis
by: Song, Yue, et al.
Published: (2024)

MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation
by: Wang, Haoming, et al.
Published: (2026)

Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)