Library Catalog

Bibliographic Details
Main Authors:	Liu, Junhua; Wang, Zhangcheng; Han, Zhike; Wang, Ningli; Liang, Guotao; Kuang, Kun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.10675

Similar Items

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
by: Liang, Guotao, et al.
Published: (2026)

Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
by: Liang, Guotao, et al.
Published: (2026)

Cross-view Localization and Synthesis -- Datasets, Challenges and Opportunities
by: Xu, Ningli, et al.
Published: (2025)

When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
by: Sun, Xiaokun, et al.
Published: (2026)

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
by: Li, Chengzu, et al.
Published: (2026)

Large-scale DSM registration via motion averaging
by: Xu, Ningli, et al.
Published: (2024)

Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets
by: Xu, Ningli, et al.
Published: (2023)

Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views
by: Xu, Ningli, et al.
Published: (2025)

VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)

Thinking with Geometry: Active Geometry Integration for Spatial Reasoning
by: Li, Haoyuan, et al.
Published: (2026)

Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model
by: Wang, Yuan, et al.
Published: (2026)

3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects
by: Liang, Zhicheng, et al.
Published: (2026)

Geospecific View Generation -- Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
by: Xu, Ningli, et al.
Published: (2024)

GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content
by: Zhou, Lebin, et al.
Published: (2024)

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)

Federated Domain Generalization with Domain-specific Soft Prompts Generation
by: Wu, Jianhan, et al.
Published: (2025)

PM25Vision: A Large-Scale Benchmark Dataset for Visual Estimation of Air Quality
by: Han, Yang
Published: (2025)

CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
by: Han, Hongyong, et al.
Published: (2025)

Think Proprioceptively: Embodied Visual Reasoning for VLA Manipulation
by: Wang, Fangyuan, et al.
Published: (2026)

Allocentric Perceiver: Disentangling Allocentric Reasoning from Egocentric Visual Priors via Frame Instantiation
by: Wang, Hengyi, et al.
Published: (2026)

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
by: Sima, Bingrui, et al.
Published: (2025)

Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
by: Dong, Xinpeng, et al.
Published: (2026)

AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning
by: Xiang, Kun, et al.
Published: (2024)

Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
by: Han, Songhao, et al.
Published: (2024)

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
by: Wu, Junfei, et al.
Published: (2025)

Improved Masked Image Generation with Knowledge-Augmented Token Representations
by: Liang, Guotao, et al.
Published: (2025)

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
by: Su, Zhaochen, et al.
Published: (2025)

D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
by: Hu, Zijing, et al.
Published: (2025)

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)

EventMemAgent: Hierarchical Event-Centric Memory for Online Video Understanding with Adaptive Tool Use
by: Wen, Siwei, et al.
Published: (2026)

Think3D: Thinking with Space for Spatial Reasoning
by: Zhang, Zaibin, et al.
Published: (2026)

FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization
by: Ma, Nan, et al.
Published: (2023)

RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
by: Li, Wenxuan, et al.
Published: (2026)

Understanding Bias in Large-Scale Visual Datasets
by: Zeng, Boya, et al.
Published: (2024)

Large Language Models are Universal Reasoners for Visual Generation
by: Ren, Sucheng, et al.
Published: (2026)

VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models
by: Ghosal, Soumya Suvra, et al.
Published: (2026)

Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
by: Huang, Yuzhi, et al.
Published: (2026)

Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)