| Main Authors: | Liu, Junhua, Wang, Zhangcheng, Han, Zhike, Wang, Ningli, Liang, Guotao, Kuang, Kun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10675 |
Similar Items
VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
by: Liang, Guotao, et al.
Published: (2026)
Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
by: Liang, Guotao, et al.
Published: (2026)
Cross-view Localization and Synthesis -- Datasets, Challenges and Opportunities
by: Xu, Ningli, et al.
Published: (2025)
When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition
by: Sun, Xiaokun, et al.
Published: (2026)
Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
by: Li, Chengzu, et al.
Published: (2026)
Large-scale DSM registration via motion averaging
by: Xu, Ningli, et al.
Published: (2024)
Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets
by: Xu, Ningli, et al.
Published: (2023)
Satellite to GroundScape -- Large-scale Consistent Ground View Generation from Satellite Views
by: Xu, Ningli, et al.
Published: (2025)
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
by: Li, Lingxiao, et al.
Published: (2025)
Thinking with Geometry: Active Geometry Integration for Spatial Reasoning
by: Li, Haoyuan, et al.
Published: (2026)
Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model
by: Wang, Yuan, et al.
Published: (2026)
3DReflecNet: A Large-Scale Dataset for 3D Reconstruction of Reflective, Transparent, and Low-Texture Objects
by: Liang, Zhicheng, et al.
Published: (2026)
Geospecific View Generation -- Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
by: Xu, Ningli, et al.
Published: (2024)
GameIR: A Large-Scale Synthesized Ground-Truth Dataset for Image Restoration over Gaming Content
by: Zhou, Lebin, et al.
Published: (2024)
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2025)
Federated Domain Generalization with Domain-specific Soft Prompts Generation
by: Wu, Jianhan, et al.
Published: (2025)
PM25Vision: A Large-Scale Benchmark Dataset for Visual Estimation of Air Quality
by: Han, Yang
Published: (2025)
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
by: Han, Hongyong, et al.
Published: (2025)
Think Proprioceptively: Embodied Visual Reasoning for VLA Manipulation
by: Wang, Fangyuan, et al.
Published: (2026)
Allocentric Perceiver: Disentangling Allocentric Reasoning from Egocentric Visual Priors via Frame Instantiation
by: Wang, Hengyi, et al.
Published: (2026)
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
by: Sima, Bingrui, et al.
Published: (2025)
Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
by: Dong, Xinpeng, et al.
Published: (2026)
AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning
by: Xiang, Kun, et al.
Published: (2024)
Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
by: Han, Songhao, et al.
Published: (2024)
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
by: Wu, Junfei, et al.
Published: (2025)
Improved Masked Image Generation with Knowledge-Augmented Token Representations
by: Liang, Guotao, et al.
Published: (2025)
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
by: Su, Zhaochen, et al.
Published: (2025)
D-Fusion: Direct Preference Optimization for Aligning Diffusion Models with Visually Consistent Samples
by: Hu, Zijing, et al.
Published: (2025)
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)
EventMemAgent: Hierarchical Event-Centric Memory for Online Video Understanding with Adaptive Tool Use
by: Wen, Siwei, et al.
Published: (2026)
Think3D: Thinking with Space for Spatial Reasoning
by: Zhang, Zaibin, et al.
Published: (2026)
FF-LOGO: Cross-Modality Point Cloud Registration with Feature Filtering and Local to Global Optimization
by: Ma, Nan, et al.
Published: (2023)
RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology
by: Li, Wenxuan, et al.
Published: (2026)
Understanding Bias in Large-Scale Visual Datasets
by: Zeng, Boya, et al.
Published: (2024)
Large Language Models are Universal Reasoners for Visual Generation
by: Ren, Sucheng, et al.
Published: (2026)
VisRef: Visual Refocusing while Thinking Improves Test-Time Scaling in Multi-Modal Large Reasoning Models
by: Ghosal, Soumya Suvra, et al.
Published: (2026)
Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World
by: Huang, Yuzhi, et al.
Published: (2026)
Let's Think with Images Efficiently! An Interleaved-Modal Chain-of-Thought Reasoning Framework with Dynamic and Precise Visual Thoughts
by: Liu, Xu, et al.
Published: (2026)