:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Tianyu; Ma, Zhiyuan; Wang, Qian; Zhang, Xinyi; Long, Xinwei; Zhou, Bowen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.19974

Similar Items

Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation
by: Pan, Jiadong, et al.
Published: (2025)

Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
by: Long, Xinwei, et al.
Published: (2025)

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
by: Zhao, Zhonghan, et al.
Published: (2025)

SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning
by: Guo, Xiaojun, et al.
Published: (2025)

SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning
by: Long, Yancheng, et al.
Published: (2026)

Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
by: Ma, Weijian, et al.
Published: (2026)

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring
by: Wang, Xinyi, et al.
Published: (2025)

Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking
by: Ma, Zhiyuan, et al.
Published: (2024)

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
by: Ma, Zhiyuan, et al.
Published: (2023)

MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation
by: Wang, Hongpeng, et al.
Published: (2026)

SPR-128K: A New Benchmark for Spatial Plausibility Reasoning with Multimodal Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2025)

Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation
by: Jia, Guoli, et al.
Published: (2025)

Neural Residual Diffusion Models for Deep Scalable Vision Generation
by: Ma, Zhiyuan, et al.
Published: (2024)

Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation
by: Guo, Yanjun, et al.
Published: (2026)

Flow Diverse and Efficient: Learning Momentum Flow Matching via Stochastic Velocity Field Sampling
by: Ma, Zhiyuan, et al.
Published: (2025)

GenSpace: Benchmarking Spatially-Aware Image Generation
by: Wang, Zehan, et al.
Published: (2025)

Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)

Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
by: Wang, Zhao, et al.
Published: (2025)

CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)

Context-Aware Autoregressive Models for Multi-Conditional Image Generation
by: Chen, Yixiao, et al.
Published: (2025)

Benchmarking and Evolving Reason-Reflect-Rectify for Reflective Visual Generation
by: Wang, Junjie, et al.
Published: (2026)

UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition
by: Lei, Guojun, et al.
Published: (2025)

CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning
by: Ma, Wenxin, et al.
Published: (2026)

SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning
by: Qi, Biqing, et al.
Published: (2024)

PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
by: Ping, Bowen, et al.
Published: (2025)

SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)

Reinforcing Few-step Generators via Reward-Tilted Distribution Matching
by: Huang, Yushi, et al.
Published: (2026)

Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance
by: Hu, Zhiyuan, et al.
Published: (2025)

Detecting AI-Generated Video via Frame Consistency
by: Ma, Long, et al.
Published: (2024)

LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
by: Yao, Ruilin, et al.
Published: (2025)

Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)

Diffusion-Based Depth Inpainting for Transparent and Reflective Objects
by: Sun, Tianyu, et al.
Published: (2024)

TeDA: Boosting Vision-Lanuage Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment
by: Wang, Zhichuan, et al.
Published: (2025)

Learning Proposes, Geometry Disposes: A Modular Framework for Efficient Spatial Reasoning
by: Zhu, Haichao, et al.
Published: (2026)

Data-Free Generalized Zero-Shot Learning
by: Tang, Bowen, et al.
Published: (2024)

Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields
by: Yang, Jing, et al.
Published: (2024)

TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
by: Fu, Yanping, et al.
Published: (2025)

pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
by: Luo, Zhanpeng, et al.
Published: (2026)

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
by: Liao, Ruotong, et al.
Published: (2024)

Towards Spatially Consistent Image Generation: On Incorporating Intrinsic Scene Properties into Diffusion Models
by: Lee, Hyundo, et al.
Published: (2025)