| Main Authors: | Wang, Tianyu; Ma, Zhiyuan; Wang, Qian; Zhang, Xinyi; Long, Xinwei; Zhou, Bowen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.19974 |
Similar Items
Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation
by: Pan, Jiadong, et al.
Published: (2025)
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
by: Long, Xinwei, et al.
Published: (2025)
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
by: Zhao, Zhonghan, et al.
Published: (2025)
SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning
by: Guo, Xiaojun, et al.
Published: (2025)
SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning
by: Long, Yancheng, et al.
Published: (2026)
Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
by: Ma, Weijian, et al.
Published: (2026)
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring
by: Wang, Xinyi, et al.
Published: (2025)
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking
by: Ma, Zhiyuan, et al.
Published: (2024)
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
by: Ma, Zhiyuan, et al.
Published: (2023)
MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation
by: Wang, Hongpeng, et al.
Published: (2026)
SPR-128K: A New Benchmark for Spatial Plausibility Reasoning with Multimodal Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2025)
Emotion-Director: Bridging Affective Shortcut in Emotion-Oriented Image Generation
by: Jia, Guoli, et al.
Published: (2025)
Neural Residual Diffusion Models for Deep Scalable Vision Generation
by: Ma, Zhiyuan, et al.
Published: (2024)
Memorize When Needed: Decoupled Memory Control for Spatially Consistent Long-Horizon Video Generation
by: Guo, Yanjun, et al.
Published: (2026)
Flow Diverse and Efficient: Learning Momentum Flow Matching via Stochastic Velocity Field Sampling
by: Ma, Zhiyuan, et al.
Published: (2025)
GenSpace: Benchmarking Spatially-Aware Image Generation
by: Wang, Zehan, et al.
Published: (2025)
Spatial Chain-of-Thought: Bridging Understanding and Generation Models for Spatial Reasoning Generation
by: Chen, Wei, et al.
Published: (2026)
Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
by: Wang, Zhao, et al.
Published: (2025)
CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)
Context-Aware Autoregressive Models for Multi-Conditional Image Generation
by: Chen, Yixiao, et al.
Published: (2025)
Benchmarking and Evolving Reason-Reflect-Rectify for Reflective Visual Generation
by: Wang, Junjie, et al.
Published: (2026)
UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition
by: Lei, Guojun, et al.
Published: (2025)
CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning
by: Ma, Wenxin, et al.
Published: (2026)
SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning
by: Qi, Biqing, et al.
Published: (2024)
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
by: Ping, Bowen, et al.
Published: (2025)
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning
by: Ma, Wufei, et al.
Published: (2025)
Reinforcing Few-step Generators via Reward-Tilted Distribution Matching
by: Huang, Yushi, et al.
Published: (2026)
Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance
by: Hu, Zhiyuan, et al.
Published: (2025)
Detecting AI-Generated Video via Frame Consistency
by: Ma, Long, et al.
Published: (2024)
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
by: Yao, Ruilin, et al.
Published: (2025)
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning
by: Tang, Yihong, et al.
Published: (2024)
Diffusion-Based Depth Inpainting for Transparent and Reflective Objects
by: Sun, Tianyu, et al.
Published: (2024)
TeDA: Boosting Vision-Lanuage Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment
by: Wang, Zhichuan, et al.
Published: (2025)
Learning Proposes, Geometry Disposes: A Modular Framework for Efficient Spatial Reasoning
by: Zhu, Haichao, et al.
Published: (2026)
Data-Free Generalized Zero-Shot Learning
by: Tang, Bowen, et al.
Published: (2024)
Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields
by: Yang, Jing, et al.
Published: (2024)
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
by: Fu, Yanping, et al.
Published: (2025)
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
by: Luo, Zhanpeng, et al.
Published: (2026)
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
by: Liao, Ruotong, et al.
Published: (2024)
Towards Spatially Consistent Image Generation: On Incorporating Intrinsic Scene Properties into Diffusion Models
by: Lee, Hyundo, et al.
Published: (2025)