Library Catalog

Bibliographic Details
Main Authors:	Wang, Qunzhong; Liu, Jie; Liang, Jiajun; Jiang, Yilei; Zhang, Yuanxing; Zheng, Yaozhi; Wang, Xintao; Wan, Pengfei; Yue, Xiangyu; Liu, Jiaheng
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.10518

Similar Items

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
by: Jiang, Yilei, et al.
Published: (2025)

Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)

OneThinker: All-in-one Reasoning Model for Image and Video
by: Feng, Kaituo, et al.
Published: (2025)

VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)

Flow-GRPO: Training Flow Matching Models via Online RL
by: Liu, Jie, et al.
Published: (2025)

GARDO: Reinforcing Diffusion Models without Reward Hacking
by: He, Haoran, et al.
Published: (2025)

A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)

Scaling Image and Video Generation via Test-Time Evolutionary Search
by: He, Haoran, et al.
Published: (2025)

QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems
by: Yang, Yiliu, et al.
Published: (2025)

V-Thinker: Interactive Thinking with Images
by: Qiao, Runqi, et al.
Published: (2025)

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)

VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
by: Cai, Minghong, et al.
Published: (2025)

SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking
by: Liu, Junnan, et al.
Published: (2025)

GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
by: Cheng, Zixu, et al.
Published: (2026)

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
by: Wang, Shijian, et al.
Published: (2025)

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
by: Peng, Tianhao, et al.
Published: (2025)

Thinker: Learning to Think Fast and Slow
by: Chung, Stephen, et al.
Published: (2025)

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models
by: Ye, Zixuan, et al.
Published: (2025)

Pest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learning
by: Li, Xueheng, et al.
Published: (2026)

In-Context Audio Control of Video Diffusion Transformers
by: Liu, Wenze, et al.
Published: (2025)

Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)

SemanticGen: Video Generation in Semantic Space
by: Bai, Jianhong, et al.
Published: (2025)

TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
by: Wang, Danqing, et al.
Published: (2024)

SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
by: Fan, Kaixuan, et al.
Published: (2025)

GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping
by: Wang, Jing, et al.
Published: (2025)

GameFactory: Creating New Games with Generative Interactive Videos
by: Yu, Jiwen, et al.
Published: (2025)

Improving Video Generation with Human Feedback
by: Liu, Jie, et al.
Published: (2025)

KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation
by: Zhang, Dalong, et al.
Published: (2025)

Vero: An Open RL Recipe for General Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2026)

Exploring Reasoning Reward Model for Agents
by: Fan, Kaixuan, et al.
Published: (2026)

Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
by: Hu, Pengfei, et al.
Published: (2025)

PreferThinker: Reasoning-based Personalized Image Preference Assessment
by: Xu, Shengqi, et al.
Published: (2025)

LightThinker++: From Reasoning Compression to Memory Management
by: Zhu, Yuqi, et al.
Published: (2026)

UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
by: Wang, Zengzhi, et al.
Published: (2025)

EditThinker: Unlocking Iterative Reasoning for Any Image Editor
by: Li, Hongyu, et al.
Published: (2025)

Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning
by: Gan, Siyuan, et al.
Published: (2026)

SketchVideo: Sketch-based Video Generation and Editing
by: Liu, Feng-Lin, et al.
Published: (2025)

UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
by: Du, Shian, et al.
Published: (2025)

Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis
by: Wang, Qunzhong, et al.
Published: (2024)