| Main Authors: | Wang, Qunzhong, Liu, Jie, Liang, Jiajun, Jiang, Yilei, Zhang, Yuanxing, Zheng, Yaozhi, Wang, Xintao, Wan, Pengfei, Yue, Xiangyu, Liu, Jiaheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.10518 |
Similar Items
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
by: Jiang, Yilei, et al.
Published: (2025)
Think, then Score: Decoupled Reasoning and Scoring for Video Reward Modeling
by: Wang, Yuan, et al.
Published: (2026)
OneThinker: All-in-one Reasoning Model for Image and Video
by: Feng, Kaituo, et al.
Published: (2025)
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
by: Wang, Chaoyang, et al.
Published: (2026)
Flow-GRPO: Training Flow Matching Models via Online RL
by: Liu, Jie, et al.
Published: (2025)
GARDO: Reinforcing Diffusion Models without Reward Hacking
by: He, Haoran, et al.
Published: (2025)
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
by: Wu, Shengqiong, et al.
Published: (2025)
Scaling Image and Video Generation via Test-Time Evolutionary Search
by: He, Haoran, et al.
Published: (2025)
QuadSentinel: Sequent Safety for Machine-Checkable Control in Multi-agent Systems
by: Yang, Yiliu, et al.
Published: (2025)
V-Thinker: Interactive Thinking with Images
by: Qiao, Runqi, et al.
Published: (2025)
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
by: Cai, Minghong, et al.
Published: (2025)
SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking
by: Liu, Junnan, et al.
Published: (2025)
GraphThinker: Reinforcing Temporally Grounded Video Reasoning with Event Graph Thinking
by: Cheng, Zixu, et al.
Published: (2026)
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
by: Wang, Shijian, et al.
Published: (2025)
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
by: Peng, Tianhao, et al.
Published: (2025)
Thinker: Learning to Think Fast and Slow
by: Chung, Stephen, et al.
Published: (2025)
Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models
by: Ye, Zixuan, et al.
Published: (2025)
Pest-Thinker: Learning to Think and Reason like Entomologists via Reinforcement Learning
by: Li, Xueheng, et al.
Published: (2026)
In-Context Audio Control of Video Diffusion Transformers
by: Liu, Wenze, et al.
Published: (2025)
Monet: Reasoning in Latent Visual Space Beyond Images and Language
by: Wang, Qixun, et al.
Published: (2025)
SemanticGen: Video Generation in Semantic Space
by: Bai, Jianhong, et al.
Published: (2025)
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
by: Wang, Danqing, et al.
Published: (2024)
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
by: Fan, Kaixuan, et al.
Published: (2025)
GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping
by: Wang, Jing, et al.
Published: (2025)
GameFactory: Creating New Games with Generative Interactive Videos
by: Yu, Jiwen, et al.
Published: (2025)
Improving Video Generation with Human Feedback
by: Liu, Jie, et al.
Published: (2025)
KAG-Thinker: Interactive Thinking and Deep Reasoning in LLMs via Knowledge-Augmented Generation
by: Zhang, Dalong, et al.
Published: (2025)
Vero: An Open RL Recipe for General Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2026)
Exploring Reasoning Reward Model for Agents
by: Fan, Kaixuan, et al.
Published: (2026)
Thinking with Drafts: Speculative Temporal Reasoning for Efficient Long Video Understanding
by: Hu, Pengfei, et al.
Published: (2025)
PreferThinker: Reasoning-based Personalized Image Preference Assessment
by: Xu, Shengqi, et al.
Published: (2025)
LightThinker++: From Reasoning Compression to Memory Management
by: Zhu, Yuqi, et al.
Published: (2026)
UniVideo: Unified Understanding, Generation, and Editing for Videos
by: Wei, Cong, et al.
Published: (2025)
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
by: Wang, Zengzhi, et al.
Published: (2025)
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
by: Li, Hongyu, et al.
Published: (2025)
Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning
by: Gan, Siyuan, et al.
Published: (2026)
SketchVideo: Sketch-based Video Generation and Editing
by: Liu, Feng-Lin, et al.
Published: (2025)
UniMMVSR: A Unified Multi-Modal Framework for Cascaded Video Super-Resolution
by: Du, Shian, et al.
Published: (2025)
Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis
by: Wang, Qunzhong, et al.
Published: (2024)