| Main Authors: | Wang, Xiaokun, Wang, Peiyu, Pei, Jiangbo, Shen, Wei, Peng, Yi, Hao, Yunzhuo, Qiu, Weijie, Jian, Ai, Xie, Tianyidan, Song, Xuchen, Liu, Yang, Zhou, Yahui |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.07263 |
Similar Items
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
by: Wang, Peiyu, et al.
Published: (2025)
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
by: Peng, Yi, et al.
Published: (2025)
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
by: Jian, Ai, et al.
Published: (2025)
Skywork-R1V3 Technical Report
by: Shen, Wei, et al.
Published: (2025)
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
by: Zhang, Yifan, et al.
Published: (2025)
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
by: Wang, Peiyu, et al.
Published: (2025)
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
by: Liu, Chris Yuhao, et al.
Published: (2024)
Skywork UniPic 3.0: Unified Multi-Image Composition via Sequence Modeling
by: Wei, Hongyang, et al.
Published: (2026)
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
by: Liu, Chris Yuhao, et al.
Published: (2025)
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model
by: Wei, Hongyang, et al.
Published: (2025)
Skywork Open Reasoner 1 Technical Report
by: He, Jujie, et al.
Published: (2025)
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
by: Li, Lei, et al.
Published: (2024)
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
by: Zeng, Liang, et al.
Published: (2025)
Unified Reward Model for Multimodal Understanding and Generation
by: Wang, Yibin, et al.
Published: (2025)
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
by: Wang, Weiyun, et al.
Published: (2025)
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
by: Fan, Kaixuan, et al.
Published: (2025)
Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
by: Chen, Honghao, et al.
Published: (2025)
DiffusionReward: Enhancing Blind Face Restoration through Reward Feedback Learning
by: Wu, Bin, et al.
Published: (2025)
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
by: Wei, Tianwen, et al.
Published: (2024)
VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding
by: Zhang, Zhihong, et al.
Published: (2025)
Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning
by: Yang, Pei, et al.
Published: (2025)
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
by: Cao, Qi, et al.
Published: (2025)
Exploring Reasoning Reward Model for Agents
by: Fan, Kaixuan, et al.
Published: (2026)
Learning to Generate via Understanding: Understanding-Driven Intrinsic Rewarding for Unified Multimodal Models
by: Pan, Jiadong, et al.
Published: (2026)
Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models
by: Yang, Shidong, et al.
Published: (2026)
Spatial Preference Rewarding for MLLMs Spatial Understanding
by: Qiu, Han, et al.
Published: (2025)
ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding
by: Sun, Zhongxiang, et al.
Published: (2025)
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
by: Lou, Xingzhou, et al.
Published: (2024)
PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement
by: Xie, Tianyidan, et al.
Published: (2026)
BaseReward: A Strong Baseline for Multimodal Reward Model
by: Zhang, Yi-Fan, et al.
Published: (2025)
KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning
by: Si, Haotian, et al.
Published: (2026)
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On
by: Zeng, Liang, et al.
Published: (2024)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
WildReward: Learning Reward Models from In-the-Wild Human Interactions
by: Peng, Hao, et al.
Published: (2026)
Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction
by: Song, Ruike, et al.
Published: (2025)
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
by: Feng, Sicheng, et al.
Published: (2025)
Reward Collapse in Aligning Large Language Models
by: Song, Ziang, et al.
Published: (2023)
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
by: Tian, Changyao, et al.
Published: (2026)