| Main authors: | Xie, Weichu, Zhao, Haozhe, Liu, Wenpu, Zhu, Yongfu, Chen, Liang, Ye, Minghao, Chen, Zirong, Xu, Yuqi, Dong, Shuai, Wang, Ziyue, Xu, Xinbo, Shi, Kean, Wu, Ruoyu, Zhang, Xiaoying, Shao, Wenqi, Chang, Baobao, Duan, Nan, Wang, Jiaqi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.17291 |
Similar items
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
by: Liu, Wenpu, et al.
Published: (2026)
Improving MLLM Training Efficiency via Stage-Aware Sparsity
by: Shi, Kean, et al.
Published: (2025)
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
by: Chen, Liang, et al.
Published: (2025)
RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades
by: Xu, Xinbo, et al.
Published: (2026)
Curing Miracle Steps in LLM Mathematical Reasoning with Rubric Rewards
by: Yuan, Youliang, et al.
Published: (2025)
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
by: Xu, Huimin, et al.
Published: (2025)
Visual Preference Optimization with Rubric Rewards
by: Yu, Ya-Qi, et al.
Published: (2026)
Reinforcement Learning with Robust Rubric Rewards
by: Yu, Ya-Qi, et al.
Published: (2026)
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
by: Chen, Liang, et al.
Published: (2024)
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
by: Liu, Tianci, et al.
Published: (2025)
DRM: Diffusion-based Reward Model With Step-wise Guidance
by: Zhang, Jaxon, et al.
Published: (2026)
Advancing Radar Hand Gesture Recognition: A Hybrid Spectrum Synthetic Framework Merging Simulation with Neural Networks
by: Tang, Jiaqi, et al.
Published: (2025)
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
by: Li, Gaotang, et al.
Published: (2026)
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
by: Chen, Liang, et al.
Published: (2024)
PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation
by: Xue, Yongfu
Published: (2025)
UMM-RM: An Upcycle-and-Merge MoE Reward Model for Mitigating Reward Hacking
by: Fu, Lingling, et al.
Published: (2025)
Auto-Rubric: Learning From Implicit Weights to Explicit Rubrics for Reward Modeling
by: Xie, Lipeng, et al.
Published: (2025)
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
by: Du, Huifang, et al.
Published: (2024)
Reward Hacking in Rubric-Based Reinforcement Learning
by: Mahmoud, Anas, et al.
Published: (2026)
Rubric-Guided Process Reward for Stepwise Model Routing
by: Ye, Shenghao, et al.
Published: (2026)
Uncertainty-Aware Step-wise Verification with Generative Reward Models
by: Ye, Zihuiwen, et al.
Published: (2025)
Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks
by: Xu, Tianze, et al.
Published: (2026)
Building Rubrics: A Step-By-Step Process
by: Brown, Carol A.
Published: (2008)
Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
by: Wang, Haozhe, et al.
Published: (2026)
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR
by: Tyagi, Utkarsh, et al.
Published: (2026)
AutoRubric: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
by: Jia, Mengzhao, et al.
Published: (2025)
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
by: Bi, Baolong, et al.
Published: (2025)
StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret Modeling
by: Xu, Zhe
Published: (2026)
RubricBench: Aligning Model-Generated Rubrics with Human Standards
by: Zhang, Qiyuan, et al.
Published: (2026)
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
by: Gunjal, Anisha, et al.
Published: (2025)
FedUMM: A General Framework for Federated Learning with Unified Multimodal Models
by: Su, Zhaolong, et al.
Published: (2026)
Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning
by: Fei, Wu, et al.
Published: (2025)
RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains
by: Jiang, Haoxiang, et al.
Published: (2026)
RubricRL: Simple Generalizable Rewards for Text-to-Image Generation
by: Feng, Xuelu, et al.
Published: (2025)
Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation
by: Zhao, Haozhe, et al.
Published: (2024)
Subsampled One-Step Estimation for Fast Statistical Inference
by: Su, Miaomiao, et al.
Published: (2024)
Subsampled One‐Step Estimation for Fast Statistical Inference
by: Miaomiao Su, et al.
Published: (2025)
AdaRubric: Task-Adaptive Rubrics for Reliable LLM Agent Evaluation and Reward Learning
by: Ding, Liang
Published: (2026)
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
by: Wang, Haozhe, et al.
Published: (2026)
Modeling, Parameters and Synaptic Plasticity Analysis of Lateral‐Ionic‐Gated Graphene Synaptic FETs
by: Xiaoying He, et al.
Published: (2024)