| Main authors: | Liu, Wenpu, Xu, Yuqi, Xie, Weichu, Zhu, Yongfu, Dong, Shuai, Wang, Ziyue, Shao, Wenqi, Zhang, Xiaoying, Yang, Tong, Duan, Nan, Wang, Jiaqi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.17333 |
Similar items
Step-wise Rubric Rewards for LLM Reasoning
by: Xie, Weichu, et al.
Published: (2026)
Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)
ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning
by: Song, Jingwei, et al.
Published: (2026)
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)
QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Where to Spend Rollouts: Hit-Utility Optimal Rollout Allocation for Group-Based RLVR
by: Wang, Tao, et al.
Published: (2026)
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)
On Rollouts in Model-Based Reinforcement Learning
by: Frauenknecht, Bernd, et al.
Published: (2025)
FedUMM: A General Framework for Federated Learning with Unified Multimodal Models
by: Su, Zhaolong, et al.
Published: (2026)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine
by: Xie, Jiacheng, et al.
Published: (2025)
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)
Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2024)
Group Relative Policy Optimization for Robust Blind Interference Alignment with Fluid Antennas
by: Peng, Jianqiu, et al.
Published: (2026)
Portfolio Reinforcement Learning with Scenario-Context Rollout
by: Bendatu, Vanya Priscillia, et al.
Published: (2026)
MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning
by: Kim, Youngeun
Published: (2026)
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2025)
NeuronMLP: Efficient LLM Inference via Singular Value Decomposition Compression and Tiling on AWS Trainium
by: Song, Dinghong, et al.
Published: (2025)
SLMFix: Leveraging Small Language Models for Error Fixing with Reinforcement Learning
by: Fu, David Jiahao, et al.
Published: (2025)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)
Dual‐Stabilization Strategy for Inhibiting Metal Nanoparticle Sintering Over Ceria Surfaces
by: Wenpu Fan, et al.
Published: (2025)
Learning with Errors over Group Rings Constructed by Semi-direct Product
by: Liu, Jiaqi, et al.
Published: (2023)
SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache
by: Chang, Chi-Chih, et al.
Published: (2026)
Asymmetric dynamics of GABAergic system and paradoxical responses of GABAergic neurons in piriform seizures
by: Yan Tao, et al.
Published: (2024)
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
by: Liu, Xiangyan, et al.
Published: (2025)
Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning
by: Cong, Peizhuang, et al.
Published: (2025)
Reinforcement Learning Enabled Nanophotonic Devices
by: Zi Wang, et al.
Published: (2025)
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)
IA2: Leveraging Instance-Aware Index Advisor with Reinforcement Learning for Diverse Workloads
by: Wang, Taiyi, et al.
Published: (2024)
The Structural Influence of Low-Credibility Narratives During the COVID-19 Vaccine Rollout
by: Ng, Lynnette Hui Xian, et al.
Published: (2026)
DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
by: Wang, Yujie, et al.
Published: (2026)
DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning
by: Zhao, Hanye, et al.
Published: (2024)
AdaCodec: A Predictive Visual Code for Video MLLMs
by: Hou, Haowen, et al.
Published: (2026)
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
by: Surana, Rohan, et al.
Published: (2026)
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
by: Xing, Shangyu, et al.
Published: (2025)
A Note on Loss Functions and Error Compounding in Model-based Reinforcement Learning
by: Jiang, Nan
Published: (2024)
Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
by: Zhu, Tianshu, et al.
Published: (2026)
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
by: Zhang, Xiaoying, et al.
Published: (2026)
RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning
by: Li, Gang, et al.
Published: (2025)