| Main Authors: | Wang, Tao, Li, Shuo, Sun, Yan, Ding, Dongsheng, Dobriban, Edgar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.07114 |
Similar Items
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
by: Kim, Soeun, et al.
Published: (2026)
Optimal Decision-Making Based on Prediction Sets
by: Wang, Tao, et al.
Published: (2026)
Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
by: Zhang, Yuheng, et al.
Published: (2025)
Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
by: Wu, Jianghao, et al.
Published: (2026)
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
by: Wang, Xinglin, et al.
Published: (2025)
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)
OptPO: Optimal Rollout Allocation for Test-time Policy Optimization
by: Wang, Youkang, et al.
Published: (2025)
Singleton-Optimized Conformal Prediction
by: Wang, Tao, et al.
Published: (2025)
Bayes-Optimal Classifiers under Group Fairness
by: Zeng, Xianli, et al.
Published: (2022)
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
by: Liu, Wenpu, et al.
Published: (2026)
On Rollouts in Model-Based Reinforcement Learning
by: Frauenknecht, Bernd, et al.
Published: (2025)
A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes
by: Middelhuis, Jeroen, et al.
Published: (2025)
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
by: Zhai, Yuanzhao, et al.
Published: (2024)
DiffusionRollout: Uncertainty-Aware Rollout Planning in Long-Horizon PDE Solving
by: Yoo, Seungwoo, et al.
Published: (2026)
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)
How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
by: Fang, Yangyi, et al.
Published: (2026)
Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
by: Frauenknecht, Bernd, et al.
Published: (2024)
RTMC: Step-Level Credit Assignment via Rollout Trees
by: Wang, Tao, et al.
Published: (2026)
SymmPI: Predictive Inference for Data with Group Symmetries
by: Dobriban, Edgar, et al.
Published: (2023)
Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)
Maximum Entropy Exploration Without the Rollouts
by: Adamczyk, Jacob, et al.
Published: (2026)
WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning
by: Mundada, Gagan, et al.
Published: (2026)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)
Statistical Methods in Generative AI
by: Dobriban, Edgar
Published: (2025)
ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
by: Tian, Jindong, et al.
Published: (2025)
HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
by: Wang, Xinyi, et al.
Published: (2025)
Solving a Research Problem in Mathematical Statistics with AI Assistance
by: Dobriban, Edgar
Published: (2025)
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)
Learning Self-Correction in Vision-Language Models via Rollout Augmentation
by: Ding, Yi, et al.
Published: (2026)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
by: Cohen, Lior, et al.
Published: (2026)
Controlling Transient Amplification Improves Long-horizon Rollouts
by: Pervez, Adeel, et al.
Published: (2026)
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2025)
Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
by: Zhai, Zhiyuan, et al.
Published: (2026)
Minimax Optimal Fair Classification with Bounded Demographic Disparity
by: Zeng, Xianli, et al.
Published: (2024)
MultiRisk: Multiple Risk Control via Iterative Score Thresholding
by: Joshi, Sunay, et al.
Published: (2025)
SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts
by: Li, Jiayi, et al.
Published: (2026)
ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)