:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Tao; Li, Shuo; Sun, Yan; Ding, Dongsheng; Dobriban, Edgar
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.07114

Similar Items

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
by: Kim, Soeun, et al.
Published: (2026)

Optimal Decision-Making Based on Prediction Sets
by: Wang, Tao, et al.
Published: (2026)

Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
by: Zhang, Yuheng, et al.
Published: (2025)

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
by: Wu, Jianghao, et al.
Published: (2026)

Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
by: Wang, Xinglin, et al.
Published: (2025)

One-Shot Safety Alignment for Large Language Models via Optimal Dualization
by: Huang, Xinmeng, et al.
Published: (2024)

OptPO: Optimal Rollout Allocation for Test-time Policy Optimization
by: Wang, Youkang, et al.
Published: (2025)

Singleton-Optimized Conformal Prediction
by: Wang, Tao, et al.
Published: (2025)

Bayes-Optimal Classifiers under Group Fairness
by: Zeng, Xianli, et al.
Published: (2022)

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
by: Liu, Wenpu, et al.
Published: (2026)

On Rollouts in Model-Based Reinforcement Learning
by: Frauenknecht, Bernd, et al.
Published: (2025)

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes
by: Middelhuis, Jeroen, et al.
Published: (2025)

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
by: Zhai, Yuanzhao, et al.
Published: (2024)

DiffusionRollout: Uncertainty-Aware Rollout Planning in Long-Horizon PDE Solving
by: Yoo, Seungwoo, et al.
Published: (2026)

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)

Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
by: Nguyen, Hieu Trung, et al.
Published: (2026)

How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
by: Fang, Yangyi, et al.
Published: (2026)

Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
by: Frauenknecht, Bernd, et al.
Published: (2024)

RTMC: Step-Level Credit Assignment via Rollout Trees
by: Wang, Tao, et al.
Published: (2026)

SymmPI: Predictive Inference for Data with Group Symmetries
by: Dobriban, Edgar, et al.
Published: (2023)

Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)

Maximum Entropy Exploration Without the Rollouts
by: Adamczyk, Jacob, et al.
Published: (2026)

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning
by: Mundada, Gagan, et al.
Published: (2026)

EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)

QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)

Statistical Methods in Generative AI
by: Dobriban, Edgar
Published: (2025)

ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
by: Tian, Jindong, et al.
Published: (2025)

HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness
by: Wang, Xinyi, et al.
Published: (2025)

Solving a Research Problem in Mathematical Statistics with AI Assistance
by: Dobriban, Edgar
Published: (2025)

Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)

Learning Self-Correction in Vision-Language Models via Rollout Augmentation
by: Ding, Yi, et al.
Published: (2026)

Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)

Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
by: Cohen, Lior, et al.
Published: (2026)

Controlling Transient Amplification Improves Long-horizon Rollouts
by: Pervez, Adeel, et al.
Published: (2026)

ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts
by: Pang, Jing-Cheng, et al.
Published: (2025)

Selective Rollout: Mid-Trajectory Termination for Multi-Sample Agent RL
by: Zhai, Zhiyuan, et al.
Published: (2026)

Minimax Optimal Fair Classification with Bounded Demographic Disparity
by: Zeng, Xianli, et al.
Published: (2024)

MultiRisk: Multiple Risk Control via Iterative Score Thresholding
by: Joshi, Sunay, et al.
Published: (2025)

SGNO: Spectral Generator Neural Operators for Stable Long Horizon PDE Rollouts
by: Li, Jiayi, et al.
Published: (2026)

ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)