| Main Authors: | Shu, Dong, Zhang, Denghui, Hullman, Jessica |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.01597 |
Similar Items
Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning
by: Jiang, Zhida, et al.
Published: (2026)
Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
by: Liu, Ziwen, et al.
Published: (2026)
Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
by: Guo, Ziyang, et al.
Published: (2025)
A Conceptual Framework for Ethical Evaluation of Machine Learning Systems
by: Gupta, Neha R., et al.
Published: (2024)
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)
Probe-Based Data Attribution: Discovering and Mitigating Undesirable Behaviors in LLM Post-Training
by: Xiao, Frank, et al.
Published: (2026)
Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
by: Ran, Yide, et al.
Published: (2026)
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
by: Iso, Hayate, et al.
Published: (2026)
Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training
by: Gong, Xue, et al.
Published: (2026)
Explaining and Improving Information Complementarities in Multi-Agent Decision-making
by: Guo, Ziyang, et al.
Published: (2025)
Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling
by: Zhang, Dongping, et al.
Published: (2024)
Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
by: Wu, Jianghao, et al.
Published: (2026)
QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
by: Gao, Wei, et al.
Published: (2025)
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
by: Liu, Wenpu, et al.
Published: (2026)
Mode-Dependent Rectification for Stable PPO Training
by: Mohamad, Mohamad, et al.
Published: (2026)
Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
by: Hu, Pingbang, et al.
Published: (2026)
Conformal Prediction and Human Decision Making
by: Hullman, Jessica, et al.
Published: (2025)
Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)
ISACL: Internal State Analyzer for Copyrighted Training Data Leakage
by: Zhang, Guangwei, et al.
Published: (2025)
On Rollouts in Model-Based Reinforcement Learning
by: Frauenknecht, Bernd, et al.
Published: (2025)
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)
ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
by: Pan, Yanzhou, et al.
Published: (2025)
The Limits of Preference Data for Post-Training
by: Zhao, Eric, et al.
Published: (2025)
Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training
by: Wang, Wenshuo, et al.
Published: (2026)
Directional-Clamp PPO
by: Karpel, Gilad, et al.
Published: (2025)
VinePPO: Refining Credit Assignment in RL Training of LLMs
by: Kazemnejad, Amirhossein, et al.
Published: (2024)
Learning to Weight Parameters for Training Data Attribution
by: Li, Shuangqi, et al.
Published: (2025)
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
by: Surana, Rohan, et al.
Published: (2026)
Sampling Complexity of TD and PPO in RKHS
by: Zou, Lu, et al.
Published: (2025)
Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts
by: Luo, Sijia, et al.
Published: (2026)
Efficient Ensembles Improve Training Data Attribution
by: Deng, Junwei, et al.
Published: (2024)
Prompt Curriculum Learning for Efficient LLM Post-Training
by: Gao, Zhaolin, et al.
Published: (2025)
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
by: Lin, Jinxu, et al.
Published: (2024)
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)
DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
by: Wang, Yujie, et al.
Published: (2026)
Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units
by: Chen, Jianhui, et al.
Published: (2026)
Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)
Exploring Training Data Attribution under Limited Access Constraints
by: Zhang, Shiyuan, et al.
Published: (2025)