:: Library Catalog

Bibliographic Details
Main Authors:	Shu, Dong; Zhang, Denghui; Hullman, Jessica
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.01597

Similar Items

Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning
by: Jiang, Zhida, et al.
Published: (2026)

Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
by: Liu, Ziwen, et al.
Published: (2026)

Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
by: Guo, Ziyang, et al.
Published: (2025)

A Conceptual Framework for Ethical Evaluation of Machine Learning Systems
by: Gupta, Neha R., et al.
Published: (2024)

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
by: Xu, Yixuan Even, et al.
Published: (2025)

Probe-Based Data Attribution: Discovering and Mitigating Undesirable Behaviors in LLM Post-Training
by: Xiao, Frank, et al.
Published: (2026)

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
by: Ran, Yide, et al.
Published: (2026)

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
by: Iso, Hayate, et al.
Published: (2026)

Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training
by: Gong, Xue, et al.
Published: (2026)

Explaining and Improving Information Complementarities in Multi-Agent Decision-making
by: Guo, Ziyang, et al.
Published: (2025)

Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling
by: Zhang, Dongping, et al.
Published: (2024)

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
by: Wu, Jianghao, et al.
Published: (2026)

QuRL: Efficient Reinforcement Learning with Quantized Rollout
by: Li, Yuhang, et al.
Published: (2026)

RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
by: Gao, Wei, et al.
Published: (2025)

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
by: Liu, Wenpu, et al.
Published: (2026)

Mode-Dependent Rectification for Stable PPO Training
by: Mohamad, Mohamad, et al.
Published: (2026)

Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
by: Hu, Pingbang, et al.
Published: (2026)

Conformal Prediction and Human Decision Making
by: Hullman, Jessica, et al.
Published: (2025)

Train Less, Learn More: Adaptive Efficient Rollout Optimization for Group-Based Reinforcement Learning
by: Zhang, Zhi, et al.
Published: (2026)

ISACL: Internal State Analyzer for Copyrighted Training Data Leakage
by: Zhang, Guangwei, et al.
Published: (2025)

On Rollouts in Model-Based Reinforcement Learning
by: Frauenknecht, Bernd, et al.
Published: (2025)

Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout
by: Wang, Haoran, et al.
Published: (2023)

ROAST: Rollout-based On-distribution Activation Steering Technique
by: Su, Xuanbo, et al.
Published: (2026)

ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
by: Pan, Yanzhou, et al.
Published: (2025)

The Limits of Preference Data for Post-Training
by: Zhao, Eric, et al.
Published: (2025)

Gradient-Informed Temporal Sampling Improves Rollout Accuracy in PDE Surrogate Training
by: Wang, Wenshuo, et al.
Published: (2026)

Directional-Clamp PPO
by: Karpel, Gilad, et al.
Published: (2025)

VinePPO: Refining Credit Assignment in RL Training of LLMs
by: Kazemnejad, Amirhossein, et al.
Published: (2024)

Learning to Weight Parameters for Training Data Attribution
by: Li, Shuangqi, et al.
Published: (2025)

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
by: Surana, Rohan, et al.
Published: (2026)

Sampling Complexity of TD and PPO in RKHS
by: Zou, Lu, et al.
Published: (2025)

Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts
by: Luo, Sijia, et al.
Published: (2026)

Efficient Ensembles Improve Training Data Attribution
by: Deng, Junwei, et al.
Published: (2024)

Prompt Curriculum Learning for Efficient LLM Post-Training
by: Gao, Zhaolin, et al.
Published: (2025)

Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
by: Lin, Jinxu, et al.
Published: (2024)

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)

DARTS: Distribution-Aware Active Rollout Trajectory Shaping for Accelerating LLM Reinforcement Learning
by: Wang, Yujie, et al.
Published: (2026)

Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units
by: Chen, Jianhui, et al.
Published: (2026)

Consolidating Rewarded Perturbations for LLM Post-Training
by: Zhang, Zheyu, et al.
Published: (2026)

Exploring Training Data Attribution under Limited Access Constraints
by: Zhang, Shiyuan, et al.
Published: (2025)