Bibliographic Details
Main Authors:	Gu, Zhengyao, Light, Jonathan, Astudillo, Raul, Ye, Ziyu, He, Langzhou, Zou, Henry Peng, Cheng, Wei, Paternain, Santiago, Yu, Philip S., Yue, Yisong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language; MSC 68T05, 68T20, 90C40, 62L05; ACM I.2.6, I.2.8, I.2.1, F.1.1
Online Access:	https://arxiv.org/abs/2602.20532
Table of Contents:

Post-training large foundation models with reinforcement learning typically relies on massive, heterogeneous datasets, which makes effective curriculum learning both critical and challenging. In this work, we propose ACTOR-CURATOR, a scalable and fully automated curriculum learning framework for reinforcement learning post-training of large language models (LLMs). ACTOR-CURATOR learns a neural curator that dynamically selects training problems from large problem banks by directly optimizing for expected policy performance improvement. We formulate problem selection as a non-stationary stochastic bandit problem, derive a principled loss function based on online stochastic mirror descent, and establish regret guarantees under partial feedback. Empirically, ACTOR-CURATOR consistently outperforms uniform sampling and strong curriculum baselines across a wide range of challenging reasoning benchmarks, with improved training stability and efficiency. Notably, it achieves relative gains of 28.6% on AIME2024 and 30.5% on ARC-1D over the strongest baseline, along with up to an 80% speedup. These results suggest that ACTOR-CURATOR is a powerful and practical approach for scalable LLM post-training.
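
For readers unfamiliar with the bandit framing mentioned in the abstract, the following is a minimal sketch of a textbook entropic mirror-descent (EXP3-style) update over a small, fixed set of problems under partial feedback. It is not the paper's loss function or its neural curator: the function names, the reward signal in [0, 1], the step size, the uniform-mixing floor, and the toy reward means are all assumptions made for illustration.

```python
import math
import random


def sample_index(probs, rng):
    """Draw an index from a categorical distribution."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def osmd_step(probs, chosen, reward, lr, explore):
    """One entropic mirror-descent (EXP3-style) update on the simplex.

    probs   : current selection distribution over problems
    chosen  : index of the problem that was sampled
    reward  : observed improvement signal in [0, 1] (hypothetical)
    lr      : step size (assumed value, not from the paper)
    explore : uniform-mixing floor that keeps every probability positive
    """
    n = len(probs)
    # Partial feedback: only the chosen problem yields a reward, so the
    # estimate is importance-weighted by its selection probability.
    estimate = [0.0] * n
    estimate[chosen] = reward / probs[chosen]
    # A multiplicative-weights step is mirror descent with a
    # negative-entropy regularizer on the probability simplex.
    weights = [p * math.exp(lr * g) for p, g in zip(probs, estimate)]
    total = sum(weights)
    updated = [w / total for w in weights]
    # Mixing with uniform lets neglected problems be revisited if their
    # usefulness drifts over time (the non-stationary setting).
    return [(1.0 - explore) * p + explore / n for p in updated]


def run_demo(steps=2000, seed=0):
    """Toy loop with three problems and made-up Bernoulli reward means."""
    rng = random.Random(seed)
    mean_reward = [0.1, 0.2, 0.8]  # illustrative values only
    probs = [1.0 / 3] * 3
    for _ in range(steps):
        i = sample_index(probs, rng)
        r = 1.0 if rng.random() < mean_reward[i] else 0.0
        probs = osmd_step(probs, i, r, lr=0.05, explore=0.05)
    return probs


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the distribution is a plain list over three problems; the abstract instead describes a learned neural curator selecting from large problem banks, which this tabular update does not attempt to reproduce.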

Similar Items