Table des matières: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Yu, Zhiyin, Mou, Yuchen, Yan, Juncheng, Luo, Junyu, Chen, Chunchun, Wei, Xing, Liu, Yunhui, Sun, Hongru, Zhang, Yuxing, Xu, Jun, Bian, Yatao, Zhang, Ming, Ye, Wei, He, Tieke, Yang, Jie, Zheng, Guanjie, Wu, Zhonghai, Zhang, Bo, Bai, Lei, Luo, Xiao
Format:	Preprint
Publié:	2026
Sujets:	Machine Learning Artificial Intelligence
Accès en ligne:	https://arxiv.org/abs/2604.17312
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

Table des matières:

Reinforcement learning (RL) has emerged as a powerful post-training paradigm for enhancing the reasoning capabilities of large language models (LLMs). However, reinforcement learning for LLMs faces substantial data scarcity challenges, including the limited availability of high-quality external supervision and the constrained volume of model-generated experience. These limitations make data-efficient reinforcement learning a critical research direction. In this survey, we present the first systematic review of reinforcement learning for LLMs under data scarcity. We propose a bottom-up hierarchical framework built around three complementary perspectives: the data-centric perspective, the training-centric perspective, and the framework-centric perspective. We develop a taxonomy of existing methods, summarize representative approaches in each category, and analyze their strengths and limitations. Our taxonomy aims to provide a clear conceptual foundation for understanding the design space of data-efficient RL for LLMs and to guide researchers working in this emerging area. We hope this survey offers a comprehensive roadmap for future research and inspires new directions toward more efficient and scalable reinforcement learning post-training for LLMs.

Documents similaires