Staff View: Reinforcement Learning is all You Need :: Library Catalog

Bibliographic Details
Main Author:	Lian, Yongsheng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2503.09512

_version_	1866929756242444288
author	Lian, Yongsheng
author_facet	Lian, Yongsheng
contents	Inspired by the success of DeepSeek R1 in reasoning via reinforcement learning without human feedback, we train a 3B language model using the Countdown Game with pure reinforcement learning. Our model outperforms baselines on four of five benchmarks, demonstrating improved generalization beyond its training data. Notably, response length does not correlate with reasoning quality, and while "aha moments" emerge, they do not always yield correct answers. These findings highlight the potential of RL-only training for reasoning enhancement and suggest future work on refining reward structures to bridge emergent insights with accuracy.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_09512
institution	arXiv
publishDate	2025
record_format	arxiv
title	Reinforcement Learning is all You Need
topic	Machine Learning; Computation and Language
url	https://arxiv.org/abs/2503.09512
