Bibliographic Details
Main Author: Lian, Yongsheng
Format: Preprint
Published: 2025
Subjects: Machine Learning; Computation and Language
Online Access: https://arxiv.org/abs/2503.09512
author Lian, Yongsheng
author_facet Lian, Yongsheng
contents Inspired by the success of DeepSeek R1 in reasoning via reinforcement learning without human feedback, we train a 3B language model using the Countdown Game with pure reinforcement learning. Our model outperforms baselines on four of five benchmarks, demonstrating improved generalization beyond its training data. Notably, response length does not correlate with reasoning quality, and while "aha moments" emerge, they do not always yield correct answers. These findings highlight the potential of RL-only training for reasoning enhancement and suggest future work on refining reward structures to bridge emergent insights with accuracy.
format Preprint
id arxiv_https___arxiv_org_abs_2503_09512
institution arXiv
publishDate 2025
record_format arxiv
title Reinforcement Learning is all You Need
topic Machine Learning
Computation and Language
url https://arxiv.org/abs/2503.09512