| Main Author: | Lian, Yongsheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Computation and Language |
| Online Access: | https://arxiv.org/abs/2503.09512 |
| _version_ | 1866929756242444288 |
|---|---|
| author | Lian, Yongsheng |
| author_facet | Lian, Yongsheng |
| contents | Inspired by the success of DeepSeek R1 in reasoning via reinforcement learning without human feedback, we train a 3B language model using the Countdown Game with pure reinforcement learning. Our model outperforms baselines on four of five benchmarks, demonstrating improved generalization beyond its training data. Notably, response length does not correlate with reasoning quality, and while "aha moments" emerge, they do not always yield correct answers. These findings highlight the potential of RL-only training for reasoning enhancement and suggest future work on refining reward structures to bridge emergent insights with accuracy. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_09512 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Reinforcement Learning is all You Need |
| topic | Machine Learning; Computation and Language |
| url | https://arxiv.org/abs/2503.09512 |
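
The abstract refers to training on the Countdown Game with a reward signal but this record does not describe the reward itself. As a rough illustration only, the sketch below shows one way a rule-based correctness check for a Countdown answer could look. Everything in it is an assumption rather than the paper's method: the function name `countdown_reward`, the binary 0/1 reward, and the rule that each given number must be used exactly once are all hypothetical choices.

```python
import ast
import operator

# Binary operators permitted in a Countdown expression (assumed rule set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed arithmetic expression, recording each integer it uses."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary minus, ...) is rejected.
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical reward: 1.0 if the expression uses every given number
    exactly once and evaluates to the target, otherwise 0.0."""
    used = []
    try:
        value = _evaluate(ast.parse(expression, mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if sorted(used) != sorted(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0
```

For example, `countdown_reward("(25 - 3) * 4", [3, 4, 25], 88)` scores the answer as correct, while `countdown_reward("25 * 4", [3, 4, 25], 100)` does not, because the number 3 is left unused. Parsing with `ast` instead of calling `eval` keeps arbitrary model output from being executed.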