Staff View: :: Library Catalog

$Cover Image$

Saved in:

Bibliographic Details
Main Authors:	Di, Xinhan, JoyJiaoW
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2508.01604
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866909719189258240
author	Di, Xinhan JoyJiaoW
author_facet	Di, Xinhan JoyJiaoW
contents	Reinforcement learning scaling enhances the reasoning capabilities of large language models, with reinforcement learning serving as the key technique to draw out complex reasoning. However, key technical details of state-of-the-art reasoning LLMs, such as those in the OpenAI O series, Claude 3 series, DeepMind's Gemini 2.5 series, and Grok 3 series, remain undisclosed, making it difficult for the research community to replicate their reinforcement learning training results. Therefore, we start our study from an Early Preview Reinforcement Learning (EPRLI) algorithm built on the open-source GRPO framework, incorporating difficulty-aware intervention for math problems. Applied to a 1.5B-parameter LLM, our method achieves 50.0% on AIME24, 89.2% on Math500, 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench, superpass O1-Preview and is comparable to O1-mini within standard school-lab settings.
format	Preprint
id	arxiv_https___arxiv_org_abs_2508_01604
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention Di, Xinhan JoyJiaoW Machine Learning Reinforcement learning scaling enhances the reasoning capabilities of large language models, with reinforcement learning serving as the key technique to draw out complex reasoning. However, key technical details of state-of-the-art reasoning LLMs, such as those in the OpenAI O series, Claude 3 series, DeepMind's Gemini 2.5 series, and Grok 3 series, remain undisclosed, making it difficult for the research community to replicate their reinforcement learning training results. Therefore, we start our study from an Early Preview Reinforcement Learning (EPRLI) algorithm built on the open-source GRPO framework, incorporating difficulty-aware intervention for math problems. Applied to a 1.5B-parameter LLM, our method achieves 50.0% on AIME24, 89.2% on Math500, 77.1% on AMC, 35.3% on Minerva, and 51.9% on OBench, superpass O1-Preview and is comparable to O1-mini within standard school-lab settings.
title	Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention
topic	Machine Learning
url	https://arxiv.org/abs/2508.01604