Staff View: BARD: budget-aware reasoning distillation :: Library Catalog

Bibliographic Details
Main Authors:	Niu, Lujie; Shen, Lei; Jiang, Yi; Yuan, Caixia; Wang, Xiaojie; Su, Wenbo; Zheng, Bo
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2511.01470

_version_	1866918263712120832
author	Niu, Lujie; Shen, Lei; Jiang, Yi; Yuan, Caixia; Wang, Xiaojie; Su, Wenbo; Zheng, Bo
author_facet	Niu, Lujie; Shen, Lei; Jiang, Yi; Yuan, Caixia; Wang, Xiaojie; Su, Wenbo; Zheng, Bo
contents	While long Chain-of-Thought (CoT) distillation effectively transfers reasoning capability to smaller language models, the reasoning process often remains redundant and the computational budget uncontrollable, leading to inefficient resource usage. To address this limitation, we propose Budget-Aware Reasoning Distillation (BARD), a novel framework that simultaneously distills reasoning capability and enables fine-grained control over the reasoning length. BARD uses the thinking budget as a user-specified control signal, allowing the model to dynamically balance reasoning performance and computational efficiency. To achieve this, BARD introduces a two-phase training regimen. The first phase is Supervised Fine-Tuning (SFT) on teacher-generated long CoT data compressed to various budget levels, which bootstraps the model's understanding of budget constraints. The second phase applies Reinforcement Learning (RL) with a reward signal that accounts for reasoning performance and budget fidelity simultaneously. The two-phase regimen is crucial to avoiding policy degradation and ensuring that both objectives are optimized jointly. Extensive experiments demonstrate that our method enables an 8B student model to achieve strong performance on challenging reasoning benchmarks (AIME24, AIME25, GPQA) while providing precise and adaptive control over its reasoning length across a wide range of budgets.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_01470
institution	arXiv
publishDate	2025
record_format	arxiv
title	BARD: budget-aware reasoning distillation
topic	Computation and Language
url	https://arxiv.org/abs/2511.01470

Similar Items