Library Catalog

Bibliographic Details
Main Author:	Padarha, Shreyansh
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2507.00054

Similar Items

DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
by: Chen, Jennifer, et al.
Published: (2025)

Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
by: Liang, Zhuowen, et al.
Published: (2026)

Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
by: Tian, Yijun, et al.
Published: (2024)

Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search
by: Cui, Yingqian, et al.
Published: (2025)

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
by: Zheng, Congmin, et al.
Published: (2025)

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
by: Chen, Haolin, et al.
Published: (2024)

CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
by: Mao, Yujun, et al.
Published: (2024)

CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs
by: Kumar, Abhas, et al.
Published: (2024)

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)

ARGS: Alignment as Reward-Guided Search
by: Khanov, Maxim, et al.
Published: (2024)

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
by: Lin, Zicheng, et al.
Published: (2024)

Fine-Tuning Small Language Models (SLMs) for Autonomous Web-based Geographical Information Systems (AWebGIS)
by: Ashani, Mahdi Nazari, et al.
Published: (2025)

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
by: Zhou, Qinhao, et al.
Published: (2024)

RM-R1: Reward Modeling as Reasoning
by: Chen, Xiusi, et al.
Published: (2025)

Evaluating Robustness of Reward Models for Mathematical Reasoning
by: Kim, Sunghwan, et al.
Published: (2024)

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
by: Peng, Miao, et al.
Published: (2025)

Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation
by: Phan, Phuc, et al.
Published: (2024)

FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)

The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
by: Su, Zhenpeng, et al.
Published: (2025)

ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation
by: Oh, Jungwoo, et al.
Published: (2026)

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)

How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
by: Feng, Guhao, et al.
Published: (2024)

Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
by: Liu, Wei, et al.
Published: (2025)

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
by: Damani, Mehul, et al.
Published: (2025)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)

On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)

Scalable LLM Reasoning Acceleration with Low-rank Distillation
by: Dong, Harry, et al.
Published: (2025)

Reasoning Distillation and Structural Alignment for Improved Code Generation
by: Jalilifard, Amir, et al.
Published: (2025)

Agentic-R1: Distilled Dual-Strategy Reasoning
by: Du, Weihua, et al.
Published: (2025)

Structural Rationale Distillation via Reasoning Space Compression
by: Yang, Jialin, et al.
Published: (2026)

A Structure-Agnostic Co-Tuning Framework for LLMs and SLMs in Cloud-Edge Systems
by: Liu, Yuze, et al.
Published: (2025)

Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities
by: Hua, Wenyue, et al.
Published: (2024)

mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
by: Anugraha, David, et al.
Published: (2025)

Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)

AgentRM: Enhancing Agent Generalization with Reward Modeling
by: Xia, Yu, et al.
Published: (2025)

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
by: Liu, Yang, et al.
Published: (2026)

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)

SAGE-32B: Agentic Reasoning via Iterative Distillation
by: Jha, Basab, et al.
Published: (2026)