| Main Author: | Padarha, Shreyansh |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.00054 |
Similar Items
DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
by: Chen, Jennifer, et al.
Published: (2025)
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
by: Liang, Zhuowen, et al.
Published: (2026)
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
by: Tian, Yijun, et al.
Published: (2024)
Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search
by: Cui, Yingqian, et al.
Published: (2025)
CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
by: Zheng, Congmin, et al.
Published: (2025)
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
by: Chen, Haolin, et al.
Published: (2024)
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
by: Mao, Yujun, et al.
Published: (2024)
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs
by: Kumar, Abhas, et al.
Published: (2024)
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)
ARGS: Alignment as Reward-Guided Search
by: Khanov, Maxim, et al.
Published: (2024)
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
by: Lin, Zicheng, et al.
Published: (2024)
Fine-Tuning Small Language Models (SLMs) for Autonomous Web-based Geographical Information Systems (AWebGIS)
by: Ashani, Mahdi Nazari, et al.
Published: (2025)
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning
by: Zhou, Qinhao, et al.
Published: (2024)
RM-R1: Reward Modeling as Reasoning
by: Chen, Xiusi, et al.
Published: (2025)
Evaluating Robustness of Reward Models for Mathematical Reasoning
by: Kim, Sunghwan, et al.
Published: (2024)
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
by: Peng, Miao, et al.
Published: (2025)
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation
by: Phan, Phuc, et al.
Published: (2024)
FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)
The Lessons of Developing Process Reward Models in Mathematical Reasoning
by: Zhang, Zhenru, et al.
Published: (2025)
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
by: Su, Zhenpeng, et al.
Published: (2025)
ECG-Reasoning-Benchmark: A Benchmark for Evaluating Clinical Reasoning Capabilities in ECG Interpretation
by: Oh, Jungwoo, et al.
Published: (2026)
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
by: Feng, Guhao, et al.
Published: (2024)
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
by: Liu, Wei, et al.
Published: (2025)
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
by: Damani, Mehul, et al.
Published: (2025)
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
by: Stojanovski, Zafir, et al.
Published: (2025)
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
Scalable LLM Reasoning Acceleration with Low-rank Distillation
by: Dong, Harry, et al.
Published: (2025)
Reasoning Distillation and Structural Alignment for Improved Code Generation
by: Jalilifard, Amir, et al.
Published: (2025)
Agentic-R1: Distilled Dual-Strategy Reasoning
by: Du, Weihua, et al.
Published: (2025)
Structural Rationale Distillation via Reasoning Space Compression
by: Yang, Jialin, et al.
Published: (2026)
A Structure-Agnostic Co-Tuning Framework for LLMs and SLMs in Cloud-Edge Systems
by: Liu, Yuze, et al.
Published: (2025)
Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities
by: Hua, Wenyue, et al.
Published: (2024)
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
by: Anugraha, David, et al.
Published: (2025)
Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)
AgentRM: Enhancing Agent Generalization with Reward Modeling
by: Xia, Yu, et al.
Published: (2025)
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
by: Liu, Yang, et al.
Published: (2026)
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)
SAGE-32B: Agentic Reasoning via Iterative Distillation
by: Jha, Basab, et al.
Published: (2026)