| Main Author: | Li, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.06813 |
Similar Items
OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning
by: Yang, Yuxiao, et al.
Published: (2026)
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
by: Wang, Junzhe, et al.
Published: (2026)
Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs
by: Cinquin, Tristan, et al.
Published: (2025)
Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents
by: Li, Yushu, et al.
Published: (2026)
Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
by: Zhang, Yifei, et al.
Published: (2026)
LLM-Guided Reinforcement Learning: Addressing Training Bottlenecks through Policy Modulation
by: Tan, Heng, et al.
Published: (2025)
Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
by: Li, Xuan, et al.
Published: (2026)
Policy-Guided Search on Tree-of-Thoughts for Efficient Problem Solving with Bounded Language Model Queries
by: Pendurkar, Sumedh, et al.
Published: (2026)
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
by: Yang, Zhaohui, et al.
Published: (2025)
Offline Model-Based Optimization via Policy-Guided Gradient Search
by: Chemingui, Yassine, et al.
Published: (2024)
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
by: Li, Shuangtao, et al.
Published: (2025)
From Atoms to Chains: Divergence-Guided Reasoning Curriculum for Unlabeled LLM Domain Adaptation
by: Wang, Yongqi, et al.
Published: (2026)
LLM Reasoning with Process Rewards for Outcome-Guided Steps
by: Rezaei, Mohammad, et al.
Published: (2026)
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
by: Zhang, Xichen, et al.
Published: (2025)
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
by: Mou, Zhiyu, et al.
Published: (2025)
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
by: Liu, Rui, et al.
Published: (2025)
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)
Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
by: Zhou, Yifan, et al.
Published: (2025)
In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search
by: Demirović, Emir, et al.
Published: (2024)
Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
by: Chen, Peter, et al.
Published: (2025)
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)
Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection
by: Zhao, Zihui, et al.
Published: (2025)
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
by: Zhang, Yifan, et al.
Published: (2025)
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
by: Melo, Luckeciano C., et al.
Published: (2025)
Value-Guided Search for Efficient Chain-of-Thought Reasoning
by: Wang, Kaiwen, et al.
Published: (2025)
V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization
by: Jiang, Yubo, et al.
Published: (2026)
Beyond Alignment: Expanding Reasoning Capacity via Manifold-Reshaping Policy Optimization
by: Wang, Dayu, et al.
Published: (2026)
PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
by: Li, Jiawei, et al.
Published: (2024)
Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient
by: Li, Changming, et al.
Published: (2026)
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
by: Li, Gang, et al.
Published: (2025)
Tree Search for LLM Agent Reinforcement Learning
by: Ji, Yuxiang, et al.
Published: (2025)
Beyond KL Divergence: Policy Optimization with Flexible Bregman Divergences for LLM Reasoning
by: Yuan, Rui, et al.
Published: (2026)
Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization
by: Liu, Shengchao, et al.
Published: (2025)
ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models
by: Yu, Song, et al.
Published: (2026)
The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
by: Yin, Chenlong, et al.
Published: (2025)
LiteSearch: Efficacious Tree Search for LLM
by: Wang, Ante, et al.
Published: (2024)
Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search
by: Weichart, Maximilian
Published: (2025)
RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning
by: Mao, Yixiu, et al.
Published: (2026)
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
by: Han, Dongge, et al.
Published: (2025)