| Main Authors: | Liu, Runze, Wang, Jiakang, Shi, Yuling, Xie, Zhihui, An, Chenxin, Zhang, Kaiyan, Zhao, Jian, Gu, Xiaodong, Lin, Lei, Hu, Wenping, Li, Xiu, Zhang, Fuzheng, Zhou, Guorui, Gai, Kun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.26628 |
Similar Items
When Importance Sampling Misallocates Credit: Asymmetric Ratios for Outcome-Supervised RL
by: Wang, Jiakang, et al.
Published: (2025)
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
by: Wang, Jiakang, et al.
Published: (2025)
CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
by: Su, Zhenpeng, et al.
Published: (2025)
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
by: Su, Zhenpeng, et al.
Published: (2025)
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
by: Zhang, Jingyuan, et al.
Published: (2025)
Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning
by: Ji, Xingguang, et al.
Published: (2025)
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
by: Zhao, Jian, et al.
Published: (2025)
RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
by: Zhang, Hongzhi, et al.
Published: (2025)
AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation
by: Fang, Yixiong, et al.
Published: (2025)
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
by: Ou, Jiao, et al.
Published: (2024)
ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning
by: Tang, Yihong, et al.
Published: (2024)
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
by: Tang, Yihong, et al.
Published: (2024)
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
by: Lin, Lei, et al.
Published: (2023)
MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning
by: Chen, Zhihui, et al.
Published: (2026)
Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective
by: Li, Huan, et al.
Published: (2025)
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
by: Liu, Runze, et al.
Published: (2025)
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
by: Ou, Jiao, et al.
Published: (2023)
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
by: Xie, Can, et al.
Published: (2025)
ReviewRL: Towards Automated Scientific Review with RL
by: Zeng, Sihang, et al.
Published: (2025)
Rethinking Code Complexity Through the Lens of Large Language Models
by: Xie, Chen, et al.
Published: (2026)
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
by: Guo, Zichun, et al.
Published: (2026)
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers
by: Shi, Yuling, et al.
Published: (2024)
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
by: Wang, Xu, et al.
Published: (2024)
Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
by: Sun, Yuchong, et al.
Published: (2023)
ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning
by: Nie, Shuaiyi, et al.
Published: (2026)
DARL: Encouraging Diverse Answers for General Reasoning without Verifiers
by: Huang, Chongxuan, et al.
Published: (2026)
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector
by: Cheng, Xiaoxue, et al.
Published: (2024)
Pruning the Unsurprising: Efficient LLM Reasoning via First-Token Surprisal
by: Zeng, Wenhao, et al.
Published: (2025)
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
by: Su, Zhenpeng, et al.
Published: (2025)
Klear-AgentForge: Forging Agentic Intelligence through Posttraining Scaling
by: Wang, Qi, et al.
Published: (2025)
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
by: Guo, Chengcheng, et al.
Published: (2026)
DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts
by: Gai, Jiading, et al.
Published: (2026)
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning
by: Hu, Xiao, et al.
Published: (2025)
Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
by: Fu, Jia, et al.
Published: (2025)
SuperRL: Reinforcement Learning with Supervision to Boost Language Model Reasoning
by: Liu, Yihao, et al.
Published: (2025)
FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)
LongCodeZip: Compress Long Context for Code Language Models
by: Shi, Yuling, et al.
Published: (2025)
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
by: Ji, Xingguang, et al.
Published: (2025)
AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning
by: Yuan, Shihao, et al.
Published: (2025)
ChorusCVR: Chorus Supervision for Entire Space Post-Click Conversion Rate Modeling
by: Cheng, Wei, et al.
Published: (2025)