| Main Authors: | Wang, Ante, Song, Linfeng, Tian, Ye, Yu, Dian, Mi, Haitao, Duan, Xiangyu, Tu, Zhaopeng, Su, Jinsong, Yu, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.11183 |
Similar Items
LiteSearch: Efficacious Tree Search for LLM
by: Wang, Ante, et al.
Published: (2024)
Self-Consistency Boosts Calibration for Math Reasoning
by: Wang, Ante, et al.
Published: (2024)
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
by: Wang, Ante, et al.
Published: (2024)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)
Teaching LLMs to Refine with Tools
by: Yu, Dian, et al.
Published: (2024)
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)
Response Enhanced Semi-supervised Dialogue Query Generation
by: Huang, Jianheng, et al.
Published: (2023)
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
by: Li, Yansi, et al.
Published: (2025)
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
by: Zhang, Yuheng, et al.
Published: (2025)
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
by: Chen, Xingyu, et al.
Published: (2024)
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
by: Wang, Yue, et al.
Published: (2025)
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)
Don't Get Stuck: A Deadlock Recovery Approach
by: Baldini, Francesca, et al.
Published: (2024)
Don't Get Too Excited -- Eliciting Emotions in LLMs
by: Fazzi, Gino Franco, et al.
Published: (2025)
When Sourcing Donors, Don't Get in a Rut
Published: (2024)
Don't despair, not today
by: Zhaohui Su
Published: (2025)
Rectify, Don't Regret: Avoiding Pitfalls of Differentiable Simulation in Trajectory Prediction
by: Yadav, Harsh, et al.
Published: (2026)
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty
by: Ren, Jingyi, et al.
Published: (2026)
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)
On the Information Redundancy in Non-Autoregressive Translation
by: Wang, Zhihao, et al.
Published: (2024)
Don't Get Left Behind: Moving Library Instruction Online
by: Hillman, Christina R., et al.
Published: (2016)
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)
Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)
One Token to Fool LLM-as-a-Judge
by: Zhao, Yulai, et al.
Published: (2025)
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
by: Zhou, Jin Peng, et al.
Published: (2024)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
Don't Miss the Forest for the Trees: How Abstracting Nature Can Get Us Closer to Our Goals
by: Jake Lawlor
Published: (2025)
Mitigating the Negative Impact of Over-association for Conversational Query Production
by: Wang, Ante, et al.
Published: (2024)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Reasoning Models Reason Well, Until They Don't
by: Rameshkumar, Revanth, et al.
Published: (2025)
Love Don't Need a Reason
by: Jones, Matthew J.
Published: (2020)