Library Catalog

Bibliographic Details
Main Authors:	Wang, Ante; Song, Linfeng; Tian, Ye; Yu, Dian; Mi, Haitao; Duan, Xiangyu; Tu, Zhaopeng; Su, Jinsong; Yu, Dong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.11183

Similar Items

LiteSearch: Efficacious Tree Search for LLM
by: Wang, Ante, et al.
Published: (2024)

Self-Consistency Boosts Calibration for Math Reasoning
by: Wang, Ante, et al.
Published: (2024)

Fine-Grained Self-Endorsement Improves Factuality and Reasoning
by: Wang, Ante, et al.
Published: (2024)

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)

Teaching LLMs to Refine with Tools
by: Yu, Dian, et al.
Published: (2024)

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)

Response Enhanced Semi-supervised Dialogue Query Generation
by: Huang, Jianheng, et al.
Published: (2023)

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)

Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
by: Li, Yansi, et al.
Published: (2025)

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
by: Zhang, Yuheng, et al.
Published: (2025)

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
by: Chen, Xingyu, et al.
Published: (2024)

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
by: Wang, Yue, et al.
Published: (2025)

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)

Don't Get Stuck: A Deadlock Recovery Approach
by: Baldini, Francesca, et al.
Published: (2024)

Don't Get Too Excited -- Eliciting Emotions in LLMs
by: Fazzi, Gino Franco, et al.
Published: (2025)

When Sourcing Donors, Don't Get in a Rut
Published: (2024)

Don't despair, not today
by: Zhaohui Su
Published: (2025)

Rectify, Don't Regret: Avoiding Pitfalls of Differentiable Simulation in Trajectory Prediction
by: Yadav, Harsh, et al.
Published: (2026)

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)

Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty
by: Ren, Jingyi, et al.
Published: (2026)

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)

On the Information Redundancy in Non-Autoregressive Translation
by: Wang, Zhihao, et al.
Published: (2024)

Don't Get Left Behind: Moving Library Instruction Online
by: Hillman, Christina R., et al.
Published: (2016)

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)

One Token to Fool LLM-as-a-Judge
by: Zhao, Yulai, et al.
Published: (2025)

Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
by: Zhou, Jin Peng, et al.
Published: (2024)

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)

Don't Miss the Forest for the Trees: How Abstracting Nature Can Get Us Closer to Our Goals
by: Jake Lawlor
Published: (2025)

Mitigating the Negative Impact of Over-association for Conversational Query Production
by: Wang, Ante, et al.
Published: (2024)

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)

Reasoning Models Reason Well, Until They Don't
by: Rameshkumar, Revanth, et al.
Published: (2025)

Love Don't Need a Reason
by: Jones, Matthew J.
Published: (2020)