| Main Authors: | Li, Mukai, Zeng, Qingcheng, Fang, Tianqing, Liang, Zhenwen, Song, Linfeng, Liu, Qi, Mi, Haitao, Yu, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.03412 |
Similar Items
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
by: Zhou, Yujun, et al.
Published: (2025)
WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
by: Fang, Tianqing, et al.
Published: (2025)
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)
Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
by: Lu, Sidi, et al.
Published: (2026)
WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
by: Zhang, Zhisong, et al.
Published: (2025)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
by: Ma, Junyu, et al.
Published: (2025)
Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)
Inconsistent dialogue responses and how to recover from them
by: Zhang, Mian, et al.
Published: (2024)
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
by: Zhang, Yuheng, et al.
Published: (2025)
A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation
by: Li, Xiangci, et al.
Published: (2024)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)
Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models
by: Das, Souvik, et al.
Published: (2024)
Collaborative decoding of critical tokens for boosting factuality of large language models
by: Jin, Lifeng, et al.
Published: (2024)
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)
LiteSearch: Efficacious Tree Search for LLM
by: Wang, Ante, et al.
Published: (2024)
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
by: Fang, Tianqing, et al.
Published: (2025)
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
by: Liang, Zhenwen, et al.
Published: (2025)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
by: Liu, Xiaoyuan, et al.
Published: (2025)
Teaching LLMs to Refine with Tools
by: Yu, Dian, et al.
Published: (2024)
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls
by: Wang, Ante, et al.
Published: (2025)
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback
by: Hu, Minda, et al.
Published: (2025)
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
by: Yao, Wenlin, et al.
Published: (2024)
Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
by: Deng, Chenlong, et al.
Published: (2025)
InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing
by: Li, Shuaiyi, et al.
Published: (2025)
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification
by: Liu, Rui, et al.
Published: (2026)
HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving
by: Li, Yang, et al.
Published: (2024)
Using LLM to select the right SQL Query from candidates
by: Li, Zhenwen, et al.
Published: (2024)
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)