| Main Authors: | Li, Zongxia, Yu, Wenhao, Huang, Chengsong, Liang, Zhenwen, Liu, Rui, Liu, Fuxiao, Che, Jingxi, Yu, Dian, Boyd-Graber, Jordan, Mi, Haitao, Yu, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.19652 |
Similar Items
Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
R-Zero: Self-Evolving Reasoning LLM from Zero Data
by: Huang, Chengsong, et al.
Published: (2025)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
by: Li, Zongxia, et al.
Published: (2026)
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
by: Zhou, Yujun, et al.
Published: (2025)
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification
by: Liu, Rui, et al.
Published: (2026)
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
by: Zhang, Zhihan, et al.
Published: (2024)
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions
by: Liang, Zhenwen, et al.
Published: (2024)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)
VisPlay: Self-Evolving Vision-Language Models from Images
by: He, Yicheng, et al.
Published: (2025)
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)
Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data
by: Liang, Zhenwen, et al.
Published: (2026)
CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering
by: Li, Zongxia, et al.
Published: (2024)
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
by: Li, Zongxia, et al.
Published: (2025)
Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
by: Li, Zongxia, et al.
Published: (2025)
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence
by: Li, Zongxia, et al.
Published: (2024)
Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
by: Liang, Zhenwen, et al.
Published: (2025)
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)
Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)
Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
by: Lu, Sidi, et al.
Published: (2026)
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
by: Zhang, Zheyuan, et al.
Published: (2025)
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges
by: Li, Zongxia, et al.
Published: (2025)
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
by: Gu, Feng, et al.
Published: (2025)
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
by: Liang, Tian, et al.
Published: (2025)
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)
SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement
by: Mondal, Ishani, et al.
Published: (2024)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)
Verified Critical Step Optimization for LLM Agents
by: Li, Mukai, et al.
Published: (2026)
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
by: Liang, Zhenwen, et al.
Published: (2025)
Reinforcing Multimodal Reasoning Against Visual Degradation
by: Liu, Rui, et al.
Published: (2026)
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)