:: Library Catalog

Bibliographic Details
Main Authors:	Li, Zongxia; Yu, Wenhao; Huang, Chengsong; Liang, Zhenwen; Liu, Rui; Liu, Fuxiao; Che, Jingxi; Yu, Dian; Boyd-Graber, Jordan; Mi, Haitao; Yu, Dong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.19652

Similar Items

Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)

Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

R-Zero: Self-Evolving Reasoning LLM from Zero Data
by: Huang, Chengsong, et al.
Published: (2025)

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)

Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data
by: Li, Zongxia, et al.
Published: (2026)

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
by: Zhou, Yujun, et al.
Published: (2025)

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification
by: Liu, Rui, et al.
Published: (2026)

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
by: Zhang, Zhihan, et al.
Published: (2024)

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions
by: Liang, Zhenwen, et al.
Published: (2024)

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
by: Yu, Dian, et al.
Published: (2024)

VisPlay: Self-Evolving Vision-Language Models from Images
by: He, Yicheng, et al.
Published: (2025)

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data
by: Liang, Zhenwen, et al.
Published: (2026)

CFMatch: Aligning Automated Answer Equivalence Evaluation with Expert Judgments For Open-Domain Question Answering
by: Li, Zongxia, et al.
Published: (2024)

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
by: Li, Zongxia, et al.
Published: (2025)

Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
by: Li, Zongxia, et al.
Published: (2025)

PEDANTS: Cheap but Effective and Interpretable Answer Equivalence
by: Li, Zongxia, et al.
Published: (2024)

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving
by: Liang, Zhenwen, et al.
Published: (2025)

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)

Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains
by: Su, Yi, et al.
Published: (2025)

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
by: Lu, Sidi, et al.
Published: (2026)

NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
by: Zhang, Zheyuan, et al.
Published: (2025)

A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges
by: Li, Zongxia, et al.
Published: (2025)

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
by: He, Zhiwei, et al.
Published: (2025)

Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
by: Gu, Feng, et al.
Published: (2025)

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
by: Liang, Tian, et al.
Published: (2025)

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
by: Zhang, Ziyin, et al.
Published: (2025)

SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement
by: Mondal, Ishani, et al.
Published: (2024)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)

EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)

Verified Critical Step Optimization for LLM Agents
by: Li, Mukai, et al.
Published: (2026)

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
by: Liang, Zhenwen, et al.
Published: (2025)

Reinforcing Multimodal Reasoning Against Visual Degradation
by: Liu, Rui, et al.
Published: (2026)

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)