:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Zhenwen; Zhou, Yujun; Lu, Sidi; Zhang, Xiangliang; Mi, Haitao; Yu, Dong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2604.18493

Similar Items

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)

Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
by: Zhou, Yujun, et al.
Published: (2025)

Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)

Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
by: Liang, Zhenwen, et al.
Published: (2024)

Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)

Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)

In-context Exploration-Exploitation for Reinforcement Learning
by: Dai, Zhenwen, et al.
Published: (2024)

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)

Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models
by: Wang, Xiangqi, et al.
Published: (2025)

Capability-Oriented Training Induced Alignment Risk
by: Zhou, Yujun, et al.
Published: (2026)

Manipulating Predictions over Discrete Inputs in Machine Teaching
by: Wu, Xiaodong, et al.
Published: (2024)

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
by: Lu, Sidi, et al.
Published: (2026)

Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
by: Wang, Wenkai, et al.
Published: (2026)

R-Zero: Self-Evolving Reasoning LLM from Zero Data
by: Huang, Chengsong, et al.
Published: (2025)

Zero-Shot Relational Learning for Multimodal Knowledge Graphs
by: Cai, Rui, et al.
Published: (2024)

Scaling Synthetic Data Creation with 1,000,000,000 Personas
by: Ge, Tao, et al.
Published: (2024)

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
by: Zhou, Yujun, et al.
Published: (2024)

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)

Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers
by: Tang, Mohan, et al.
Published: (2026)

Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
by: Lang, Yicheng, et al.
Published: (2025)

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)

TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning
by: Zhang, Mingxuan, et al.
Published: (2025)

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
by: Zeng, Xinyue, et al.
Published: (2026)

Edge Contrastive Learning: An Augmentation-Free Graph Contrastive Learning Model
by: Li, Yujun, et al.
Published: (2024)

Counterfactual Explanations for Continuous Action Reinforcement Learning
by: Dong, Shuyang, et al.
Published: (2025)

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)

TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction
by: Xu, Tommy, et al.
Published: (2025)

Learning to Clean: Reinforcement Learning for Noisy Label Correction
by: Heidari, Marzi, et al.
Published: (2025)

LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
by: Wong, Zhen Hao, et al.
Published: (2025)

GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning
by: Li, Yujun, et al.
Published: (2026)

GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy
by: Peng, Tianhao, et al.
Published: (2023)

Learning Molecular Representation in a Cell
by: Liu, Gang, et al.
Published: (2024)