| Main Authors: | Liang, Zhenwen, Zhou, Yujun, Lu, Sidi, Zhang, Xiangliang, Mi, Haitao, Yu, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.18493 |
Similar Items
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
by: Zhou, Yujun, et al.
Published: (2025)
Stable and Efficient Single-Rollout RL for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
by: Liang, Zhenwen, et al.
Published: (2024)
Dual-Uncertainty Guided Policy Learning for Multimodal Reasoning
by: Liu, Rui, et al.
Published: (2025)
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
by: Dai, Runpeng, et al.
Published: (2025)
Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
by: Yue, Murong, et al.
Published: (2024)
In-context Exploration-Exploitation for Reinforcement Learning
by: Dai, Zhenwen, et al.
Published: (2024)
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)
Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)
AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking in Large Language Models
by: Wang, Xiangqi, et al.
Published: (2025)
Capability-Oriented Training Induced Alignment Risk
by: Zhou, Yujun, et al.
Published: (2026)
Manipulating Predictions over Discrete Inputs in Machine Teaching
by: Wu, Xiaodong, et al.
Published: (2024)
MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)
Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
by: Lu, Sidi, et al.
Published: (2026)
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
by: Wang, Wenkai, et al.
Published: (2026)
R-Zero: Self-Evolving Reasoning LLM from Zero Data
by: Huang, Chengsong, et al.
Published: (2025)
Zero-Shot Relational Learning for Multimodal Knowledge Graphs
by: Cai, Rui, et al.
Published: (2024)
Scaling Synthetic Data Creation with 1,000,000,000 Personas
by: Ge, Tao, et al.
Published: (2024)
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
by: Zhou, Yujun, et al.
Published: (2024)
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)
Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers
by: Tang, Mohan, et al.
Published: (2026)
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
by: Lang, Yicheng, et al.
Published: (2025)
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
by: Wang, Xiyao, et al.
Published: (2024)
TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning
by: Zhang, Mingxuan, et al.
Published: (2025)
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
by: Zeng, Xinyue, et al.
Published: (2026)
Edge Contrastive Learning: An Augmentation-Free Graph Contrastive Learning Model
by: Li, Yujun, et al.
Published: (2024)
Counterfactual Explanations for Continuous Action Reinforcement Learning
by: Dong, Shuyang, et al.
Published: (2025)
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
by: Liang, Zhenwen, et al.
Published: (2025)
TabReason: A Reinforcement Learning-Enhanced Reasoning LLM for Explainable Tabular Data Prediction
by: Xu, Tommy, et al.
Published: (2025)
Learning to Clean: Reinforcement Learning for Noisy Label Correction
by: Heidari, Marzi, et al.
Published: (2025)
LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
by: Wong, Zhen Hao, et al.
Published: (2025)
GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning
by: Li, Yujun, et al.
Published: (2026)
GraphRARE: Reinforcement Learning Enhanced Graph Neural Network with Relative Entropy
by: Peng, Tianhao, et al.
Published: (2023)
Learning Molecular Representation in a Cell
by: Liu, Gang, et al.
Published: (2024)