| Main Authors: | Pang, Yujuan, Li, Jiaxin, Sheng, Xin, Peng, Ran, Ma, Yong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03452 |
Similar Items
Spurious Rewards: Rethinking Training Signals in RLVR
by: Shao, Rulin, et al.
Published: (2025)
Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR
by: Mou, Chaoli, et al.
Published: (2026)
Beyond One-Way Pruning: Bidirectional Pruning-Regrowth for Extreme Accuracy-Sparsity Tradeoff
by: Liu, Junchen, et al.
Published: (2025)
JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
by: Duo, Jiangshan, et al.
Published: (2026)
When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
by: Miao, Yuchun, et al.
Published: (2026)
The Path Not Taken: RLVR Provably Learns Off the Principals
by: Zhu, Hanqing, et al.
Published: (2025)
On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR
by: Ye, Hao, et al.
Published: (2026)
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
by: Huang, Kexin, et al.
Published: (2026)
IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
by: Li, Yuhan, et al.
Published: (2026)
Generalization of RLVR Using Causal Reasoning as a Testbed
by: Lu, Brian, et al.
Published: (2025)
Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics
by: Shihab, Ibne Farabi, et al.
Published: (2026)
Dual Randomized Smoothing: Beyond Global Noise Variance
by: Sun, Chenhao, et al.
Published: (2025)
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
by: Huang, Zhuoxu, et al.
Published: (2026)
VL Norm: Rethink Loss Aggregation in RLVR
by: He, Zhiyuan, et al.
Published: (2025)
Quantifying Empirical Compute-Supervision Tradeoffs in RLVR
by: Mitsuhashi, Ryo, et al.
Published: (2026)
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
by: Chen, Zhipeng, et al.
Published: (2026)
GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
by: Zhang, Jiaying, et al.
Published: (2026)
Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)
RLVR-World: Training World Models with Reinforcement Learning
by: Wu, Jialong, et al.
Published: (2025)
Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective
by: Hao, Zhezheng, et al.
Published: (2025)
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning
by: Wu, Junkang, et al.
Published: (2025)
The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View
by: Yao, Xinhao, et al.
Published: (2025)
Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective
by: Chen, Kun, et al.
Published: (2026)
Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony
by: Lu, Han, et al.
Published: (2025)
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
by: Liu, Zhanyu, et al.
Published: (2026)
Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events
by: Gu, Jingyi, et al.
Published: (2025)
Event-Aware Prompt Learning for Dynamic Graphs
by: Yu, Xingtong, et al.
Published: (2025)
Temporal Pair Consistency for Variance-Reduced Flow Matching
by: Maduabuchi, Chika, et al.
Published: (2026)
Linear Attention for Efficient Bidirectional Sequence Modeling
by: Afzal, Arshia, et al.
Published: (2025)
LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
by: Helff, Lukas, et al.
Published: (2026)
The Multiple Ticket Hypothesis: Random Sparse Subnetworks Suffice for RLVR
by: Adewuyi, Israel, et al.
Published: (2026)
Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR
by: Khalifa, Muhammad, et al.
Published: (2026)
VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
by: Cai, Xin-Qiang, et al.
Published: (2026)
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
by: Le, Thanh-Long V., et al.
Published: (2025)
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
by: Yang, Zhicheng, et al.
Published: (2025)
Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning
by: Luo, Yu, et al.
Published: (2026)
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
by: Qiu, Zhongxi, et al.
Published: (2025)
EPLKG: Efficient Prompt Learning with Knowledge Graph
by: Lim, YongTaek, et al.
Published: (2023)
Parameter Efficient Fine-tuning via Explained Variance Adaptation
by: Paischer, Fabian, et al.
Published: (2024)
Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model
by: Chen, Jinyin, et al.
Published: (2024)