:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Pang, Yujuan, Li, Jiaxin, Sheng, Xin, Peng, Ran, Ma, Yong
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.03452
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Spurious Rewards: Rethinking Training Signals in RLVR
by: Shao, Rulin, et al.
Published: (2025)

Beyond Uniform Credit Assignment: Selective Eligibility Traces for RLVR
by: Mou, Chaoli, et al.
Published: (2026)

Beyond One-Way Pruning: Bidirectional Pruning-Regrowth for Extreme Accuracy-Sparsity Tradeoff
by: Liu, Junchen, et al.
Published: (2025)

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
by: Duo, Jiangshan, et al.
Published: (2026)

When to Stop Reusing: Dynamic Gradient Gating for Sample-Efficient RLVR
by: Miao, Yuchun, et al.
Published: (2026)

The Path Not Taken: RLVR Provably Learns Off the Principals
by: Zhu, Hanqing, et al.
Published: (2025)

On the Implicit Reward Overfitting and the Low-rank Dynamics in RLVR
by: Ye, Hao, et al.
Published: (2026)

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
by: Huang, Kexin, et al.
Published: (2026)

IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage
by: Li, Yuhan, et al.
Published: (2026)

Generalization of RLVR Using Causal Reasoning as a Testbed
by: Lu, Brian, et al.
Published: (2025)

Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics
by: Shihab, Ibne Farabi, et al.
Published: (2026)

Dual Randomized Smoothing: Beyond Global Noise Variance
by: Sun, Chenhao, et al.
Published: (2025)

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
by: Huang, Zhuoxu, et al.
Published: (2026)

VL Norm: Rethink Loss Aggregation in RLVR
by: He, Zhiyuan, et al.
Published: (2025)

Quantifying Empirical Compute-Supervision Tradeoffs in RLVR
by: Mitsuhashi, Ryo, et al.
Published: (2026)

Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
by: Chen, Zhipeng, et al.
Published: (2026)

GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR
by: Zhang, Jiaying, et al.
Published: (2026)

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)

RLVR-World: Training World Models with Reinforcement Learning
by: Wu, Jialong, et al.
Published: (2025)

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective
by: Hao, Zhezheng, et al.
Published: (2025)

Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning
by: Wu, Junkang, et al.
Published: (2025)

The Debate on RLVR Reasoning Capability Boundary: Shrinkage, Expansion, or Both? A Two-Stage Dynamic View
by: Yao, Xinhao, et al.
Published: (2025)

Flexible Entropy Control in RLVR with a Gradient-Preserving Perspective
by: Chen, Kun, et al.
Published: (2026)

Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony
by: Lu, Han, et al.
Published: (2025)

HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment
by: Liu, Zhanyu, et al.
Published: (2026)

Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events
by: Gu, Jingyi, et al.
Published: (2025)

Event-Aware Prompt Learning for Dynamic Graphs
by: Yu, Xingtong, et al.
Published: (2025)

Temporal Pair Consistency for Variance-Reduced Flow Matching
by: Maduabuchi, Chika, et al.
Published: (2026)

Linear Attention for Efficient Bidirectional Sequence Modeling
by: Afzal, Arshia, et al.
Published: (2025)

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
by: Helff, Lukas, et al.
Published: (2026)

The Multiple Ticket Hypothesis: Random Sparse Subnetworks Suffice for RLVR
by: Adewuyi, Israel, et al.
Published: (2026)

Countdown-Code: A Testbed for Studying The Emergence and Generalization of Reward Hacking in RLVR
by: Khalifa, Muhammad, et al.
Published: (2026)

VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
by: Cai, Xin-Qiang, et al.
Published: (2026)

No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
by: Le, Thanh-Long V., et al.
Published: (2025)

Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
by: Yang, Zhicheng, et al.
Published: (2025)

Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning
by: Luo, Yu, et al.
Published: (2026)

Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
by: Qiu, Zhongxi, et al.
Published: (2025)

EPLKG: Efficient Prompt Learning with Knowledge Graph
by: Lim, YongTaek, et al.
Published: (2023)

Parameter Efficient Fine-tuning via Explained Variance Adaptation
by: Paischer, Fabian, et al.
Published: (2024)

Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model
by: Chen, Jinyin, et al.
Published: (2024)