:: Library Catalog

Bibliographic Details
Main Authors:	Li, Wendi; Li, Sharon
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.20132

Similar Items

Cyclical Entropy Eruption: Entropy Dynamics in Agent Reinforcement Learning
by: Li, Wendi, et al.
Published: (2026)

General Exploratory Bonus for Optimistic Exploration in RLHF
by: Li, Wendi, et al.
Published: (2025)

Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms
by: Feng, Lechen, et al.
Published: (2025)

FedLAD: A Modular and Adaptive Testbed for Federated Log Anomaly Detection
by: Liao, Yihan, et al.
Published: (2025)

FedLAD: A Linear Algebra Based Data Poisoning Defence for Federated Learning
by: Xiong, Qi, et al.
Published: (2025)

ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning
by: Ren, Qingnan, et al.
Published: (2026)

Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization
by: Yu, Wenrui, et al.
Published: (2024)

Exponential Quantum Communication Advantage in Distributed Inference and Learning
by: Gilboa, Dar, et al.
Published: (2023)

Amortized Network Intervention to Steer the Excitatory Point Processes
by: Song, Zitao, et al.
Published: (2023)

ECoLAD: Deployment-Oriented Evaluation for Automotive Time-Series Anomaly Detection
by: Özer, Kadir-Kaan, et al.
Published: (2026)

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
by: Wiltzer, Harley, et al.
Published: (2024)

Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning
by: Li, Ziheng, et al.
Published: (2026)

LAD-BNet: Lag-Aware Dual-Branch Networks for Real-Time Energy Forecasting on Edge Devices
by: Lignier, Jean-Philippe
Published: (2025)

Generalized Advantage Estimation for Distributional Policy Gradients
by: Shaik, Shahil, et al.
Published: (2025)

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
by: Gong, Shijin, et al.
Published: (2026)

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
by: Im, Shawn, et al.
Published: (2024)

Prospects of Privacy Advantage in Quantum Machine Learning
by: Heredge, Jamie, et al.
Published: (2024)

RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time
by: Ghosh, Anurag, et al.
Published: (2026)

Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)

Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)

AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin
by: Xiong, Jian, et al.
Published: (2025)

Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning
by: Wu, Junkang, et al.
Published: (2025)

Path Learning with Trajectory Advantage Regression
by: Miyaguchi, Kohei
Published: (2025)

Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs
by: Chen, Xinzhu, et al.
Published: (2025)

Advantage-based Temporal Attack in Reinforcement Learning
by: He, Shenghong
Published: (2026)

Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
by: Maheswaran, Monishwaran, et al.
Published: (2025)

Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models
by: Li, Muxing, et al.
Published: (2025)

Sampling Complexity of TD and PPO in RKHS
by: Zou, Lu, et al.
Published: (2025)

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
by: Liu, Tenglong, et al.
Published: (2024)

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
by: Huang, Yiming, et al.
Published: (2026)

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning
by: Gong, Shijin, et al.
Published: (2026)

Mitigating Think-Answer Mismatch in LLM Reasoning Through Noise-Aware Advantage Reweighting
by: Shen, Si, et al.
Published: (2025)

Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning
by: Peng, Keqin, et al.
Published: (2026)

Your Group-Relative Advantage Is Biased
by: Yang, Fengkai, et al.
Published: (2026)

Advantage Alignment Algorithms
by: Duque, Juan Agustin, et al.
Published: (2024)

How Well Can Preference Optimization Generalize Under Noisy Feedback?
by: Im, Shawn, et al.
Published: (2025)

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
by: Xue, Shuchen, et al.
Published: (2025)

Limitations of Quantum Advantage in Unsupervised Machine Learning
by: Patel, Apoorva D.
Published: (2025)

Classical Verification of Quantum Learning Advantages with Noises
by: Ma, Yinghao, et al.
Published: (2024)

Competitive Advantage Attacks to Decentralized Federated Learning
by: Jia, Yuqi, et al.
Published: (2023)