| Main Authors: | Li, Wendi, Li, Sharon |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.20132 |
Similar Items
Cyclical Entropy Eruption: Entropy Dynamics in Agent Reinforcement Learning
by: Li, Wendi, et al.
Published: (2026)
General Exploratory Bonus for Optimistic Exploration in RLHF
by: Li, Wendi, et al.
Published: (2025)
Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms
by: Feng, Lechen, et al.
Published: (2025)
FedLAD: A Modular and Adaptive Testbed for Federated Log Anomaly Detection
by: Liao, Yihan, et al.
Published: (2025)
FedLAD: A Linear Algebra Based Data Poisoning Defence for Federated Learning
by: Xiong, Qi, et al.
Published: (2025)
ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning
by: Ren, Qingnan, et al.
Published: (2026)
Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization
by: Yu, Wenrui, et al.
Published: (2024)
Exponential Quantum Communication Advantage in Distributed Inference and Learning
by: Gilboa, Dar, et al.
Published: (2023)
Amortized Network Intervention to Steer the Excitatory Point Processes
by: Song, Zitao, et al.
Published: (2023)
ECoLAD: Deployment-Oriented Evaluation for Automotive Time-Series Anomaly Detection
by: Özer, Kadir-Kaan, et al.
Published: (2026)
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
by: Wiltzer, Harley, et al.
Published: (2024)
Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning
by: Li, Ziheng, et al.
Published: (2026)
LAD-BNet: Lag-Aware Dual-Branch Networks for Real-Time Energy Forecasting on Edge Devices
by: Lignier, Jean-Philippe
Published: (2025)
Generalized Advantage Estimation for Distributional Policy Gradients
by: Shaik, Shahil, et al.
Published: (2025)
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
by: Gong, Shijin, et al.
Published: (2026)
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
by: Im, Shawn, et al.
Published: (2024)
Prospects of Privacy Advantage in Quantum Machine Learning
by: Heredge, Jamie, et al.
Published: (2024)
RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time
by: Ghosh, Anurag, et al.
Published: (2026)
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)
AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin
by: Xiong, Jian, et al.
Published: (2025)
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning
by: Wu, Junkang, et al.
Published: (2025)
Path Learning with Trajectory Advantage Regression
by: Miyaguchi, Kohei
Published: (2025)
Beyond High-Entropy Exploration: Correctness-Aware Low-Entropy Segment-Based Advantage Shaping for Reasoning LLMs
by: Chen, Xinzhu, et al.
Published: (2025)
Advantage-based Temporal Attack in Reinforcement Learning
by: He, Shenghong
Published: (2026)
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
by: Maheswaran, Monishwaran, et al.
Published: (2025)
Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models
by: Li, Muxing, et al.
Published: (2025)
Sampling Complexity of TD and PPO in RKHS
by: Zou, Lu, et al.
Published: (2025)
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
by: Liu, Tenglong, et al.
Published: (2024)
Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs
by: Huang, Yiming, et al.
Published: (2026)
BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning
by: Gong, Shijin, et al.
Published: (2026)
Mitigating Think-Answer Mismatch in LLM Reasoning Through Noise-Aware Advantage Reweighting
by: Shen, Si, et al.
Published: (2025)
Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning
by: Peng, Keqin, et al.
Published: (2026)
Your Group-Relative Advantage Is Biased
by: Yang, Fengkai, et al.
Published: (2026)
Advantage Alignment Algorithms
by: Duque, Juan Agustin, et al.
Published: (2024)
How Well Can Preference Optimization Generalize Under Noisy Feedback?
by: Im, Shawn, et al.
Published: (2025)
Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
by: Xue, Shuchen, et al.
Published: (2025)
Limitations of Quantum Advantage in Unsupervised Machine Learning
by: Patel, Apoorva D.
Published: (2025)
Classical Verification of Quantum Learning Advantages with Noises
by: Ma, Yinghao, et al.
Published: (2024)
Competitive Advantage Attacks to Decentralized Federated Learning
by: Jia, Yuqi, et al.
Published: (2023)