| Main Authors: | Pan, Pei-Chi, Liang, Yingbin, Lin, Sen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09305 |
Similar Items
Algorithm Design for Online Meta-Learning with Task Boundary Detection
by: Sow, Daouda, et al.
Published: (2023)
Agentic Transformers Provably Learn to Search via Reinforcement Learning
by: Yang, Tong, et al.
Published: (2026)
Robust Offline Reinforcement Learning for Non-Markovian Decision Processes
by: Huang, Ruiquan, et al.
Published: (2024)
HIPO: Instruction Hierarchy via Constrained Reinforcement Learning
by: Chen, Keru, et al.
Published: (2026)
The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards
by: Huang, Yu, et al.
Published: (2026)
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
by: Yang, Tong, et al.
Published: (2025)
Mixture-of-Transformers Learn Faster: A Theoretical Study on Classification Problems
by: Li, Hongbo, et al.
Published: (2025)
Theory on Mixture-of-Experts in Continual Learning
by: Li, Hongbo, et al.
Published: (2024)
Unlocking the Power of Rehearsal in Continual Learning: A Theoretical Perspective
by: Deng, Junze, et al.
Published: (2025)
Near-Optimal Partially Observable Reinforcement Learning with Partial Online State Information
by: Shi, Ming, et al.
Published: (2023)
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
by: Huang, Yu, et al.
Published: (2024)
Regret Bounds for Reinforcement Learning from Multi-Source Imperfect Preferences
by: Shi, Ming, et al.
Published: (2026)
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design
by: Wei, Quan, et al.
Published: (2025)
Binary Rewards and Reinforcement Learning: Fundamental Challenges
by: Dymetman, Marc
Published: (2026)
From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models
by: Liang, Yuchen, et al.
Published: (2026)
Reward Design for Reinforcement Learning Agents
by: Devidze, Rati
Published: (2025)
In-Context Learning with Representations: Contextual Generalization of Trained Transformers
by: Yang, Tong, et al.
Published: (2024)
Constraint-Rectified Training for Efficient Chain-of-Thought
by: Wu, Qinhang, et al.
Published: (2026)
Provable In-Context Learning of Nonlinear Regression with Transformers
by: Li, Hongbo, et al.
Published: (2025)
Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
by: Huang, Ruiquan, et al.
Published: (2023)
Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards
by: Huang, Yu, et al.
Published: (2026)
RDA: Reward Design Agent for Reinforcement Learning
by: Lee, Hojoon, et al.
Published: (2026)
Sharp Convergence Rates for Masked Diffusion Models
by: Liang, Yuchen, et al.
Published: (2026)
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
by: Liang, Yuchen, et al.
Published: (2024)
Decomposable Reward Modeling and Realistic Environment Design for Reinforcement Learning-Based Forex Trading
by: Saidd, Nabeel Ahmad
Published: (2026)
Provably Sample-Efficient Robust Reinforcement Learning with Average Reward
by: Roch, Zachary, et al.
Published: (2025)
LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning
by: Liang, Aoming, et al.
Published: (2025)
Step-wise Rubric Rewards for LLM Reasoning
by: Xie, Weichu, et al.
Published: (2026)
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
by: Huang, Ruiquan, et al.
Published: (2025)
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)
Dynamic Fraud Detection: Integrating Reinforcement Learning into Graph Neural Networks
by: Dong, Yuxin, et al.
Published: (2024)
R^3: Replay, Reflection, and Ranking Rewards for LLM Reinforcement Learning
by: Jiang, Zhizheng, et al.
Published: (2026)
Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
by: Miao, Yuchun, et al.
Published: (2025)
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
by: Liang, Yuchen, et al.
Published: (2025)
Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning
by: Jiang, Zhida, et al.
Published: (2026)
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
by: Zhang, Zheng, et al.
Published: (2025)
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
by: He, Haoran, et al.
Published: (2025)
On Designing Effective RL Reward at Training Time for LLM Reasoning
by: Gao, Jiaxuan, et al.
Published: (2024)
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
by: Liang, Yuchen, et al.
Published: (2024)