| Main Authors: | Xie, Zixuan, Liu, Xinyu, Chen, Claire, Liu, Shuze Daniel, Chandra, Rohan, Zhang, Shangtong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.07333 |
Similar Items
Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
by: Xie, Zixuan, et al.
Published: (2026)
Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
by: Xie, Zixuan, et al.
Published: (2025)
Doubly Optimal Policy Evaluation for Reinforcement Learning
by: Liu, Shuze Daniel, et al.
Published: (2024)
Efficient Multi-Policy Evaluation for Reinforcement Learning
by: Liu, Shuze Daniel, et al.
Published: (2024)
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
by: Chen, Claire, et al.
Published: (2024)
Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning
by: Mahadevan, Vagul, et al.
Published: (2026)
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
by: Liu, Shuze Daniel, et al.
Published: (2024)
Towards Provable Emergence of In-Context Reinforcement Learning
by: Wang, Jiuqi, et al.
Published: (2025)
Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning
by: Liu, Xinyu, et al.
Published: (2025)
Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set
by: Liu, Xinyu, et al.
Published: (2025)
Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift
by: Liu, Xinyu, et al.
Published: (2026)
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design
by: Liu, Shuze, et al.
Published: (2023)
MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics
by: Liu, Xinyu, et al.
Published: (2026)
Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise
by: Qian, Xiaochi, et al.
Published: (2024)
Predicting Plasticity in Deep Continual Learning: A Theoretical Perspective
by: Wang, Jiuqi, et al.
Published: (2026)
MathlibPR: Pull Request Merge-Readiness Benchmark for Formal Mathematical Libraries
by: Xie, Zixuan, et al.
Published: (2026)
Offline Two-Player Zero-Sum Markov Games with KL Regularization
by: Chen, Claire, et al.
Published: (2026)
A Survey of In-Context Reinforcement Learning
by: Moeini, Amir, et al.
Published: (2025)
Safe In-Context Reinforcement Learning
by: Moeini, Amir, et al.
Published: (2025)
Group Fairness in Multi-Task Reinforcement Learning
by: Song, Kefan, et al.
Published: (2025)
Prompt-Driven Domain Adaptation for End-to-End Autonomous Driving via In-Context RL
by: Khurram, Aleesha, et al.
Published: (2025)
Experience Replay Addresses Loss of Plasticity in Continual Learning
by: Wang, Jiuqi, et al.
Published: (2025)
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
by: Wang, Jiuqi, et al.
Published: (2024)
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
by: Song, Kefan, et al.
Published: (2025)
Towards Formalizing Reinforcement Learning Theory
by: Zhang, Shangtong
Published: (2025)
Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective
by: Boursier, Etienne, et al.
Published: (2025)
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
by: Zhang, Shangtong, et al.
Published: (2021)
In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
by: Collins, Liam, et al.
Published: (2024)
Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)
In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention
by: He, Jianliang, et al.
Published: (2025)
Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features
by: Wang, Jiuqi, et al.
Published: (2024)
Universal Approximation with Softmax Attention
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)
Softmax-free Linear Transformers
by: Lu, Jiachen, et al.
Published: (2022)
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
by: Zhang, Michael, et al.
Published: (2024)
Counterfactual Explanations for Continuous Action Reinforcement Learning
by: Dong, Shuyang, et al.
Published: (2025)
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
by: Song, Kefan, et al.
Published: (2025)
Rethinking Attention: Polynomial Alternatives to Softmax in Transformers
by: Saratchandran, Hemanth, et al.
Published: (2024)
CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening
by: Kulkarni, Amar, et al.
Published: (2024)
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
by: Nishikawa, Naoki, et al.
Published: (2025)
Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)