| Main Authors: | He, Jianliang, Wang, Leda, Chen, Siyu, Yang, Zhuoran |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.16849 |
Similar Items
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
by: Chen, Siyu, et al.
Published: (2024)
Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
by: Zhang, Yufeng, et al.
Published: (2021)
Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
by: Chen, Siyu, et al.
Published: (2024)
TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training
by: Menezes, Michael, et al.
Published: (2025)
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
by: Shen, Han, et al.
Published: (2024)
A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
by: Boursier, Etienne, et al.
Published: (2025)
Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
by: Cai, Qi, et al.
Published: (2022)
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
by: Kim, Juno, et al.
Published: (2023)
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
by: Zhang, Yufeng, et al.
Published: (2020)
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization
by: Zhu, Yuchen, et al.
Published: (2024)
Provably Efficient Exploration in Policy Optimization
by: Cai, Qi, et al.
Published: (2019)
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
by: Minegishi, Gouki, et al.
Published: (2023)
Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach
by: Qiu, Shuang, et al.
Published: (2022)
Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning
by: Li, Zihao, et al.
Published: (2024)
Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization
by: Yang, Zhuoran, et al.
Published: (2020)
Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
by: Hanqing, Liu, et al.
Published: (2026)
Grokking as Structural Inference: Transformers Need Bayesian Lottery Tickets
by: Hidajat, Kai, et al.
Published: (2026)
Fourier Learning Machines: Nonharmonic Fourier-Based Neural Networks for Scientific Machine Learning
by: Rubel, Mominul, et al.
Published: (2025)
Modular Distributed Nonconvex Learning with Error Feedback
by: Carnevale, Guido, et al.
Published: (2025)
Feature Augmentation of GNNs for ILPs: Local Uniqueness Suffices
by: Han, Qingyu, et al.
Published: (2025)
Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts
by: Liao, Fangshuo, et al.
Published: (2025)
Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity
by: Amortila, Philip, et al.
Published: (2024)
Dynamic Controlled Variables Based Dynamic Self-Optimizing Control
by: Zhou, Chenchen, et al.
Published: (2026)
Which Features are Best for Successor Features?
by: Ollivier, Yann
Published: (2025)
A Modular Algorithm for Non-Stationary Online Convex-Concave Optimization
by: Meng, Qing-xin, et al.
Published: (2025)
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)
Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Feature Learning by Learnable Channel Attention
by: Yang, Yingzhen
Published: (2025)
Decision-Dependent Stochastic Optimization: The Role of Distribution Dynamics
by: He, Zhiyu, et al.
Published: (2025)
Enhancing Unsupervised Feature Selection via Double Sparsity Constrained Optimization
by: Xiu, Xianchao, et al.
Published: (2025)
A Re-solving Heuristic for Dynamic Assortment Optimization with Knapsack Constraints
by: Chen, Xi, et al.
Published: (2024)
A Theory of Feature Learning in Kernel Models
by: Chen, Yunlu, et al.
Published: (2023)
A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models
by: Gao, Peifeng, et al.
Published: (2026)
Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
by: Hu, Yifan, et al.
Published: (2024)
Bi-Sparse Unsupervised Feature Selection
by: Xiu, Xianchao, et al.
Published: (2024)
Feature-Based Interpretable Surrogates for Optimization
by: Goerigk, Marc, et al.
Published: (2024)
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
by: Caron, Francois, et al.
Published: (2023)
Negative Imaginary Neural ODEs: Learning to Control Mechanical Systems with Stability Guarantees
by: Shi, Kanghong, et al.
Published: (2025)
Random Features Approximation for Control-Affine Systems
by: Kazemian, Kimia, et al.
Published: (2024)
A Compositional Kernel Model for Feature Learning
by: Ruan, Feng, et al.
Published: (2025)
Supervised Feature Compression based on Counterfactual Analysis
by: Piccialli, Veronica, et al.
Published: (2022)