:: Library Catalog

Bibliographic Details
Main Authors:	He, Jianliang; Pan, Xintian; Chen, Siyu; Yang, Zhuoran
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.12734

Similar Items

Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
by: Chen, Siyu, et al.
Published: (2024)

On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking
by: He, Jianliang, et al.
Published: (2026)

Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)

From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
by: He, Jianliang, et al.
Published: (2024)

Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
by: Chen, Siyu, et al.
Published: (2024)

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
by: Guo, Tianyu, et al.
Published: (2024)

Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
by: Zuo, Yifei, et al.
Published: (2025)

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
by: Xie, Zixuan, et al.
Published: (2026)

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
by: He, Jianliang, et al.
Published: (2024)

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
by: Chen, Xingwu, et al.
Published: (2024)

Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)

Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
by: Goel, Gautam, et al.
Published: (2026)

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
by: Chou, Yuhong, et al.
Published: (2024)

Demystifying the Slash Pattern in Attention: The Role of RoPE
by: Cheng, Yuan, et al.
Published: (2026)

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
by: Zhang, Michael, et al.
Published: (2024)

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
by: Nishikawa, Naoki, et al.
Published: (2025)

Softmax Linear Attention: Reclaiming Global Competition
by: Xu, Mingwei, et al.
Published: (2026)

Universal Approximation with Softmax Attention
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)

Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training
by: Hu, Pihe, et al.
Published: (2024)

Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units
by: Chen, Jianhui, et al.
Published: (2026)

Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
by: Duranthon, O., et al.
Published: (2025)

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective
by: Boursier, Etienne, et al.
Published: (2025)

In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
by: Collins, Liam, et al.
Published: (2024)

Dynamics of Transient Structure in In-Context Linear Regression Transformers
by: Carroll, Liam, et al.
Published: (2025)

Demystifying Linear MDPs and Novel Dynamics Aggregation Framework
by: Lee, Joongkyu, et al.
Published: (2024)

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
by: Jiang, Jiarui, et al.
Published: (2025)

Shortcut to Nowhere: Demystifying Deep Spurious Regression
by: Xu, Guanrong, et al.
Published: (2026)

Scalable-Softmax Is Superior for Attention
by: Nakanishi, Ken M.
Published: (2025)

On the Invariants of Softmax Attention
by: Lee, Wonsuk
Published: (2026)

Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)

Model Collapse Demystified: The Case of Regression
by: Dohmatob, Elvis, et al.
Published: (2024)

Mechanistic Interpretability of Fine-Tuned Vision Transformers on Distorted Images: Decoding Attention Head Behavior for Transparent and Trustworthy AI
by: Bahador, Nooshin
Published: (2025)

Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
by: Li, Ed, et al.
Published: (2025)

Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
by: Zhang, Yufeng, et al.
Published: (2021)

Softmax-free Linear Transformers
by: Lu, Jiachen, et al.
Published: (2022)

Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
by: Zuhri, Zayd M. K., et al.
Published: (2025)

Multi-Head Low-Rank Attention
by: Liu, Songtao, et al.
Published: (2026)

Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning
by: Altabaa, Awni, et al.
Published: (2025)