:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Brian K; Hu, Tianyang; Jin, Hui; Lee, Hwee Kuan; Kawaguchi, Kenji
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.02847

Similar Items

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
by: Wang, Haonan, et al.
Published: (2025)

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
by: Liu, Xuantong, et al.
Published: (2024)

Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
by: Mainali, Nischal, et al.
Published: (2025)

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
by: Xie, Zixuan, et al.
Published: (2026)

Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
by: Tang, Zhongpan
Published: (2025)

InAttention: Linear Context Scaling for Transformers
by: Eisner, Joseph
Published: (2024)

Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)

Exact Linear Attention
by: Ou, Weinuo
Published: (2026)

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
by: Hsu, Chih-Chung, et al.
Published: (2026)

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
by: Lee, Chungpa, et al.
Published: (2026)

PMNO: A novel physics guided multi-step neural operator predictor for partial differential equations
by: Song, Jin, et al.
Published: (2025)

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
by: Chen, Xingwu, et al.
Published: (2024)

Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
by: Pandey, Vishal, et al.
Published: (2026)

Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
by: Xue, Shuchen, et al.
Published: (2025)

Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics
by: Lei, Jingdi, et al.
Published: (2025)

Drug Discovery with Dynamic Goal-aware Fragments
by: Lee, Seul, et al.
Published: (2023)

Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators
by: Shi, Zekun, et al.
Published: (2024)

Investigating Layer Importance in Large Language Models
by: Zhang, Yang, et al.
Published: (2024)

Set-based Meta-Interpolation for Few-Task Meta-Learning
by: Lee, Seanie, et al.
Published: (2022)

Self-Supervised Dataset Distillation for Transfer Learning
by: Lee, Dong Bok, et al.
Published: (2023)

In-Context Algorithm Emulation in Fixed-Weight Transformers
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)

The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
by: Ma, Jiajun, et al.
Published: (2024)

Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)

Model Selection with a Shapelet-based Distance Measure for Multi-source Transfer Learning in Time Series Classification
by: Lee, Jiseok, et al.
Published: (2024)

Learning Exactly Linearizable Deep Dynamics Models
by: Moriyasu, Ryuta, et al.
Published: (2023)

Prompt Optimization via Adversarial In-Context Learning
by: Do, Xuan Long, et al.
Published: (2023)

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
by: Cheng, Xiang, et al.
Published: (2023)

Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)

Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
by: Wang, Yezhen, et al.
Published: (2025)

In-Context Deep Learning via Transformer Models
by: Wu, Weimin, et al.
Published: (2024)

Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens
by: Anwar, Usman, et al.
Published: (2024)

Cottention: Linear Transformers With Cosine Attention
by: Mongaras, Gabriel, et al.
Published: (2024)

Linear Transformers are Versatile In-Context Learners
by: Vladymyrov, Max, et al.
Published: (2024)

In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
by: Goel, Ayush, et al.
Published: (2026)

Conversational Dueling Bandits in Generalized Linear Models
by: Yang, Shuhua, et al.
Published: (2024)

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
by: Li, Siquan, et al.
Published: (2026)

Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences
by: Li, Siquan, et al.
Published: (2026)

Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
by: Hsu, Alexander, et al.
Published: (2026)

A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation
by: Qi, Tianyang, et al.
Published: (2025)

In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation
by: Cole, Frank, et al.
Published: (2025)