| Main Authors: | Chen, Brian K, Hu, Tianyang, Jin, Hui, Lee, Hwee Kuan, Kawaguchi, Kenji |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.02847 |
Similar Items
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
by: Wang, Haonan, et al.
Published: (2025)
Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
by: Liu, Xuantong, et al.
Published: (2024)
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
by: Mainali, Nischal, et al.
Published: (2025)
Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
by: Xie, Zixuan, et al.
Published: (2026)
Rethinking Transformer Connectivity: TLinFormer, A Path to Exact, Full Context-Aware Linear Attention
by: Tang, Zhongpan
Published: (2025)
InAttention: Linear Context Scaling for Transformers
by: Eisner, Joseph
Published: (2024)
Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)
Exact Linear Attention
by: Ou, Weinuo
Published: (2026)
ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
by: Hsu, Chih-Chung, et al.
Published: (2026)
Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
by: Lee, Chungpa, et al.
Published: (2026)
PMNO: A novel physics guided multi-step neural operator predictor for partial differential equations
by: Song, Jin, et al.
Published: (2025)
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
by: Chen, Xingwu, et al.
Published: (2024)
Variational Linear Attention: Stable Associative Memory for Long-Context Transformers
by: Pandey, Vishal, et al.
Published: (2026)
Any-Order GPT as Masked Diffusion Model: Decoupling Formulation and Architecture
by: Xue, Shuchen, et al.
Published: (2025)
Exact Flow Linear Attention: Exact Solution from Continuous-Time Dynamics
by: Lei, Jingdi, et al.
Published: (2025)
Drug Discovery with Dynamic Goal-aware Fragments
by: Lee, Seul, et al.
Published: (2023)
Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators
by: Shi, Zekun, et al.
Published: (2024)
Investigating Layer Importance in Large Language Models
by: Zhang, Yang, et al.
Published: (2024)
Set-based Meta-Interpolation for Few-Task Meta-Learning
by: Lee, Seanie, et al.
Published: (2022)
Self-Supervised Dataset Distillation for Transfer Learning
by: Lee, Dong Bok, et al.
Published: (2023)
In-Context Algorithm Emulation in Fixed-Weight Transformers
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)
The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
by: Ma, Jiajun, et al.
Published: (2024)
Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)
Model Selection with a Shapelet-based Distance Measure for Multi-source Transfer Learning in Time Series Classification
by: Lee, Jiseok, et al.
Published: (2024)
Learning Exactly Linearizable Deep Dynamics Models
by: Moriyasu, Ryuta, et al.
Published: (2023)
Prompt Optimization via Adversarial In-Context Learning
by: Do, Xuan Long, et al.
Published: (2023)
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
by: Cheng, Xiang, et al.
Published: (2023)
Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)
Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients
by: Wang, Yezhen, et al.
Published: (2025)
In-Context Deep Learning via Transformer Models
by: Wu, Weimin, et al.
Published: (2024)
Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens
by: Anwar, Usman, et al.
Published: (2024)
Cottention: Linear Transformers With Cosine Attention
by: Mongaras, Gabriel, et al.
Published: (2024)
Linear Transformers are Versatile In-Context Learners
by: Vladymyrov, Max, et al.
Published: (2024)
In-Context Learning in Linear vs. Quadratic Attention Models: An Empirical Study on Regression Tasks
by: Goel, Ayush, et al.
Published: (2026)
Conversational Dueling Bandits in Generalized Linear Models
by: Yang, Shuhua, et al.
Published: (2024)
The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
by: Li, Siquan, et al.
Published: (2026)
Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences
by: Li, Siquan, et al.
Published: (2026)
Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer
by: Hsu, Alexander, et al.
Published: (2026)
A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation
by: Qi, Tianyang, et al.
Published: (2025)
In-Context Learning of Linear Dynamical Systems with Transformers: Approximation Bounds and Depth-Separation
by: Cole, Frank, et al.
Published: (2025)