| Main Authors: | Li, Jie, Yang, Qishun, Li, Nuo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.09165 |
Similar Items
Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
by: Adhikari, Rabin
Published: (2025)
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling
by: Zhao, Weijie, et al.
Published: (2026)
WLFM: A Well-Logs Foundation Model for Multi-Task and Cross-Well Geological Interpretation
by: Qi, Zhenyu, et al.
Published: (2025)
Neurocircuitry-Inspired Hierarchical Graph Causal Attention Networks for Explainable Depression Identification
by: Chen, Weidao, et al.
Published: (2025)
Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
by: Bu, Rui, et al.
Published: (2025)
Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
by: Freytes, Luis Rosario
Published: (2026)
The Bayesian Geometry of Transformer Attention
by: Agarwal, Naman, et al.
Published: (2025)
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
by: Saxena, Krati, et al.
Published: (2025)
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
by: Lu, Jiecheng, et al.
Published: (2026)
Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification
by: Xu, Jiaxing, et al.
Published: (2024)
What Matters in Transformers? Not All Attention is Needed
by: He, Shwai, et al.
Published: (2024)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection
by: Li, Jindong, et al.
Published: (2024)
Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
by: Lu, Jiecheng, et al.
Published: (2025)
VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)
VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification
by: Xi, Wenjie, et al.
Published: (2024)
XicorAttention: Time Series Transformer Using Attention with Nonlinear Correlation
by: Kimura, Daichi, et al.
Published: (2025)
Synthetic Geology: Structural Geology Meets Deep Learning
by: Ghyselincks, Simon, et al.
Published: (2025)
A Multi-Scale Graph Neural Process with Cross-Drug Co-Attention for Drug-Drug Interactions Prediction
by: Yan, Zimo, et al.
Published: (2025)
Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
by: Peng, Miao, et al.
Published: (2025)
ControlMath: Controllable Data Generation Promotes Math Generalist Models
by: Chen, Nuo, et al.
Published: (2024)
UMoE: Unifying Attention and FFN with Shared Experts
by: Yang, Yuanhang, et al.
Published: (2025)
A Graph Transformer-Driven Approach for Network Robustness Learning
by: Zhang, Yu, et al.
Published: (2023)
CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community
by: Liu, Yan, et al.
Published: (2024)
PMET: Precise Model Editing in a Transformer
by: Li, Xiaopeng, et al.
Published: (2023)
Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning
by: Dhayalkar, Sahil Rajesh
Published: (2025)
Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)
Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)
Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)
Higher-Order Transformers With Kronecker-Structured Attention
by: Omranpour, Soroush, et al.
Published: (2024)
Expanding Expressivity in Transformer Models with MöbiusAttention
by: Halacheva, Anna-Maria, et al.
Published: (2024)
CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)
Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
by: Yu, Guoqi, et al.
Published: (2026)
Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
by: Ke, Yekun, et al.
Published: (2024)
AttentionSmithy: A Modular Framework for Rapid Transformer Development and Customization
by: Cranney, Caleb, et al.
Published: (2025)
Scaling Graph Transformers: A Comparative Study of Sparse and Dense Attention
by: Dimitrov, Leon
Published: (2025)
Horizon-wise Learning Paradigm Promotes Gene Splicing Identification
by: Li, Qi-Jie, et al.
Published: (2024)
CITRAS: Covariate-Informed Transformer for Time Series Forecasting
by: Yamaguchi, Yosuke, et al.
Published: (2025)
NoiseFormer -- Noise Diffused Symmetric Attention Transformer
by: Kumar, Phani, et al.
Published: (2026)