Library Catalog

Bibliographic Details
Main Authors:	Li, Jie; Yang, Qishun; Li, Nuo
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.09165

Similar Items

Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers
by: Adhikari, Rabin
Published: (2025)

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling
by: Zhao, Weijie, et al.
Published: (2026)

WLFM: A Well-Logs Foundation Model for Multi-Task and Cross-Well Geological Interpretation
by: Qi, Zhenyu, et al.
Published: (2025)

Neurocircuitry-Inspired Hierarchical Graph Causal Attention Networks for Explainable Depression Identification
by: Chen, Weidao, et al.
Published: (2025)

Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
by: Bu, Rui, et al.
Published: (2025)

Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
by: Freytes, Luis Rosario
Published: (2026)

The Bayesian Geometry of Transformer Attention
by: Agarwal, Naman, et al.
Published: (2025)

Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
by: Saxena, Krati, et al.
Published: (2025)

ZeroS: Zero-Sum Linear Attention for Efficient Transformers
by: Lu, Jiecheng, et al.
Published: (2026)

Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification
by: Xu, Jiaxing, et al.
Published: (2024)

What Matters in Transformers? Not All Attention is Needed
by: He, Shwai, et al.
Published: (2024)

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)

CVTGAD: Simplified Transformer with Cross-View Attention for Unsupervised Graph-level Anomaly Detection
by: Li, Jindong, et al.
Published: (2024)

Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
by: Lu, Jiecheng, et al.
Published: (2025)

VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention
by: Zhou, Jingbo, et al.
Published: (2026)

VSFormer: Value and Shape-Aware Transformer with Prior-Enhanced Self-Attention for Multivariate Time Series Classification
by: Xi, Wenjie, et al.
Published: (2024)

XicorAttention: Time Series Transformer Using Attention with Nonlinear Correlation
by: Kimura, Daichi, et al.
Published: (2025)

Synthetic Geology: Structural Geology Meets Deep Learning
by: Ghyselincks, Simon, et al.
Published: (2025)

A Multi-Scale Graph Neural Process with Cross-Drug Co-Attention for Drug-Drug Interactions Prediction
by: Yan, Zimo, et al.
Published: (2025)

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
by: Peng, Miao, et al.
Published: (2025)

ControlMath: Controllable Data Generation Promotes Math Generalist Models
by: Chen, Nuo, et al.
Published: (2024)

UMoE: Unifying Attention and FFN with Shared Experts
by: Yang, Yuanhang, et al.
Published: (2025)

A Graph Transformer-Driven Approach for Network Robustness Learning
by: Zhang, Yu, et al.
Published: (2023)

CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community
by: Liu, Yan, et al.
Published: (2024)

PMET: Precise Model Editing in a Transformer
by: Li, Xiaopeng, et al.
Published: (2023)

Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning
by: Dhayalkar, Sahil Rajesh
Published: (2025)

Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)

Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)

Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)

Higher-Order Transformers With Kronecker-Structured Attention
by: Omranpour, Soroush, et al.
Published: (2024)

Expanding Expressivity in Transformer Models with MöbiusAttention
by: Halacheva, Anna-Maria, et al.
Published: (2024)

CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)

Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)

Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
by: Yu, Guoqi, et al.
Published: (2026)

Curse of Attention: A Kernel-Based Perspective for Why Transformers Fail to Generalize on Time Series Forecasting and Beyond
by: Ke, Yekun, et al.
Published: (2024)

AttentionSmithy: A Modular Framework for Rapid Transformer Development and Customization
by: Cranney, Caleb, et al.
Published: (2025)

Scaling Graph Transformers: A Comparative Study of Sparse and Dense Attention
by: Dimitrov, Leon
Published: (2025)

Horizon-wise Learning Paradigm Promotes Gene Splicing Identification
by: Li, Qi-Jie, et al.
Published: (2024)

CITRAS: Covariate-Informed Transformer for Time Series Forecasting
by: Yamaguchi, Yosuke, et al.
Published: (2025)

NoiseFormer -- Noise Diffused Symmetric Attention Transformer
by: Kumar, Phani, et al.
Published: (2026)