:: Library Catalog

Bibliographic Details
Main Authors:	Cranney, Caleb; Meyer, Jesse G.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.09503

Similar Items

GoldenTransformer: A Modular Fault Injection Framework for Transformer Robustness Research
by: Howard, Luke
Published: (2025)

M3PT: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention
by: Tang, Yiming, et al.
Published: (2025)

Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)

RL + Transformer = A General-Purpose Problem Solver
by: Rentschler, Micah, et al.
Published: (2025)

Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)

FedModule: A Modular Federated Learning Framework
by: Chen, Chuyi, et al.
Published: (2024)

Learning Modular Exponentiation with Transformers
by: Africa, David Demitri, et al.
Published: (2025)

Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
by: Freytes, Luis Rosario
Published: (2026)

Spacetime $E(n)$-Transformer: Equivariant Attention for Spatio-temporal Graphs
by: Charles, Sergio G.
Published: (2024)

Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
by: Kerce, J. Clayton
Published: (2026)

The Bayesian Geometry of Transformer Attention
by: Agarwal, Naman, et al.
Published: (2025)

Multi-Perspective Transformers in ARC-AGI-2 Challenge
by: Talley, Caleb, et al.
Published: (2026)

Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
by: Saxena, Krati, et al.
Published: (2025)

Enhancing Customer Service Chatbots with Context-Aware NLU through Selective Attention and Multi-task Learning
by: Nandi, Subhadip, et al.
Published: (2025)

Neural Organ Transplantation (NOT): Checkpoint-Based Modular Adaptation for Transformer Models
by: Al-Zuraiqi, Ahmad
Published: (2026)

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)

XicorAttention: Time Series Transformer Using Attention with Nonlinear Correlation
by: Kimura, Daichi, et al.
Published: (2025)

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
by: Li, Wenbing, et al.
Published: (2025)

ParMod: A Parallel and Modular Framework for Learning Non-Markovian Tasks
by: Miao, Ruixuan, et al.
Published: (2024)

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
by: Zhu, Yuke, et al.
Published: (2020)

Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)

Bifurcation Models: Learning Set-Valued Solution Maps with Weight-Tied Dynamics
by: Jore, Caleb, et al.
Published: (2026)

Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning
by: Dhayalkar, Sahil Rajesh
Published: (2025)

GIAT: A Geologically-Informed Attention Transformer for Lithology Identification
by: Li, Jie, et al.
Published: (2026)

CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization
by: Zhang, Jinhao, et al.
Published: (2025)

Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)

Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)

Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)

Higher-Order Transformers With Kronecker-Structured Attention
by: Omranpour, Soroush, et al.
Published: (2024)

Expanding Expressivity in Transformer Models with MöbiusAttention
by: Halacheva, Anna-Maria, et al.
Published: (2024)

Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)

Customized Load Profiles Synthesis for Electricity Customers Based on Conditional Diffusion Models
by: Wang, Zhenyi, et al.
Published: (2023)

METHOD: Modular Efficient Transformer for Health Outcome Discovery
by: Qian, Linglong, et al.
Published: (2025)

Interpretable Discovery of One-parameter Subgroups: A Modular Framework for Elliptical, Hyperbolic, and Parabolic Symmetries
by: Karjol, Pavan, et al.
Published: (2025)

Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition
by: Khan, Haris, et al.
Published: (2025)

Scaling Graph Transformers: A Comparative Study of Sparse and Dense Attention
by: Dimitrov, Leon
Published: (2025)

Echo State Transformer: Attention Over Finite Memories
by: Bendi-Ouis, Yannis, et al.
Published: (2025)

Strassen Attention, Split VC Dimension and Compositionality in Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2025)

HATSolver: Learning Groebner Bases with Hierarchical Attention Transformers
by: Malhou, Mohamed, et al.
Published: (2025)

NoiseFormer -- Noise Diffused Symmetric Attention Transformer
by: Kumar, Phani, et al.
Published: (2026)