| Main Authors: | Cranney, Caleb, Meyer, Jesse G. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.09503 |
Similar Items
GoldenTransformer: A Modular Fault Injection Framework for Transformer Robustness Research
by: Howard, Luke
Published: (2025)
M3PT: A Transformer for Multimodal, Multi-Party Social Signal Prediction with Person-aware Blockwise Attention
by: Tang, Yiming, et al.
Published: (2025)
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)
RL + Transformer = A General-Purpose Problem Solver
by: Rentschler, Micah, et al.
Published: (2025)
Loss Landscape Degeneracy and Stagewise Development in Transformers
by: Hoogland, Jesse, et al.
Published: (2024)
FedModule: A Modular Federated Learning Framework
by: Chen, Chuyi, et al.
Published: (2024)
Learning Modular Exponentiation with Transformers
by: Africa, David Demitri, et al.
Published: (2025)
Geometric Attention: A Regime-Explicit Operator Semantics for Transformer Attention
by: Freytes, Luis Rosario
Published: (2026)
Spacetime $E(n)$-Transformer: Equivariant Attention for Spatio-temporal Graphs
by: Charles, Sergio G.
Published: (2024)
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
by: Kerce, J. Clayton
Published: (2026)
The Bayesian Geometry of Transformer Attention
by: Agarwal, Naman, et al.
Published: (2025)
Multi-Perspective Transformers in ARC-AGI-2 Challenge
by: Talley, Caleb, et al.
Published: (2026)
Attention Schema-based Attention Control (ASAC): A Cognitive-Inspired Approach for Attention Management in Transformers
by: Saxena, Krati, et al.
Published: (2025)
Enhancing Customer Service Chatbots with Context-Aware NLU through Selective Attention and Multi-task Learning
by: Nandi, Subhadip, et al.
Published: (2025)
Neural Organ Transplantation (NOT): Checkpoint-Based Modular Adaptation for Transformer Models
by: Al-Zuraiqi, Ahmad
Published: (2026)
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
by: Wang, George, et al.
Published: (2024)
XicorAttention: Time Series Transformer Using Attention with Nonlinear Correlation
by: Kimura, Daichi, et al.
Published: (2025)
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
by: Li, Wenbing, et al.
Published: (2025)
ParMod: A Parallel and Modular Framework for Learning Non-Markovian Tasks
by: Miao, Ruixuan, et al.
Published: (2024)
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
by: Zhu, Yuke, et al.
Published: (2020)
Towards Empirical Interpretation of Internal Circuits and Properties in Grokked Transformers on Modular Polynomials
by: Furuta, Hiroki, et al.
Published: (2024)
Bifurcation Models: Learning Set-Valued Solution Maps with Weight-Tied Dynamics
by: Jore, Caleb, et al.
Published: (2026)
Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning
by: Dhayalkar, Sahil Rajesh
Published: (2025)
GIAT: A Geologically-Informed Attention Transformer for Lithology Identification
by: Li, Jie, et al.
Published: (2026)
CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization
by: Zhang, Jinhao, et al.
Published: (2025)
Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)
Exact Attention Sensitivity and the Geometry of Transformer Stability
by: Emadi, Seyed Morteza
Published: (2026)
Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)
Higher-Order Transformers With Kronecker-Structured Attention
by: Omranpour, Soroush, et al.
Published: (2024)
Expanding Expressivity in Transformer Models with MöbiusAttention
by: Halacheva, Anna-Maria, et al.
Published: (2024)
Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention
by: Hu, Wenjie, et al.
Published: (2025)
Customized Load Profiles Synthesis for Electricity Customers Based on Conditional Diffusion Models
by: Wang, Zhenyi, et al.
Published: (2023)
METHOD: Modular Efficient Transformer for Health Outcome Discovery
by: Qian, Linglong, et al.
Published: (2025)
Interpretable Discovery of One-parameter Subgroups: A Modular Framework for Elliptical, Hyperbolic, and Parabolic Symmetries
by: Karjol, Pavan, et al.
Published: (2025)
Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition
by: Khan, Haris, et al.
Published: (2025)
Scaling Graph Transformers: A Comparative Study of Sparse and Dense Attention
by: Dimitrov, Leon
Published: (2025)
Echo State Transformer: Attention Over Finite Memories
by: Bendi-Ouis, Yannis, et al.
Published: (2025)
Strassen Attention, Split VC Dimension and Compositionality in Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2025)
HATSolver: Learning Groebner Bases with Hierarchical Attention Transformers
by: Malhou, Mohamed, et al.
Published: (2025)
NoiseFormer -- Noise Diffused Symmetric Attention Transformer
by: Kumar, Phani, et al.
Published: (2026)