| Main Author: | Mehta, Nihal |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.13780 |
Similar Items
When Are Learning Biases Equivalent? A Unifying Framework for Fairness, Robustness, and Distribution Shift
by: Mehta, Sushant
Published: (2025)
Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
by: O'Neill, Charles
Published: (2025)
CardioPatternFormer: Pattern-Guided Attention for Interpretable ECG Classification with Transformer Architecture
by: Uğraş, Berat Kutay, et al.
Published: (2025)
Dual Filter: A Transformer-like Inference Architecture for Hidden Markov Models
by: Chang, Heng-Sheng, et al.
Published: (2025)
Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention
by: Kiruluta, Andrew
Published: (2025)
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
by: Su, Zunhai, et al.
Published: (2026)
Projection-Free Transformers via Gaussian Kernel Attention
by: Kundu, Debarshi, et al.
Published: (2026)
On the Universality of Transformer Architectures; How Much Attention Is Enough?
by: Abbasi, Amirreza, et al.
Published: (2025)
Interpretable-by-Design Transformers via Architectural Stream Independence
by: Kerce, Clayton, et al.
Published: (2026)
Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
by: Aggarwal, Shubham, et al.
Published: (2026)
Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
by: Evans, Ethan N., et al.
Published: (2024)
Self-Ablating Transformers: More Interpretability, Less Sparsity
by: Ferrao, Jeremias, et al.
Published: (2025)
Scaling Laws and In-Context Learning: A Unified Theoretical Framework
by: Mehta, Sushant, et al.
Published: (2025)
AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers
by: Zhu, Wenhao, et al.
Published: (2024)
Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
by: El, Batu, et al.
Published: (2025)
Understanding Differential Transformer Unchains Pretrained Self-Attentions
by: Kong, Chaerin, et al.
Published: (2025)
TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction
by: Yue, Ling, et al.
Published: (2024)
Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition
by: Xu, Haoren, et al.
Published: (2026)
Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)
FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)
Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)
Multistability of Self-Attention Dynamics in Transformers
by: Altafini, Claudio
Published: (2025)
Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs
by: Sharma, Raghav, et al.
Published: (2025)
The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling
by: Kerce, J. Clayton, et al.
Published: (2026)
Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL)
by: Adde, Nihal Acharya, et al.
Published: (2025)
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
by: Helbling, Alec, et al.
Published: (2025)
Triple Attention Transformer Architecture for Time-Dependent Concrete Creep Prediction
by: Dokduea, Warayut, et al.
Published: (2025)
Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting
by: Aguilera-Martos, Ignacio, et al.
Published: (2024)
DARTS-GT: Differentiable Architecture Search for Graph Transformers with Quantifiable Instance-Specific Interpretability Analysis
by: Chakraborty, Shruti Sarika, et al.
Published: (2025)
Quantum Adaptive Self-Attention for Quantum Transformer Models
by: Chen, Chi-Sheng, et al.
Published: (2025)
Cross-Attention with Confidence Weighting for Multi-Channel Audio Alignment
by: Nihal, Ragib Amin, et al.
Published: (2025)
Self-Supervised Transformer Architecture for Change Detection in Radio Access Networks
by: Kozlov, Igor, et al.
Published: (2023)
CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
by: van Engelenhoven, Adjorn, et al.
Published: (2024)
Interpretable Tensor Fusion
by: Varshneya, Saurabh, et al.
Published: (2024)
Transformers for Tabular Data: A Training Perspective of Self-Attention via Optimal Transport
by: Quadrio, Alessandro, et al.
Published: (2025)
Chessformer: A Unified Architecture for Chess Modeling
by: Monroe, Daniel, et al.
Published: (2026)
CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
by: Neo, Clement, et al.
Published: (2024)
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)
Probing Information Distribution in Transformer Architectures through Entropy Analysis
by: Buonanno, Amedeo, et al.
Published: (2025)