Library Catalog

Bibliographic Details
Main Author:	Mehta, Nihal
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.13780

Similar Items

When Are Learning Biases Equivalent? A Unifying Framework for Fairness, Robustness, and Distribution Shift
by: Mehta, Sushant
Published: (2025)

Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
by: O'Neill, Charles
Published: (2025)

CardioPatternFormer: Pattern-Guided Attention for Interpretable ECG Classification with Transformer Architecture
by: Uğraş, Berat Kutay, et al.
Published: (2025)

Dual Filter: A Transformer-like Inference Architecture for Hidden Markov Models
by: Chang, Heng-Sheng, et al.
Published: (2025)

Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention
by: Kiruluta, Andrew
Published: (2025)

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
by: Su, Zunhai, et al.
Published: (2026)

Projection-Free Transformers via Gaussian Kernel Attention
by: Kundu, Debarshi, et al.
Published: (2026)

On the Universality of Transformer Architectures; How Much Attention Is Enough?
by: Abbasi, Amirreza, et al.
Published: (2025)

Interpretable-by-Design Transformers via Architectural Stream Independence
by: Kerce, Clayton, et al.
Published: (2026)

Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
by: Aggarwal, Shubham, et al.
Published: (2026)

Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention
by: Evans, Ethan N., et al.
Published: (2024)

Self-Ablating Transformers: More Interpretability, Less Sparsity
by: Ferrao, Jeremias, et al.
Published: (2025)

Scaling Laws and In-Context Learning: A Unified Theoretical Framework
by: Mehta, Sushant, et al.
Published: (2025)

AnchorGT: Efficient and Flexible Attention Architecture for Scalable Graph Transformers
by: Zhu, Wenhao, et al.
Published: (2024)

Towards Mechanistic Interpretability of Graph Transformers via Attention Graphs
by: El, Batu, et al.
Published: (2025)

Understanding Differential Transformer Unchains Pretrained Self-Attentions
by: Kong, Chaerin, et al.
Published: (2025)

TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction
by: Yue, Ling, et al.
Published: (2024)

Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition
by: Xu, Haoren, et al.
Published: (2026)

Unveiling and Controlling Anomalous Attention Distribution in Transformers
by: Yan, Ruiqing, et al.
Published: (2024)

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)

Graph Convolutions Enrich the Self-Attention in Transformers!
by: Choi, Jeongwhan, et al.
Published: (2023)

Multistability of Self-Attention Dynamics in Transformers
by: Altafini, Claudio
Published: (2025)

Small Language Models for Agentic Systems: A Survey of Architectures, Capabilities, and Deployment Trade offs
by: Sharma, Raghav, et al.
Published: (2025)

The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling
by: Kerce, J. Clayton, et al.
Published: (2026)

Robust Evolutionary Multi-Objective Network Architecture Search for Reinforcement Learning (EMNAS-RL)
by: Adde, Nihal Acharya, et al.
Published: (2025)

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
by: Helbling, Alec, et al.
Published: (2025)

Triple Attention Transformer Architecture for Time-Dependent Concrete Creep Prediction
by: Dokduea, Warayut, et al.
Published: (2025)

Local Attention Mechanism: Boosting the Transformer Architecture for Long-Sequence Time Series Forecasting
by: Aguilera-Martos, Ignacio, et al.
Published: (2024)

DARTS-GT: Differentiable Architecture Search for Graph Transformers with Quantifiable Instance-Specific Interpretability Analysis
by: Chakraborty, Shruti Sarika, et al.
Published: (2025)

Quantum Adaptive Self-Attention for Quantum Transformer Models
by: Chen, Chi-Sheng, et al.
Published: (2025)

Cross-Attention with Confidence Weighting for Multi-Channel Audio Alignment
by: Nihal, Ragib Amin, et al.
Published: (2025)

Self-Supervised Transformer Architecture for Change Detection in Radio Access Networks
by: Kozlov, Igor, et al.
Published: (2023)

CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers
by: van Engelenhoven, Adjorn, et al.
Published: (2024)

Interpretable Tensor Fusion
by: Varshneya, Saurabh, et al.
Published: (2024)

Transformers for Tabular Data: A Training Perspective of Self-Attention via Optimal Transport
by: Quadrio, Alessandro, et al.
Published: (2025)

Chessformer: A Unified Architecture for Chess Modeling
by: Monroe, Daniel, et al.
Published: (2026)

CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
by: Neo, Clement, et al.
Published: (2024)

Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)

Probing Information Distribution in Transformer Architectures through Entropy Analysis
by: Buonanno, Amedeo, et al.
Published: (2025)