Library Catalog

Bibliographic Details
Main Author:	Forchheimer, Robert
Format:	Preprint
Published:	2026
Subjects:	Machine Learning I.2.7
Online Access:	https://arxiv.org/abs/2601.22852

Similar Items

TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors
by: Atad, Ido Andrew, et al.
Published: (2026)

The Hidden Attention of Mamba Models
by: Ali, Ameen, et al.
Published: (2024)

Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling
by: Kashyap, Ankit
Published: (2025)

RAPTOR-AI for Disaster OODA Loop: Hierarchical Multimodal RAG with Experience-Driven Agentic Decision-Making
by: Yasuno, Takato
Published: (2026)

Merge-Bench: Resolve Merge Conflicts with Large Language Models
by: Schesch, Benedikt, et al.
Published: (2026)

Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
by: Bakish, Yarden, et al.
Published: (2025)

Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales
by: Salfati, Samuel
Published: (2026)

TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
by: Nauen, Tobias Christian, et al.
Published: (2024)

QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
by: Zhou, Qirui, et al.
Published: (2025)

Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models
by: Moghadasi, Mahdi Naser, et al.
Published: (2026)

Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss
by: Mulgund, Abhijeet, et al.
Published: (2025)

OCRR: A Benchmark for Online Correction Recovery under Distribution Shift
by: Grassi, Adrian
Published: (2026)

Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing
by: Sun, Yuhui, et al.
Published: (2025)

Combining Language and Topic Models for Hierarchical Text Classification
by: Toit, Jaco du, et al.
Published: (2025)

Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text
by: Oketunji, Abiodun Finbarrs
Published: (2023)

Chain and Causal Attention for Efficient Entity Tracking
by: Fagnou, Erwan, et al.
Published: (2024)

Introducing Three New Benchmark Datasets for Hierarchical Text Classification
by: Toit, Jaco du, et al.
Published: (2024)

Random Heterogeneous Neurochaos Learning Architecture for Data Classification
by: S, Remya Ajai A, et al.
Published: (2024)

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing
by: Fagnou, Erwan, et al.
Published: (2026)

Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
by: Zhao, Hangyue, et al.
Published: (2026)

Attention Drift: What Autoregressive Speculative Decoding Models Learn
by: Eldenk, Doğaç, et al.
Published: (2026)

Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
by: Gao, Heyang, et al.
Published: (2025)

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
by: Szilvasy, Gergely, et al.
Published: (2026)

Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
by: Bianchessi, Arthur S., et al.
Published: (2025)

Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
by: Zimerman, Itamar, et al.
Published: (2024)

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
by: Sanovar, Rya, et al.
Published: (2024)

Interpreto: An Explainability Library for Transformers
by: Poché, Antonin, et al.
Published: (2025)

Continuous-Depth Transformers with Learned Control Dynamics
by: Jemley, Peter
Published: (2026)

Beyond Memorization: Violating Privacy Via Inference with Large Language Models
by: Staab, Robin, et al.
Published: (2023)

Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference
by: Dalal, Siddhartha, et al.
Published: (2024)

Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time
by: Zhao, Mingkuan, et al.
Published: (2026)

Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic
by: Zhao, Xingyu, et al.
Published: (2026)

Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
by: Avinash, Mynampati Sri Ranganadha
Published: (2026)

Thread Detection and Response Generation using Transformers with Prompt Optimisation
by: T, Kevin Joshua, et al.
Published: (2024)

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)

Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
by: Zhan, Zhihao, et al.
Published: (2025)

Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
by: Gu, Hao, et al.
Published: (2025)

Forget Attention: Importance-Aware Attention Is All You Need
by: Shin, Soohyeong, et al.
Published: (2026)

Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)

CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection
by: Panuganti, Rajkiran
Published: (2026)