| Main Author: | Forchheimer, Robert |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.22852 |
Similar Items
TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors
by: Atad, Ido Andrew, et al.
Published: (2026)
The Hidden Attention of Mamba Models
by: Ali, Ameen, et al.
Published: (2024)
Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling
by: Kashyap, Ankit
Published: (2025)
RAPTOR-AI for Disaster OODA Loop: Hierarchical Multimodal RAG with Experience-Driven Agentic Decision-Making
by: Yasuno, Takato
Published: (2026)
Merge-Bench: Resolve Merge Conflicts with Large Language Models
by: Schesch, Benedikt, et al.
Published: (2026)
Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
by: Bakish, Yarden, et al.
Published: (2025)
Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales
by: Salfati, Samuel
Published: (2026)
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
by: Nauen, Tobias Christian, et al.
Published: (2024)
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
by: Zhou, Qirui, et al.
Published: (2025)
Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models
by: Moghadasi, Mahdi Naser, et al.
Published: (2026)
Relating Misfit to Gain in Weak-to-Strong Generalization Beyond the Squared Loss
by: Mulgund, Abhijeet, et al.
Published: (2025)
OCRR: A Benchmark for Online Correction Recovery under Distribution Shift
by: Grassi, Adrian
Published: (2026)
Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing
by: Sun, Yuhui, et al.
Published: (2025)
Combining Language and Topic Models for Hierarchical Text Classification
by: Toit, Jaco du, et al.
Published: (2025)
Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text
by: Oketunji, Abiodun Finbarrs
Published: (2023)
Chain and Causal Attention for Efficient Entity Tracking
by: Fagnou, Erwan, et al.
Published: (2024)
Introducing Three New Benchmark Datasets for Hierarchical Text Classification
by: Toit, Jaco du, et al.
Published: (2024)
Random Heterogeneous Neurochaos Learning Architecture for Data Classification
by: S, Remya Ajai A, et al.
Published: (2024)
Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing
by: Fagnou, Erwan, et al.
Published: (2026)
Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
by: Zhao, Hangyue, et al.
Published: (2026)
Attention Drift: What Autoregressive Speculative Decoding Models Learn
by: Eldenk, Doğaç, et al.
Published: (2026)
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
by: Gao, Heyang, et al.
Published: (2025)
Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
by: Szilvasy, Gergely, et al.
Published: (2026)
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
by: Bianchessi, Arthur S., et al.
Published: (2025)
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
by: Zimerman, Itamar, et al.
Published: (2024)
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
by: Sanovar, Rya, et al.
Published: (2024)
Interpreto: An Explainability Library for Transformers
by: Poché, Antonin, et al.
Published: (2025)
Continuous-Depth Transformers with Learned Control Dynamics
by: Jemley, Peter
Published: (2026)
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
by: Staab, Robin, et al.
Published: (2023)
Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference
by: Dalal, Siddhartha, et al.
Published: (2024)
Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time
by: Zhao, Mingkuan, et al.
Published: (2026)
Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic
by: Zhao, Xingyu, et al.
Published: (2026)
Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
by: Avinash, Mynampati Sri Ranganadha
Published: (2026)
Thread Detection and Response Generation using Transformers with Prompt Optimisation
by: T, Kevin Joshua, et al.
Published: (2024)
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
by: Hanna, Michael, et al.
Published: (2024)
Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
by: Zhan, Zhihao, et al.
Published: (2025)
Discovering Transformer Circuits via a Hybrid Attribution and Pruning Framework
by: Gu, Hao, et al.
Published: (2025)
Forget Attention: Importance-Aware Attention Is All You Need
by: Shin, Soohyeong, et al.
Published: (2026)
Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
by: Fadli, Samih
Published: (2025)
CircuitProbe: Predicting Reasoning Circuits in Transformers via Stability Zone Detection
by: Panuganti, Rajkiran
Published: (2026)