:: Library Catalog

Bibliographic Details
Main Author:	Steifer, Tomasz
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.16640

Similar Items

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
by: Tumma, Neehal, et al.
Published: (2026)

Patched-DeltaNet: Token-Level Event-Driven Memory for Linear-Time Anomaly Detection
by: Lee, Tae-Gyun, et al.
Published: (2026)

Simple online learning with consistent oracle
by: Kozachinskiy, Alexander, et al.
Published: (2023)

A completely uniform transformer for parity
by: Kozachinskiy, Alexander, et al.
Published: (2025)

Computable universal online learning
by: Kalociński, Dariusz, et al.
Published: (2025)

Parity, Sensitivity, and Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2026)

Ehrenfeucht-Haussler Rank and Chain of Thought
by: Barceló, Pablo, et al.
Published: (2025)

Optimal bounds for dissatisfaction in perpetual voting
by: Kozachinskiy, Alexander, et al.
Published: (2024)

Effective Littlestone Dimension
by: Delle Rose, Valentino, et al.
Published: (2024)

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
by: Hatamizadeh, Ali, et al.
Published: (2026)

Strassen Attention, Split VC Dimension and Compositionality in Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2025)

OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
by: Zhou, Chenyu, et al.
Published: (2026)

CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
by: Chen, Yongqiang, et al.
Published: (2026)

How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
by: Abbe, Emmanuel, et al.
Published: (2024)

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
by: Zheng, Lin, et al.
Published: (2026)

Provable Tempered Overfitting of Minimal Nets and Typical Nets
by: Harel, Itamar, et al.
Published: (2024)

Provably Learning Attention with Queries
by: Bhattamishra, Satwik, et al.
Published: (2026)

A Provable Expressiveness Hierarchy in Hybrid Linear-Full Attention
by: Ye, Xiaowei, et al.
Published: (2026)

Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
by: Willette, Jeffrey, et al.
Published: (2025)

Provably tuning the ElasticNet across instances
by: Balcan, Maria-Florina, et al.
Published: (2022)

Provable Generalization in Overparameterized Neural Nets
by: Dhingra, Aviral
Published: (2025)

Task-Aware Calibration: Provably Optimal Decoding in LLMs
by: Tomov, Tim, et al.
Published: (2026)

Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
by: Liu, Hongyi, et al.
Published: (2025)

Delta Attention Residuals
by: Luo, Cheng, et al.
Published: (2026)

Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)

KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
by: Lesens, Damien, et al.
Published: (2025)

Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
by: Li, Yichen, et al.
Published: (2024)

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better
by: Min, Yizhou, et al.
Published: (2026)

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models
by: Cai, Changxiao, et al.
Published: (2026)

Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)

Provable Differentially Private Computation of the Cross-Attention Mechanism
by: Ke, Yekun, et al.
Published: (2024)

Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin
by: Cosler, Matthias, et al.
Published: (2026)

G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural Networks
by: Aghasi, Alireza, et al.
Published: (2025)

Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
by: Ran-Milo, Yuval
Published: (2026)

C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
by: Kang, Yu, et al.
Published: (2024)

Invertible ResNets for Inverse Imaging Problems: Competitive Performance with Provable Regularization Properties
by: Arndt, Clemens, et al.
Published: (2024)

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
by: Wang, Zixuan, et al.
Published: (2024)

Shorter Thoughts, Same Answers: Difficulty-Scaled Segment-Wise RL for CoT Compression
by: Tian, Ye, et al.
Published: (2026)

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)

Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
by: Qu, Chengrui, et al.
Published: (2024)