| Main Author: | Steifer, Tomasz |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.16640 |
Similar Items
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
by: Tumma, Neehal, et al.
Published: (2026)
Patched-DeltaNet: Token-Level Event-Driven Memory for Linear-Time Anomaly Detection
by: Lee, Tae-Gyun, et al.
Published: (2026)
Simple online learning with consistent oracle
by: Kozachinskiy, Alexander, et al.
Published: (2023)
A completely uniform transformer for parity
by: Kozachinskiy, Alexander, et al.
Published: (2025)
Computable universal online learning
by: Kalociński, Dariusz, et al.
Published: (2025)
Parity, Sensitivity, and Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2026)
Ehrenfeucht-Haussler Rank and Chain of Thought
by: Barceló, Pablo, et al.
Published: (2025)
Optimal bounds for dissatisfaction in perpetual voting
by: Kozachinskiy, Alexander, et al.
Published: (2024)
Effective Littlestone Dimension
by: Rose, Valentino Delle, et al.
Published: (2024)
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
by: Hatamizadeh, Ali, et al.
Published: (2026)
Strassen Attention, Split VC Dimension and Compositionality in Transformers
by: Kozachinskiy, Alexander, et al.
Published: (2025)
OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention
by: Zhou, Chenyu, et al.
Published: (2026)
CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
by: Chen, Yongqiang, et al.
Published: (2026)
How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad
by: Abbe, Emmanuel, et al.
Published: (2024)
Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models
by: Zheng, Lin, et al.
Published: (2026)
Provable Tempered Overfitting of Minimal Nets and Typical Nets
by: Harel, Itamar, et al.
Published: (2024)
Provably Learning Attention with Queries
by: Bhattamishra, Satwik, et al.
Published: (2026)
A Provable Expressiveness Hierarchy in Hybrid Linear-Full Attention
by: Ye, Xiaowei, et al.
Published: (2026)
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
by: Willette, Jeffrey, et al.
Published: (2025)
Provably tuning the ElasticNet across instances
by: Balcan, Maria-Florina, et al.
Published: (2022)
Provable Generalization in Overparameterized Neural Nets
by: Dhingra, Aviral
Published: (2025)
Task-Aware Calibration: Provably Optimal Decoding in LLMs
by: Tomov, Tim, et al.
Published: (2026)
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
by: Liu, Hongyi, et al.
Published: (2025)
Delta Attention Residuals
by: Luo, Cheng, et al.
Published: (2026)
Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)
KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity
by: Lesens, Damien, et al.
Published: (2025)
Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
by: Li, Yichen, et al.
Published: (2024)
Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better
by: Min, Yizhou, et al.
Published: (2026)
Confidence-Based Decoding is Provably Efficient for Diffusion Language Models
by: Cai, Changxiao, et al.
Published: (2026)
Minimalist Softmax Attention Provably Learns Constrained Boolean Functions
by: Hu, Jerry Yao-Chieh, et al.
Published: (2025)
Provable Differentially Private Computation of the Cross-Attention Mechanism
by: Ke, Yekun, et al.
Published: (2024)
Less Effort, Shorter Proofs: Reinforcement Learning for Security Protocol Analysis in Tamarin
by: Cosler, Matthias, et al.
Published: (2026)
G-Net: A Provably Easy Construction of High-Accuracy Random Binary Neural Networks
by: Aghasi, Alireza, et al.
Published: (2025)
Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
by: Ran-Milo, Yuval
Published: (2026)
C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness
by: Kang, Yu, et al.
Published: (2024)
Invertible ResNets for Inverse Imaging Problems: Competitive Performance with Provable Regularization Properties
by: Arndt, Clemens, et al.
Published: (2024)
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
by: Wang, Zixuan, et al.
Published: (2024)
Shorter Thoughts, Same Answers: Difficulty-Scaled Segment-Wise RL for CoT Compression
by: Tian, Ye, et al.
Published: (2026)
Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)
Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
by: Qu, Chengrui, et al.
Published: (2024)