| Main Authors: | Gelada, Carles; Buckman, Jacob; Zhang, Sean; Bach, Txus |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.04239 |
Similar Items
Conformal Transformations for Symmetric Power Transformers
by: Kumar, Saurabh, et al.
Published: (2025)
Which Attention Heads Matter for In-Context Learning?
by: Yin, Kayo, et al.
Published: (2025)
Rethinking Early Stopping: Refine, Then Calibrate
by: Berta, Eugène, et al.
Published: (2025)
Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning
by: Bouadi, Mohamed, et al.
Published: (2025)
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
by: Al-Tahan, Haider, et al.
Published: (2024)
Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling
by: Xia, Fanzeng, et al.
Published: (2025)
SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
by: Lee, Changhun, et al.
Published: (2025)
Achieving Time Series Reasoning Requires Rethinking Model Design, Tasks Formulation, and Evaluation
by: Kong, Yaxuan, et al.
Published: (2025)
Stem: Rethinking Causal Information Flow in Sparse Attention
by: Niu, Lin, et al.
Published: (2026)
Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks
by: Malacarne, Sara, et al.
Published: (2026)
Continued AI Scaling Requires Repeated Efficiency Doublings
by: Lu, Chien-Ping
Published: (2026)
Out of Context: Reliability in Multimodal Anomaly Detection Requires Contextual Inference
by: Wilkinghoff, Kevin, et al.
Published: (2026)
When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning
by: Qiu, Chenghao, et al.
Published: (2026)
Structured Matrix Scaling for Multi-Class Calibration
by: Berta, Eugène, et al.
Published: (2025)
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
by: Feng, Yunzhen, et al.
Published: (2024)
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
by: Bu, Tao, et al.
Published: (2025)
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
by: Yu, Guoqi, et al.
Published: (2026)
Rethinking Zero-Shot Time Series Classification: From Task-specific Classifiers to In-Context Inference
by: Fang, Juntao, et al.
Published: (2026)
MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
by: Zhou, Ruijie, et al.
Published: (2026)
Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference
by: Joshi, Thomas, et al.
Published: (2025)
Attention in Constant Time: Vashista Sparse Attention for Long-Context Decoding with Exponential Guarantees
by: Nobaub, Vashista
Published: (2026)
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
by: Berta, Eugène, et al.
Published: (2026)
Indirect Attention: Turning Context Misalignment into a Feature
by: Bahaduri, Bissmella, et al.
Published: (2025)
Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)
Scaling Attention via Feature Sparsity
by: Xie, Yan, et al.
Published: (2026)
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
by: Chen, Hao Mark, et al.
Published: (2025)
RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
by: Joshi, Sahil, et al.
Published: (2025)
Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling
by: Qiao, Ye, et al.
Published: (2025)
Double-P: Hierarchical Top-P Sparse Attention for Long-Context LLMs
by: Ni, Wentao, et al.
Published: (2026)
Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models
by: Wen, Ziting, et al.
Published: (2024)
Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning
by: Bouadi, Mohamed, et al.
Published: (2025)
Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
by: Delestre, Cyrile, et al.
Published: (2024)
MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning
by: Liu, Dong, et al.
Published: (2026)
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
by: Zhou, Han, et al.
Published: (2023)
The PokeAgent Challenge: Competitive and Long-Context Learning at Scale
by: Karten, Seth, et al.
Published: (2026)
LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
by: Sun, Weigao, et al.
Published: (2025)
Understanding Learning with Sliced-Wasserstein Requires Rethinking Informative Slices
by: Tran, Huy, et al.
Published: (2024)
A Hitchhiker's Guide to Scaling Law Estimation
by: Choshen, Leshem, et al.
Published: (2024)
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
by: Ross, Alexis, et al.
Published: (2024)
How do Language Models Bind Entities in Context?
by: Feng, Jiahai, et al.
Published: (2023)