:: Library Catalog

Bibliographic Details
Main Authors:	Gelada, Carles; Buckman, Jacob; Zhang, Sean; Bach, Txus
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.04239

Similar Items

Conformal Transformations for Symmetric Power Transformers
by: Kumar, Saurabh, et al.
Published: (2025)

Which Attention Heads Matter for In-Context Learning?
by: Yin, Kayo, et al.
Published: (2025)

Rethinking Early Stopping: Refine, Then Calibrate
by: Berta, Eugène, et al.
Published: (2025)

Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning
by: Bouadi, Mohamed, et al.
Published: (2025)

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
by: Al-Tahan, Haider, et al.
Published: (2024)

Rethinking the Unsolvable: When In-Context Search Meets Test-Time Scaling
by: Xia, Fanzeng, et al.
Published: (2025)

SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
by: Lee, Changhun, et al.
Published: (2025)

Achieving Time Series Reasoning Requires Rethinking Model Design, Tasks Formulation, and Evaluation
by: Kong, Yaxuan, et al.
Published: (2025)

Stem: Rethinking Causal Information Flow in Sparse Attention
by: Niu, Lin, et al.
Published: (2026)

Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks
by: Malacarne, Sara, et al.
Published: (2026)

Continued AI Scaling Requires Repeated Efficiency Doublings
by: Lu, Chien-Ping
Published: (2026)

Out of Context: Reliability in Multimodal Anomaly Detection Requires Contextual Inference
by: Wilkinghoff, Kevin, et al.
Published: (2026)

When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning
by: Qiu, Chenghao, et al.
Published: (2026)

Structured Matrix Scaling for Multi-Class Calibration
by: Berta, Eugène, et al.
Published: (2025)

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
by: Feng, Yunzhen, et al.
Published: (2024)

Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
by: Bu, Tao, et al.
Published: (2025)

Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
by: Yu, Guoqi, et al.
Published: (2026)

Rethinking Zero-Shot Time Series Classification: From Task-specific Classifiers to In-Context Inference
by: Fang, Juntao, et al.
Published: (2026)

MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
by: Zhou, Ruijie, et al.
Published: (2026)

Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference
by: Joshi, Thomas, et al.
Published: (2025)

Attention in Constant Time: Vashista Sparse Attention for Long-Context Decoding with Exponential Guarantees
by: Nobaub, Vashista
Published: (2026)

CalArena: A Large-Scale Post-Hoc Calibration Benchmark
by: Berta, Eugène, et al.
Published: (2026)

Indirect Attention: Turning Context Misalignment into a Feature
by: Bahaduri, Bissmella, et al.
Published: (2025)

Superiority of Multi-Head Attention in In-Context Linear Regression
by: Cui, Yingqian, et al.
Published: (2024)

Scaling Attention via Feature Sparsity
by: Xie, Yan, et al.
Published: (2026)

Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
by: Chen, Hao Mark, et al.
Published: (2025)

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
by: Joshi, Sahil, et al.
Published: (2025)

Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling
by: Qiao, Ye, et al.
Published: (2025)

Double-P: Hierarchical Top-P Sparse Attention for Long-Context LLMs
by: Ni, Wentao, et al.
Published: (2026)

Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models
by: Wen, Ziting, et al.
Published: (2024)

Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning
by: Bouadi, Mohamed, et al.
Published: (2025)

Self-Attention Mechanism in Multimodal Context for Banking Transaction Flow
by: Delestre, Cyrile, et al.
Published: (2024)

MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning
by: Liu, Dong, et al.
Published: (2026)

Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
by: Zhou, Han, et al.
Published: (2023)

The PokeAgent Challenge: Competitive and Long-Context Learning at Scale
by: Karten, Seth, et al.
Published: (2026)

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
by: Sun, Weigao, et al.
Published: (2025)

Understanding Learning with Sliced-Wasserstein Requires Rethinking Informative Slices
by: Tran, Huy, et al.
Published: (2024)

A Hitchhiker's Guide to Scaling Law Estimation
by: Choshen, Leshem, et al.
Published: (2024)

Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
by: Ross, Alexis, et al.
Published: (2024)

How do Language Models Bind Entities in Context?
by: Feng, Jiahai, et al.
Published: (2023)