:: Library Catalog

Bibliographic Details
Main Author:	Dentamaro, Vincenzo
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2507.08637

Similar Items

Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal
by: Hamdan, Mohammed, et al.
Published: (2026)

Star Attention: Efficient LLM Inference over Long Sequences
by: Acharya, Shantanu, et al.
Published: (2024)

LASP-2: Rethinking Sequence Parallelism for Linear Attention and Its Hybrid
by: Sun, Weigao, et al.
Published: (2025)

SEAL: Scaling to Emphasize Attention for Long-Context Retrieval
by: Lee, Changhun, et al.
Published: (2025)

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
by: Goldstein, Daniel, et al.
Published: (2025)

Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)

Training Tensor Attention Efficiently: From Cubic to Almost Linear Time
by: Cao, Yang, et al.
Published: (2024)

Learning Linear Attention in Polynomial Time
by: Yau, Morris, et al.
Published: (2024)

Native Hybrid Attention for Efficient Sequence Modeling
by: Du, Jusen, et al.
Published: (2025)

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
by: MiniCPM Team, et al.
Published: (2026)

RoPE Attention Can Be Trained in Almost Linear Time
by: Cao, Yang, et al.
Published: (2024)

Scaling Reasoning without Attention
by: Zhao, Xueliang, et al.
Published: (2025)

Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
by: Chen, Yingfa, et al.
Published: (2026)

Parallax: Parameterized Local Linear Attention for Language Modeling
by: Zuo, Yifei, et al.
Published: (2026)

Hallucination Detection in LLMs Using Spectral Features of Attention Maps
by: Binkowski, Jakub, et al.
Published: (2025)

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)

HSR-Enhanced Sparse Attention Acceleration
by: Chen, Bo, et al.
Published: (2024)

Untangling Component Imbalance in Hybrid Linear Attention Conversion Methods
by: Benfeghoul, Martin, et al.
Published: (2025)

Scaling Bidirectional Spans and Span Violations in Attention Mechanism
by: Kim, Jongwook, et al.
Published: (2025)

Enhancing Rare Codes via Probability-Biased Directed Graph Attention for Long-Tail ICD Coding
by: Chen, Tianlei, et al.
Published: (2025)

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
by: Liang, Yingyu, et al.
Published: (2024)

Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)

MoBA: Mixture of Block Attention for Long-Context LLMs
by: Lu, Enzhe, et al.
Published: (2025)

HiCI: Hierarchical Construction-Integration for Long-Context Attention
by: Zeng, Xiangyu, et al.
Published: (2026)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
by: Tan, Shawn, et al.
Published: (2024)

Aligning Human and Machine Attention for Enhanced Supervised Learning
by: Chriqui, Avihay, et al.
Published: (2025)

Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
by: Ling Team, et al.
Published: (2025)

How Sparse Attention Approximates Exact Attention? Your Attention is Naturally $n^C$-Sparse
by: Deng, Yichuan, et al.
Published: (2024)

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)

MUPAX: Multidimensional Problem Agnostic eXplainable AI
by: Dentamaro, Vincenzo, et al.
Published: (2025)

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
by: You, Haoran, et al.
Published: (2024)

Evaluating Very Long-Term Conversational Memory of LLM Agents
by: Maharana, Adyasha, et al.
Published: (2024)

Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)

Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
by: Sharma, Agniv, et al.
Published: (2024)

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
by: Zhu, Kan, et al.
Published: (2025)

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
by: Hsieh, Cheng-Yu, et al.
Published: (2024)

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
by: Yuan, Jingyang, et al.
Published: (2025)

Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
by: Saxena, Utkarsh, et al.
Published: (2024)

Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves
by: Knupp, Jonas, et al.
Published: (2026)

Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps
by: Waldendorf, Jonas, et al.
Published: (2026)