Library Catalog

Bibliographic Details
Main Authors:	Xu, Yixing; Nag, Shivank; Li, Dong; Tian, Lu; Barsoum, Emad
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.01039

Similar Items

Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
by: Li, Jinze, et al.
Published: (2025)

Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)

SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
by: Liao, Huanxuan, et al.
Published: (2025)

Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
by: Lee, Nakyung, et al.
Published: (2025)

Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference
by: Li, Zeping, et al.
Published: (2024)

PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
by: An, Zihao, et al.
Published: (2026)

Partial Convolution Meets Visual Attention
by: Huang, Haiduo, et al.
Published: (2025)

AdaptEvolve: Improving Efficiency of Evolutionary AI Agents through Adaptive Model Selection
by: Ray, Pretam, et al.
Published: (2026)

Sliding Window Attention Training for Efficient Large Language Models
by: Fu, Zichuan, et al.
Published: (2025)

TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
by: Mishra, Prakamya, et al.
Published: (2025)

Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
by: Wang, Shuai, et al.
Published: (2025)

Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
by: Bae, Jeongin, et al.
Published: (2026)

CLAWS: Creativity detection for LLM-generated solutions using Attention Window of Sections
by: Kim, Keuntae, et al.
Published: (2025)

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models
by: Lee, Hanna, et al.
Published: (2026)

DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models
by: Ke, Wenjin, et al.
Published: (2025)

SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing
by: Yu, Yijiong, et al.
Published: (2025)

Efficient Context Scaling with LongCat ZigZag Attention
by: Zhang, Chen, et al.
Published: (2025)

RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
by: Goldstein, Daniel, et al.
Published: (2025)

ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)

Scaling Reasoning without Attention
by: Zhao, Xueliang, et al.
Published: (2025)

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks
by: Wang, Jianghui, et al.
Published: (2025)

Attention-Aligned Reasoning for Large Language Models
by: Zhang, Hongxiang, et al.
Published: (2025)

Self-Selected Attention Span for Accelerating Large Language Model Inference
by: Jin, Tian, et al.
Published: (2024)

Reflection-Window Decoding: Text Generation with Selective Refinement
by: Tang, Zeyu, et al.
Published: (2025)

Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
by: An, Wenbin, et al.
Published: (2024)

When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
by: Shamsi, Zafir, et al.
Published: (2026)

CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
by: Zhu, Shiyi, et al.
Published: (2023)

ReAttention: Training-Free Infinite Context with Finite Attention Scope
by: Liu, Xiaoran, et al.
Published: (2024)

Efficient Streaming Language Models with Attention Sinks
by: Xiao, Guangxuan, et al.
Published: (2023)

Block-Attention for Efficient Prefilling
by: Ma, Dongyang, et al.
Published: (2024)

Multi-granularity Interactive Attention Framework for Residual Hierarchical Pronunciation Assessment
by: Han, Hong, et al.
Published: (2026)

Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion
by: Cheng, Zhen, et al.
Published: (2026)

LoRA-Mini: Adaptation Matrices Decomposition and Selective Training
by: Singh, Ayush, et al.
Published: (2024)

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
by: Xiao, Yang, et al.
Published: (2025)

ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models
by: Chen, Hao, et al.
Published: (2025)

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)

Stacked from One: Multi-Scale Self-Injection for Context Window Extension
by: Han, Wei, et al.
Published: (2026)

LongHeads: Multi-Head Attention is Secretly a Long Context Processor
by: Lu, Yi, et al.
Published: (2024)

Dual LoRA: Enhancing LoRA with Magnitude and Direction Updates
by: Xu, Yixing, et al.
Published: (2025)

LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability
by: Xu, Xuan, et al.
Published: (2025)