Bibliographic Details
Main Authors: Lee, Jaehyuk, Kim, Hanyoung, Kim, Yanggee, Lee, Donghun
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2605.22372
_version_ 1866911704610242560
author Lee, Jaehyuk
Kim, Hanyoung
Kim, Yanggee
Lee, Donghun
contents Vision Transformers (ViTs) face severe computational bottlenecks at high resolutions due to the quadratic complexity of self-attention. Existing token reduction methods rely on local metrics, such as single-layer attention scores, that are inherently vulnerable to the attention sink phenomenon, in which uninformative tokens are paradoxically preserved over salient foreground objects. We propose ASAP (Attention Sink Anchored Pruning), a training-free framework that recasts this sink as a feature. Modeling ViT information flow as a Lazy Random Walk, ASAP identifies the sink as a dominant accumulator of probability mass. By computing the diffusion distance to the sink within the cumulative transition matrix, ASAP partitions tokens via Radial Diffusion Clustering and compresses background redundancy through Transition Weight Pooling in a single shot. Extensive experiments across image, video, and vision-language tasks demonstrate that ASAP outperforms state-of-the-art methods, accelerating throughput by up to 48% while maintaining, or even exceeding, baseline accuracy.
format Preprint
id arxiv_https___arxiv_org_abs_2605_22372
institution arXiv
publishDate 2026
record_format arxiv
title ASAP: Attention Sink Anchored Pruning
topic Machine Learning
url https://arxiv.org/abs/2605.22372
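The abstract names a pipeline (lazy random walk, cumulative transition matrix, distance to the sink, radial partition, weighted pooling) but this record gives no formulas. The sketch below is one possible reading in plain Python, not the authors' implementation. Every concrete choice is an assumption made for illustration: the laziness factor `alpha`, picking the sink by column mass, using an unweighted Euclidean distance between transition rows in place of a true diffusion distance, treating the tokens farthest from the sink as foreground, and merging the rest into a single pooled token. All function names are hypothetical.

```python
# Illustrative sketch only. Every formula here is an assumption;
# the catalog record does not define the actual ASAP computations.

def lazy_step(attn, alpha=0.5):
    """One lazy random-walk step: stay with prob. alpha, else follow attention."""
    n = len(attn)
    return [[alpha * (i == j) + (1 - alpha) * attn[i][j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Square matrix product on nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cumulative_transition(attn_layers, alpha=0.5):
    """Multiply the per-layer lazy steps into one row-stochastic matrix."""
    n = len(attn_layers[0])
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    for attn in attn_layers:
        total = matmul(lazy_step(attn, alpha), total)
    return total

def find_sink(trans):
    """Assumed rule: the sink is the token whose column collects the most mass."""
    n = len(trans)
    mass = [sum(trans[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=mass.__getitem__)

def distance_to_sink(trans, sink):
    """Simplified stand-in for diffusion distance: Euclidean distance of rows."""
    ref = trans[sink]
    return [sum((x - y) ** 2 for x, y in zip(row, ref)) ** 0.5 for row in trans]

def radial_partition(dist, sink, keep):
    """Assumed rule: the `keep` tokens farthest from the sink are foreground."""
    order = sorted((i for i in range(len(dist)) if i != sink),
                   key=lambda i: -dist[i])
    return sorted(order[:keep]), sorted(order[keep:] + [sink])

def pool_background(features, trans, background, sink):
    """Merge background tokens, weighted by their transition mass into the sink."""
    weights = [trans[i][sink] for i in background]
    total = sum(weights)  # positive: the lazy walk keeps trans[sink][sink] > 0
    dim = len(features[0])
    return [sum(w * features[i][d] for w, i in zip(weights, background)) / total
            for d in range(dim)]

def prune_tokens(features, attn_layers, keep, alpha=0.5):
    """Single-shot reduction: `keep` foreground tokens plus one pooled token."""
    trans = cumulative_transition(attn_layers, alpha)
    sink = find_sink(trans)
    dist = distance_to_sink(trans, sink)
    foreground, background = radial_partition(dist, sink, keep)
    pooled = pool_background(features, trans, background, sink)
    return [features[i] for i in foreground] + [pooled]
```

Calling `prune_tokens(features, attn_layers, keep=k)` with `features` as a list of per-token vectors and `attn_layers` as a list of row-stochastic attention matrices returns `k + 1` vectors. A real implementation would operate on batched, multi-head tensors and would likely form several background clusters rather than one.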