Library Catalog

Bibliographic Details
Main Authors:	Fuller, Anthony; Yassin, Yousef; Kyrollos, Daniel G.; Shelhamer, Evan; Green, James R.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.15021

Similar Items

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
by: Fuller, Anthony, et al.
Published: (2024)

LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
by: Fuller, Anthony, et al.
Published: (2025)

LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute
by: Salamatian, Ali, et al.
Published: (2026)

Galileo: Learning Global & Local Features of Many Remote Sensing Modalities
by: Tseng, Gabriel, et al.
Published: (2025)

Self-Distillation of Hidden Layers for Self-Supervised Representation Learning
by: Lowe, Scott C., et al.
Published: (2026)

LookSharp: Attention Entropy Minimization for Test-Time Adaptation
by: Mali, Yash, et al.
Published: (2025)

A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-time Adaptation
by: Li, Zefeng, et al.
Published: (2026)

Octic Vision Transformers: Quicker ViTs Through Equivariance
by: Nordström, David, et al.
Published: (2025)

No One Knows the State of the Art in Geospatial Foundation Models
by: Corley, Isaac, et al.
Published: (2026)

Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)

ProtoTTA: Prototype-Guided Test-Time Adaptation
by: Abootorabi, Mohammad Mahdi, et al.
Published: (2026)

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
by: Norouzi, Narges, et al.
Published: (2024)

ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains
by: Vray, Guillaume, et al.
Published: (2025)

Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)

Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration
by: Zeng, Fanhu, et al.
Published: (2025)

Visual-Word Tokenizer: Beyond Fixed Sets of Tokens in Vision Transformers
by: Gee, Leonidas, et al.
Published: (2024)

Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens
by: Lew, Jaihyun, et al.
Published: (2024)

Wavelet-Based Image Tokenizer for Vision Transformers
by: Zhu, Zhenhai, et al.
Published: (2024)

Towards Real-Time Inference of Thin Liquid Film Thickness Profiles from Interference Patterns Using Vision Transformers
by: Viruthagiri, Gautam A., et al.
Published: (2025)

ChangeViT: Unleashing Plain Vision Transformers for Change Detection
by: Zhu, Duowang, et al.
Published: (2024)

VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer
by: Tang, Xikai, et al.
Published: (2025)

PPT: Token Pruning and Pooling for Efficient Vision Transformers
by: Wu, Xinjian, et al.
Published: (2023)

Self-Soupervision: Cooking Model Soups without Labels
by: Fuller, Anthony, et al.
Published: (2026)

Rényi Entropy: A New Token Pruning Metric for Vision Transformers
by: Su, Wei-Yuan, et al.
Published: (2026)

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
by: Wu, Junyi, et al.
Published: (2024)

Token-Space Mask Prediction for Efficient Vision Transformer Segmentation
by: Galagain, Calvin, et al.
Published: (2026)

ToSA: Token Selective Attention for Efficient Vision Transformers
by: Singh, Manish Kumar, et al.
Published: (2024)

Context-Aware Token Selection and Packing for Enhanced Vision Transformer
by: Zhang, Tianyi, et al.
Published: (2024)

WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
by: Zhu, Lianghui, et al.
Published: (2023)

PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders
by: Cavagnero, Niccolò, et al.
Published: (2026)

SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation
by: Nguyen, Duy-Kien, et al.
Published: (2023)

Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
by: Zhan, Wengyi, et al.
Published: (2025)

SPoT: Subpixel Placement of Tokens in Vision Transformers
by: Hjelkrem-Tan, Martine, et al.
Published: (2025)

Lossless Token Merging Even Without Fine-Tuning in Vision Transformers
by: Lee, Jaeyeon, et al.
Published: (2025)

S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens
by: Cai, Rizhao, et al.
Published: (2023)

MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers
by: Bae, Jongseong, et al.
Published: (2024)

Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
by: Peng, Shuai, et al.
Published: (2024)

DORA: Dynamic Online Reinforcement Agent for Token Merging in Vision Transformers
by: He, Kaixuan, et al.
Published: (2026)

Neighbor-Aware Token Reduction via Hilbert Curve for Vision Transformers
by: Li, Yunge, et al.
Published: (2025)

Speed-up of Vision Transformer Models by Attention-aware Token Filtering
by: Naruko, Takahiro, et al.
Published: (2025)