| Main Authors: | Fuller, Anthony, Yassin, Yousef, Kyrollos, Daniel G., Shelhamer, Evan, Green, James R. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.15021 |
Similar Items
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
by: Fuller, Anthony, et al.
Published: (2024)
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
by: Fuller, Anthony, et al.
Published: (2025)
LookWhen? Fast Video Recognition by Learning When, Where, and What to Compute
by: Salamatian, Ali, et al.
Published: (2026)
Galileo: Learning Global & Local Features of Many Remote Sensing Modalities
by: Tseng, Gabriel, et al.
Published: (2025)
Self-Distillation of Hidden Layers for Self-Supervised Representation Learning
by: Lowe, Scott C., et al.
Published: (2026)
LookSharp: Attention Entropy Minimization for Test-Time Adaptation
by: Mali, Yash, et al.
Published: (2025)
A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-time Adaptation
by: Li, Zefeng, et al.
Published: (2026)
Octic Vision Transformers: Quicker ViTs Through Equivariance
by: Nordström, David, et al.
Published: (2025)
No One Knows the State of the Art in Geospatial Foundation Models
by: Corley, Isaac, et al.
Published: (2026)
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)
ProtoTTA: Prototype-Guided Test-Time Adaptation
by: Abootorabi, Mohammad Mahdi, et al.
Published: (2026)
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
by: Norouzi, Narges, et al.
Published: (2024)
ReservoirTTA: Prolonged Test-time Adaptation for Evolving and Recurring Domains
by: Vray, Guillaume, et al.
Published: (2025)
Vision Transformer with Super Token Sampling
by: Huang, Huaibo, et al.
Published: (2022)
Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration
by: Zeng, Fanhu, et al.
Published: (2025)
Visual-Word Tokenizer: Beyond Fixed Sets of Tokens in Vision Transformers
by: Gee, Leonidas, et al.
Published: (2024)
Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens
by: Lew, Jaihyun, et al.
Published: (2024)
Wavelet-Based Image Tokenizer for Vision Transformers
by: Zhu, Zhenhai, et al.
Published: (2024)
Towards Real-Time Inference of Thin Liquid Film Thickness Profiles from Interference Patterns Using Vision Transformers
by: Viruthagiri, Gautam A., et al.
Published: (2025)
ChangeViT: Unleashing Plain Vision Transformers for Change Detection
by: Zhu, Duowang, et al.
Published: (2024)
VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer
by: Tang, Xikai, et al.
Published: (2025)
PPT: Token Pruning and Pooling for Efficient Vision Transformers
by: Wu, Xinjian, et al.
Published: (2023)
Self-Soupervision: Cooking Model Soups without Labels
by: Fuller, Anthony, et al.
Published: (2026)
Rényi Entropy: A New Token Pruning Metric for Vision Transformers
by: Su, Wei-Yuan, et al.
Published: (2026)
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
by: Wu, Junyi, et al.
Published: (2024)
Token-Space Mask Prediction for Efficient Vision Transformer Segmentation
by: Galagain, Calvin, et al.
Published: (2026)
ToSA: Token Selective Attention for Efficient Vision Transformers
by: Singh, Manish Kumar, et al.
Published: (2024)
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
by: Zhang, Tianyi, et al.
Published: (2024)
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
by: Zhu, Lianghui, et al.
Published: (2023)
PMT: Plain Mask Transformer for Image and Video Segmentation with Frozen Vision Encoders
by: Cavagnero, Niccolò, et al.
Published: (2026)
SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation
by: Nguyen, Duy-Kien, et al.
Published: (2023)
Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference
by: Zhan, Wengyi, et al.
Published: (2025)
SPoT: Subpixel Placement of Tokens in Vision Transformers
by: Hjelkrem-Tan, Martine, et al.
Published: (2025)
Lossless Token Merging Even Without Fine-Tuning in Vision Transformers
by: Lee, Jaeyeon, et al.
Published: (2025)
S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens
by: Cai, Rizhao, et al.
Published: (2023)
MVFormer: Diversifying Feature Normalization and Token Mixing for Efficient Vision Transformers
by: Bae, Jongseong, et al.
Published: (2024)
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
by: Peng, Shuai, et al.
Published: (2024)
DORA: Dynamic Online Reinforcement Agent for Token Merging in Vision Transformers
by: He, Kaixuan, et al.
Published: (2026)
Neighbor-Aware Token Reduction via Hilbert Curve for Vision Transformers
by: Li, Yunge, et al.
Published: (2025)
Speed-up of Vision Transformer Models by Attention-aware Token Filtering
by: Naruko, Takahiro, et al.
Published: (2025)