:: Library Catalog

Bibliographic Details
Main Authors:	Li, Yifan; Li, Xin; Li, Tianqin; He, Wenbin; Kong, Yu; Ren, Liu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.03433

Similar Items

ViT-AdaLA: Adapting Vision Transformers with Linear Attention
by: Li, Yifan, et al.
Published: (2026)

Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion
by: Dong, Caixia, et al.
Published: (2025)

ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)

Communication Efficient Split Learning of ViTs with Attention-based Double Compression
by: Alvetreti, Federico, et al.
Published: (2025)

ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)

EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)

Learning More by Seeing Less: Structure First Learning for Efficient, Transferable, and Human-Aligned Vision
by: Li, Tianqin, et al.
Published: (2025)

CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
by: Zhang, Tianfang, et al.
Published: (2024)

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
by: Hu, Youbing, et al.
Published: (2024)

VAT: Vision Action Transformer by Unlocking Full Representation of ViT
by: Li, Wenhao, et al.
Published: (2025)

HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
by: Yao, Ting, et al.
Published: (2024)

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
by: Li, Shuaiting, et al.
Published: (2025)

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
by: Xu, Xuwei, et al.
Published: (2023)

HydraViT: Stacking Heads for a Scalable ViT
by: Haberer, Janek, et al.
Published: (2024)

PEANO-ViT: Power-Efficient Approximations of Non-Linearities in Vision Transformers
by: Sadeghi, Mohammad Erfan, et al.
Published: (2024)

Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
by: Chen, Weiming, et al.
Published: (2026)

ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval
by: Dong, Le, et al.
Published: (2024)

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
by: Jiang, Yanfeng, et al.
Published: (2024)

MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
by: Tai, Yu-Shan, et al.
Published: (2024)

ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)

Unleashing Foundation Vision Models: Adaptive Transfer for Diverse Data-Limited Scientific Domains
by: Li, Qiankun, et al.
Published: (2025)

SAC-ViT: Semantic-Aware Clustering Vision Transformer with Early Exit
by: Hu, Youbing, et al.
Published: (2025)

ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
by: Yuan, Zhengqing, et al.
Published: (2024)

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
by: Pan, Chenbin, et al.
Published: (2025)

EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation
by: Liu, Longfei, et al.
Published: (2026)

Test-Time Adaptation for Height Completion via Self-Supervised ViT Features and Monocular Foundation Models
by: Rafaeli, Osher, et al.
Published: (2026)

IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)

S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality
by: Li, Jinlong, et al.
Published: (2023)

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)

SFMViT: SlowFast Meet ViT in Chaotic World
by: Lin, Jiaying, et al.
Published: (2024)

Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)

ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline
by: Hernandez, Juan Manuel, et al.
Published: (2026)

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)

Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)

DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation
by: Chen, Renqi, et al.
Published: (2024)

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
by: Xia, Chunlong, et al.
Published: (2024)

Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by: Li, Zhengang, et al.
Published: (2024)

Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)

ViT-FIQA: Assessing Face Image Quality using Vision Transformers
by: Atzori, Andrea, et al.
Published: (2025)

Dynamic Granularity Matters: Rethinking Vision Transformers Beyond Fixed Patch Splitting
by: Yu, Qiyang, et al.
Published: (2025)