| Main Authors: | Li, Yifan, Li, Xin, Li, Tianqin, He, Wenbin, Kong, Yu, Ren, Liu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.03433 |
Similar Items
ViT-AdaLA: Adapting Vision Transformers with Linear Attention
by: Li, Yifan, et al.
Published: (2026)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion
by: Dong, Caixia, et al.
Published: (2025)
ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)
Communication Efficient Split Learning of ViTs with Attention-based Double Compression
by: Alvetreti, Federico, et al.
Published: (2025)
ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)
Learning More by Seeing Less: Structure First Learning for Efficient, Transferable, and Human-Aligned Vision
by: Li, Tianqin, et al.
Published: (2025)
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
by: Zhang, Tianfang, et al.
Published: (2024)
LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
by: Hu, Youbing, et al.
Published: (2024)
VAT: Vision Action Transformer by Unlocking Full Representation of ViT
by: Li, Wenhao, et al.
Published: (2025)
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
by: Yao, Ting, et al.
Published: (2024)
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
by: Li, Shuaiting, et al.
Published: (2025)
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
by: Xu, Xuwei, et al.
Published: (2023)
HydraViT: Stacking Heads for a Scalable ViT
by: Haberer, Janek, et al.
Published: (2024)
PEANO-ViT: Power-Efficient Approximations of Non-Linearities in Vision Transformers
by: Sadeghi, Mohammad Erfan, et al.
Published: (2024)
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
by: Chen, Weiming, et al.
Published: (2026)
ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval
by: Dong, Le, et al.
Published: (2024)
ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
by: Jiang, Yanfeng, et al.
Published: (2024)
MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer
by: Tai, Yu-Shan, et al.
Published: (2024)
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
Unleashing Foundation Vision Models: Adaptive Transfer for Diverse Data-Limited Scientific Domains
by: Li, Qiankun, et al.
Published: (2025)
SAC-ViT: Semantic-Aware Clustering Vision Transformer with Early Exit
by: Hu, Youbing, et al.
Published: (2025)
ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
by: Yuan, Zhengqing, et al.
Published: (2024)
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
by: Pan, Chenbin, et al.
Published: (2025)
EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation
by: Liu, Longfei, et al.
Published: (2026)
Test-Time Adaptation for Height Completion via Self-Supervised ViT Features and Monocular Foundation Models
by: Rafaeli, Osher, et al.
Published: (2026)
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)
S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality
by: Li, Jinlong, et al.
Published: (2023)
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)
SFMViT: SlowFast Meet ViT in Chaotic World
by: Lin, Jiaying, et al.
Published: (2024)
Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)
ViT-Explainer: An Interactive Walkthrough of the Vision Transformer Pipeline
by: Hernandez, Juan Manuel, et al.
Published: (2026)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation
by: Chen, Renqi, et al.
Published: (2024)
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
by: Xia, Chunlong, et al.
Published: (2024)
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
by: Li, Zhengang, et al.
Published: (2024)
Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)
ViT-FIQA: Assessing Face Image Quality using Vision Transformers
by: Atzori, Andrea, et al.
Published: (2025)
Dynamic Granularity Matters: Rethinking Vision Transformers Beyond Fixed Patch Splitting
by: Yu, Qiyang, et al.
Published: (2025)