| Main Authors: | Wang, Cheng; Zhou, Shuisheng; Peng, Fengjiao; Sheng, Jin; Ye, Feng; Dong, Yinli |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.08883 |
Similar Items
MLG-Stereo: ViT Based Stereo Matching with Multi-Stage Local-Global Enhancement
by: Zhang, Haoyu, et al.
Published: (2026)
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
by: Ramachandran, Akshat, et al.
Published: (2024)
SAC-ViT: Semantic-Aware Clustering Vision Transformer with Early Exit
by: Hu, Youbing, et al.
Published: (2025)
ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)
ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
by: Yuan, Zhengqing, et al.
Published: (2024)
Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)
RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
by: Peng, Haosong, et al.
Published: (2024)
ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
by: Dey, Sainath, et al.
Published: (2025)
DeS3: Adaptive Attention-driven Self and Soft Shadow Removal using ViT Similarity
by: Jin, Yeying, et al.
Published: (2022)
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
by: Kim, Ye-eun, et al.
Published: (2025)
One-Shot Multilingual Font Generation Via ViT
by: Wang, Zhiheng, et al.
Published: (2024)
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
by: Zhang, Dingyuan, et al.
Published: (2024)
STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
by: Chattopadhyay, Nandish, et al.
Published: (2026)
Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay
by: Tong, Jin, et al.
Published: (2026)
UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block
by: Jing, Luoxi, et al.
Published: (2025)
ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)
Rethinking Random Masking in Self-Distillation on ViT
by: Seong, Jihyeon, et al.
Published: (2025)
YOLO-Former: YOLO Shakes Hand With ViT
by: Khoramdel, Javad, et al.
Published: (2024)
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
by: Li, Ye, et al.
Published: (2024)
Case-Enhanced Vision Transformer: Improving Explanations of Image Similarity with a ViT-based Similarity Metric
by: Zhao, Ziwei, et al.
Published: (2024)
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)
Filtered-ViT: A Robust Defense Against Multiple Adversarial Patch Attacks
by: Khanal, Aja, et al.
Published: (2025)
UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register
by: Qiu, Congpei, et al.
Published: (2026)
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
by: Zhao, Wangbo, et al.
Published: (2024)
DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
by: Feng, Yu, et al.
Published: (2024)
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
by: Kang, Ben, et al.
Published: (2025)
ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval
by: Dong, Le, et al.
Published: (2024)
Purrturbed but Stable: Human-Cat Invariant Representations Across CNNs, ViTs and Self-Supervised ViTs
by: Shah, Arya, et al.
Published: (2025)
Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)
U-REPA: Aligning Diffusion U-Nets to ViTs
by: Tian, Yuchuan, et al.
Published: (2025)
Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
by: Liu, Jiani, et al.
Published: (2025)
Vanilla ViT for Automotive Point Cloud Semantic Segmentation
by: Puy, Gilles, et al.
Published: (2026)
Applying ViT in Generalized Few-shot Semantic Segmentation
by: Geng, Liyuan, et al.
Published: (2024)