:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Yang; Cao, Jingyi; You, Yanan; Qiao, Yuanyuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.13638

Similar Items

ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)

ViTOC: Vision Transformer and Object-aware Captioner
by: Huang, Feiyang
Published: (2024)

UniViTAR: Unified Vision Transformer with Native Resolution
by: Qiao, Limeng, et al.
Published: (2025)

LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
by: Xia, Chunlong, et al.
Published: (2024)

Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
by: Yu, Xiaohang, et al.
Published: (2024)

ChangeViT: Unleashing Plain Vision Transformers for Change Detection
by: Zhu, Duowang, et al.
Published: (2024)

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)

DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation
by: Anand, Tushar, et al.
Published: (2026)

ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)

SAR-NAS: Lightweight SAR Object Detection with Neural Architecture Search
by: Yu, Xinyi, et al.
Published: (2025)

EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)

Towards Unbiased Source-Free Object Detection via Vision Foundation Models
by: Cai, Zhi, et al.
Published: (2026)

FTerViT: Fully Ternary Vision Transformer
by: Ruciński, Szymon, et al.
Published: (2026)

Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers
by: Shu, Yuyang, et al.
Published: (2024)

RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence: Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
by: He, Renjie
Published: (2026)

Dense Vision Transformer Compression with Few Samples
by: Zhang, Hanxiao, et al.
Published: (2024)

ViTGaze: Gaze Following with Interaction Features in Vision Transformers
by: Song, Yuehao, et al.
Published: (2024)

TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection
by: Cheng, Lei, et al.
Published: (2025)

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
by: Ma, Yunsheng, et al.
Published: (2022)

GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
by: Deressa, Deressa Wodajo, et al.
Published: (2023)

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
by: You, Haoran, et al.
Published: (2022)

ACC-ViT: Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)

ViTCN: Vision Transformer Contrastive Network For Reasoning
by: Song, Bo, et al.
Published: (2024)

LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
by: Suri, Saksham, et al.
Published: (2024)

Towards SAR Automatic Target Recognition: MultiCategory SAR Image Classification Based on Light Weight Vision Transformer
by: Zhao, Guibin, et al.
Published: (2024)

OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery
by: Inkawhich, Matthew, et al.
Published: (2024)

CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
by: Zhang, Tianfang, et al.
Published: (2024)

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
by: Jiang, Yanfeng, et al.
Published: (2024)

Vision Transformers: From Semantic Segmentation to Dense Prediction
by: Zhang, Li, et al.
Published: (2022)

ViT-AdaLA: Adapting Vision Transformers with Linear Attention
by: Li, Yifan, et al.
Published: (2026)

IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)

ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy
by: Chandra, Soumyadeep, et al.
Published: (2024)

ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference
by: Hojjat, Ali, et al.
Published: (2025)

FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking
by: You, Jinlin, et al.
Published: (2026)

FineViT: Progressively Unlocking Fine-Grained Perception with Dense Recaptions
by: Zhao, Peisen, et al.
Published: (2026)

ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
by: Sun, Shihua, et al.
Published: (2024)

M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for optical-SAR Object Detection
by: Wang, Chao, et al.
Published: (2025)

Test-Time Adaptive Object Detection with Foundation Model
by: Gao, Yingjie, et al.
Published: (2025)

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)