| Main Authors: | Zhang, Yang; Cao, Jingyi; You, Yanan; Qiao, Yuanyuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.13638 |
Similar Items
ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)
ViTOC: Vision Transformer and Object-aware Captioner
by: Huang, Feiyang
Published: (2024)
UniViTAR: Unified Vision Transformer with Native Resolution
by: Qiao, Limeng, et al.
Published: (2025)
LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
by: Xia, Chunlong, et al.
Published: (2024)
Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience
by: Yu, Xiaohang, et al.
Published: (2024)
ChangeViT: Unleashing Plain Vision Transformers for Change Detection
by: Zhu, Duowang, et al.
Published: (2024)
Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection
by: Yang, Longrong, et al.
Published: (2023)
DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation
by: Anand, Tushar, et al.
Published: (2026)
ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)
SAR-NAS: Lightweight SAR Object Detection with Neural Architecture Search
by: Yu, Xinyi, et al.
Published: (2025)
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)
Towards Unbiased Source-Free Object Detection via Vision Foundation Models
by: Cai, Zhi, et al.
Published: (2026)
FTerViT: Fully Ternary Vision Transformer
by: Ruciński, Szymon, et al.
Published: (2026)
Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers
by: Shu, Yuyang, et al.
Published: (2024)
RD-ViT: Recurrent-Depth Vision Transformer for Semantic Segmentation with Reduced Data Dependence Extending the Recurrent-Depth Transformer Architecture to Dense Prediction
by: He, Renjie
Published: (2026)
Dense Vision Transformer Compression with Few Samples
by: Zhang, Hanxiao, et al.
Published: (2024)
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
by: Song, Yuehao, et al.
Published: (2024)
TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection
by: Cheng, Lei, et al.
Published: (2025)
ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
by: Ma, Yunsheng, et al.
Published: (2022)
GenConViT: Deepfake Video Detection Using Generative Convolutional Vision Transformer
by: Deressa, Deressa Wodajo, et al.
Published: (2023)
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
by: You, Haoran, et al.
Published: (2022)
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
ViTCN: Vision Transformer Contrastive Network For Reasoning
by: Song, Bo, et al.
Published: (2024)
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
by: Suri, Saksham, et al.
Published: (2024)
Towards SAR Automatic Target Recognition MultiCategory SAR Image Classification Based on Light Weight Vision Transformer
by: Zhao, Guibin, et al.
Published: (2024)
OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery
by: Inkawhich, Matthew, et al.
Published: (2024)
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
by: Zhang, Tianfang, et al.
Published: (2024)
ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
by: Jiang, Yanfeng, et al.
Published: (2024)
Vision Transformers: From Semantic Segmentation to Dense Prediction
by: Zhang, Li, et al.
Published: (2022)
ViT-AdaLA: Adapting Vision Transformers with Linear Attention
by: Li, Yifan, et al.
Published: (2026)
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)
ViTALS: Vision Transformer for Action Localization in Surgical Nephrectomy
by: Chandra, Soumyadeep, et al.
Published: (2024)
ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference
by: Hojjat, Ali, et al.
Published: (2025)
FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking
by: You, Jinlin, et al.
Published: (2026)
FineViT: Progressively Unlocking Fine-Grained Perception with Dense Recaptions
by: Zhao, Peisen, et al.
Published: (2026)
ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
by: Sun, Shihua, et al.
Published: (2024)
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for optical-SAR Object Detection
by: Wang, Chao, et al.
Published: (2025)
Test-Time Adaptive Object Detection with Foundation Model
by: Gao, Yingjie, et al.
Published: (2025)
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
by: Cao, Hanwen, et al.
Published: (2025)