| Main Authors: | Dong, Le, Cao, Qixuan, Pu, Lei, Wu, Fangfang, Dong, Weisheng, Li, Xin, Shi, Guangming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.18136 |
Similar Items
Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution
by: Lin, Shuchen, et al.
Published: (2025)
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
by: Kang, Ben, et al.
Published: (2025)
Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement
by: Fang, Huachen, et al.
Published: (2024)
Rethinking Random Masking in Self-Distillation on ViT
by: Seong, Jihyeon, et al.
Published: (2025)
High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
by: Dong, Le, et al.
Published: (2025)
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation
by: Li, Siyou, et al.
Published: (2024)
How Can Multimodal Remote Sensing Datasets Transform Classification via SpatialNet-ViT?
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)
LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
by: Hu, Youbing, et al.
Published: (2024)
Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)
Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs
by: Chen, Lu, et al.
Published: (2025)
LoopExpose: An Unsupervised Framework for Arbitrary-Length Exposure Correction
by: Li, Ao, et al.
Published: (2025)
SA-MixNet: Structure-aware Mixup and Invariance Learning for Scribble-supervised Road Extraction in Remote Sensing Images
by: Feng, Jie, et al.
Published: (2024)
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
by: Zheng, Yunling, et al.
Published: (2024)
DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation
by: Chen, Renqi, et al.
Published: (2024)
Dual Distillation for Few-Shot Anomaly Detection
by: Dong, Le, et al.
Published: (2026)
Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection
by: Aubard, Martin, et al.
Published: (2024)
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
by: Wei, Guoyizhe, et al.
Published: (2025)
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)
Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
by: Wang, Cheng, et al.
Published: (2025)
Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)
S4DL: Shift-sensitive Spatial-Spectral Disentangling Learning for Hyperspectral Image Unsupervised Domain Adaptation
by: Feng, Jie, et al.
Published: (2024)
An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models
by: Hu, Zizhao, et al.
Published: (2024)
A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)
Robust Remote Sensing Image-Text Retrieval with Noisy Correspondence
by: Song, Qiya, et al.
Published: (2026)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)
ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
by: Li, Yifan, et al.
Published: (2025)
STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
by: Chattopadhyay, Nandish, et al.
Published: (2026)
Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
by: Shi, Huihong, et al.
Published: (2024)
Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing
by: Tran, Le-Anh, et al.
Published: (2024)
Pretrained ViTs Yield Versatile Representations For Medical Images
by: Matsoukas, Christos, et al.
Published: (2023)
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)
ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)
ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
by: Zhang, Tianfang, et al.
Published: (2024)
RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)