Library Catalog

Bibliographic Details
Main Authors:	Dong, Le; Cao, Qixuan; Pu, Lei; Wu, Fangfang; Dong, Weisheng; Li, Xin; Shi, Guangming
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.18136

Similar Items

Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution
by: Lin, Shuchen, et al.
Published: (2025)

Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
by: Kang, Ben, et al.
Published: (2025)

Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement
by: Fang, Huachen, et al.
Published: (2024)

Rethinking Random Masking in Self-Distillation on ViT
by: Seong, Jihyeon, et al.
Published: (2025)

High-Order Progressive Trajectory Matching for Medical Image Dataset Distillation
by: Dong, Le, et al.
Published: (2025)

ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation
by: Li, Siyou, et al.
Published: (2024)

How Can Multimodal Remote Sensing Datasets Transform Classification via SpatialNet-ViT?
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition
by: Hu, Youbing, et al.
Published: (2024)

Sub-token ViT Embedding via Stochastic Resonance Transformers
by: Lao, Dong, et al.
Published: (2023)

Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)

Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs
by: Chen, Lu, et al.
Published: (2025)

LoopExpose: An Unsupervised Framework for Arbitrary-Length Exposure Correction
by: Li, Ao, et al.
Published: (2025)

SA-MixNet: Structure-aware Mixup and Invariance Learning for Scribble-supervised Road Extraction in Remote Sensing Images
by: Feng, Jie, et al.
Published: (2024)

Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)

Let ViT Speak: Generative Language-Image Pre-training
by: Fang, Yan, et al.
Published: (2026)

AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
by: Zheng, Yunling, et al.
Published: (2024)

DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation
by: Chen, Renqi, et al.
Published: (2024)

Dual Distillation for Few-Shot Anomaly Detection
by: Dong, Le, et al.
Published: (2026)

Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection
by: Aubard, Martin, et al.
Published: (2024)

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
by: Wei, Guoyizhe, et al.
Published: (2025)

EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)

Improve Contrastive Clustering Performance by Multiple Fusing-Augmenting ViT Blocks
by: Wang, Cheng, et al.
Published: (2025)

Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)

S4DL: Shift-sensitive Spatial-Spectral Disentangling Learning for Hyperspectral Image Unsupervised Domain Adaptation
by: Feng, Jie, et al.
Published: (2024)

An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models
by: Hu, Zizhao, et al.
Published: (2024)

A Resource-Efficient Training Framework for Remote Sensing Text--Image Retrieval
by: Zhang, Weihang, et al.
Published: (2025)

Robust Remote Sensing Image-Text Retrieval with Noisy Correspondence
by: Song, Qiya, et al.
Published: (2026)

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)

ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)

ViT-Split: Unleashing the Power of Vision Foundation Models via Efficient Splitting Heads
by: Li, Yifan, et al.
Published: (2025)

STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
by: Chattopadhyay, Nandish, et al.
Published: (2026)

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
by: Shi, Huihong, et al.
Published: (2024)

Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing
by: Tran, Le-Anh, et al.
Published: (2024)

Pretrained ViTs Yield Versatile Representations For Medical Images
by: Matsoukas, Christos, et al.
Published: (2023)

IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
by: Ma, Xiaochen, et al.
Published: (2023)

ViT³: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)

ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)

CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
by: Zhang, Tianfang, et al.
Published: (2024)

RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)

Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
by: Ning, Hailong, et al.
Published: (2025)