:: Library Catalog

Bibliographic Details
Main Author:	Sun, Bohang
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.22709

Similar Items

RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)

MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
by: Sun, Jian, et al.
Published: (2023)

ViViD: Video Virtual Try-on using Diffusion Models
by: Fang, Zixun, et al.
Published: (2024)

ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)

Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)

AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
by: Zheng, Yunling, et al.
Published: (2024)

Surface Defect Detection with Gabor Filter Using Reconstruction-Based Blurring U-Net-ViT
by: Si, Jongwook, et al.
Published: (2025)

HydraViT: Stacking Heads for a Scalable ViT
by: Haberer, Janek, et al.
Published: (2024)

Filtered-ViT: A Robust Defense Against Multiple Adversarial Patch Attacks
by: Khanal, Aja, et al.
Published: (2025)

Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
by: Sun, Bohang, et al.
Published: (2025)

MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
by: Kim, Ye-eun, et al.
Published: (2025)

Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay
by: Tong, Jin, et al.
Published: (2026)

YOLO-Former: YOLO Shakes Hand With ViT
by: Khoramdel, Javad, et al.
Published: (2024)

ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)

ViDiC: Video Difference Captioning
by: Wu, Jiangtao, et al.
Published: (2025)

Rethinking Random Masking in Self-Distillation on ViT
by: Seong, Jihyeon, et al.
Published: (2025)

Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)

ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)

LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)

FTerViT: Fully Ternary Vision Transformer
by: Ruciński, Szymon, et al.
Published: (2026)

STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
by: Chattopadhyay, Nandish, et al.
Published: (2026)

QMaxViT-Unet+: A Query-Based MaxViT-Unet with Edge Enhancement for Scribble-Supervised Segmentation of Medical Images
by: Nguyen-Tat, Thien B., et al.
Published: (2025)

ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)

ViSS-R1: Self-Supervised Reinforcement Video Reasoning
by: Fang, Bo, et al.
Published: (2025)

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
by: Chen, Zerui, et al.
Published: (2024)

ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
by: Han, Xumeng, et al.
Published: (2024)

ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
by: Jiang, Yanfeng, et al.
Published: (2024)

ViMU: Benchmarking Video Metaphorical Understanding
by: Li, Qi, et al.
Published: (2026)

ViM-UNet: Vision Mamba for Biomedical Segmentation
by: Archit, Anwai, et al.
Published: (2024)

ViGEO: an Assessment of Vision GNNs in Earth Observation
by: Colomba, Luca, et al.
Published: (2024)

Applying ViT in Generalized Few-shot Semantic Segmentation
by: Geng, Liyuan, et al.
Published: (2024)

One-Shot Multilingual Font Generation Via ViT
by: Wang, Zhiheng, et al.
Published: (2024)

ViDAS: Vision-based Danger Assessment and Scoring
by: Gupta, Pranav, et al.
Published: (2024)

ViTOC: Vision Transformer and Object-aware Captioner
by: Huang, Feiyang
Published: (2024)

ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)

MoViE: Mobile Diffusion for Video Editing
by: Karjauv, Adil, et al.
Published: (2024)

ViTCN: Vision Transformer Contrastive Network For Reasoning
by: Song, Bo, et al.
Published: (2024)

CanViT: Toward Active-Vision Foundation Models
by: Berreby, Yohaï-Eliel, et al.
Published: (2026)

Vanilla ViT for Automotive Point Cloud Semantic Segmentation
by: Puy, Gilles, et al.
Published: (2026)