| Main Author: | Sun, Bohang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.22709 |
Similar Items
RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)
MC-ViViT: Multi-branch Classifier-ViViT to detect Mild Cognitive Impairment in older adults using facial videos
by: Sun, Jian, et al.
Published: (2023)
ViViD: Video Virtual Try-on using Diffusion Models
by: Fang, Zixun, et al.
Published: (2024)
ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)
Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
by: Zheng, Yunling, et al.
Published: (2024)
Surface Defect Detection with Gabor Filter Using Reconstruction-Based Blurring U-Net-ViT
by: Si, Jongwook, et al.
Published: (2025)
HydraViT: Stacking Heads for a Scalable ViT
by: Haberer, Janek, et al.
Published: (2024)
Filtered-ViT: A Robust Defense Against Multiple Adversarial Patch Attacks
by: Khanal, Aja, et al.
Published: (2025)
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
by: Sun, Bohang, et al.
Published: (2025)
MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition
by: Kim, Ye-eun, et al.
Published: (2025)
Colinearity Decay: Training Quantization-Friendly ViTs with Outlier Decay
by: Tong, Jin, et al.
Published: (2026)
YOLO-Former: YOLO Shakes Hand With ViT
by: Khoramdel, Javad, et al.
Published: (2024)
ViTAR: Vision Transformer with Any Resolution
by: Fan, Qihang, et al.
Published: (2024)
ViDiC: Video Difference Captioning
by: Wu, Jiangtao, et al.
Published: (2025)
Rethinking Random Masking in Self-Distillation on ViT
by: Seong, Jihyeon, et al.
Published: (2025)
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)
LocalViT: Analyzing Locality in Vision Transformers
by: Li, Yawei, et al.
Published: (2021)
FTerViT: Fully Ternary Vision Transformer
by: Ruciński, Szymon, et al.
Published: (2026)
STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
by: Chattopadhyay, Nandish, et al.
Published: (2026)
QMaxViT-Unet+: A Query-Based MaxViT-Unet with Edge Enhancement for Scribble-Supervised Segmentation of Medical Images
by: Nguyen-Tat, Thien B., et al.
Published: (2025)
ViT-Lens: Towards Omni-modal Representations
by: Lei, Weixian, et al.
Published: (2023)
ViSS-R1: Self-Supervised Reinforcement Video Reasoning
by: Fang, Bo, et al.
Published: (2025)
ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos
by: Chen, Zerui, et al.
Published: (2024)
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
by: Han, Xumeng, et al.
Published: (2024)
ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers
by: Jiang, Yanfeng, et al.
Published: (2024)
ViMU: Benchmarking Video Metaphorical Understanding
by: Li, Qi, et al.
Published: (2026)
ViM-UNet: Vision Mamba for Biomedical Segmentation
by: Archit, Anwai, et al.
Published: (2024)
ViGEO: an Assessment of Vision GNNs in Earth Observation
by: Colomba, Luca, et al.
Published: (2024)
Applying ViT in Generalized Few-shot Semantic Segmentation
by: Geng, Liyuan, et al.
Published: (2024)
One-Shot Multilingual Font Generation Via ViT
by: Wang, Zhiheng, et al.
Published: (2024)
ViDAS: Vision-based Danger Assessment and Scoring
by: Gupta, Pranav, et al.
Published: (2024)
ViTOC: Vision Transformer and Object-aware Captioner
by: Huang, Feiyang
Published: (2024)
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
MoViE: Mobile Diffusion for Video Editing
by: Karjauv, Adil, et al.
Published: (2024)
ViTCN: Vision Transformer Contrastive Network For Reasoning
by: Song, Bo, et al.
Published: (2024)
CanViT: Toward Active-Vision Foundation Models
by: Berreby, Yohaï-Eliel, et al.
Published: (2026)
Vanilla ViT for Automotive Point Cloud Semantic Segmentation
by: Puy, Gilles, et al.
Published: (2026)