| Main Authors: | Singh, Prateek, Dholey, Moumita, Vinod, P. K. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.05989 |
Similar Items
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
Vanilla ViT for Automotive Point Cloud Semantic Segmentation
by: Puy, Gilles, et al.
Published: (2026)
Applying ViT in Generalized Few-shot Semantic Segmentation
by: Geng, Liyuan, et al.
Published: (2024)
A Hybrid Framework Bridging CNN and ViT based on Theory of Evidence for Diabetic Retinopathy Grading
by: Qiu, Junlai, et al.
Published: (2025)
Deeper Inside Deep ViT
by: Hong, Sungrae
Published: (2025)
U-REPA: Aligning Diffusion U-Nets to ViTs
by: Tian, Yuchuan, et al.
Published: (2025)
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
by: Zhong, Yunshan, et al.
Published: (2023)
SPAR: Single-Pass Any-Resolution ViT for Open-vocabulary Segmentation
by: Kombol, Naomi, et al.
Published: (2026)
Efficient Breast and Ovarian Cancer Classification via ViT-Based Preprocessing and Transfer Learning
by: Rawat, Richa, et al.
Published: (2025)
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
by: Ramachandran, Akshat, et al.
Published: (2024)
RepViT: Revisiting Mobile CNN From ViT Perspective
by: Wang, Ao, et al.
Published: (2023)
Adaptive Aspect Ratios with Patch-Mixup-ViT-based Vehicle ReID
by: Qiu, Mei, et al.
Published: (2024)
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
by: Li, Ye, et al.
Published: (2024)
SegDebias: Test-Time Bias Mitigation for ViT-Based CLIP via Segmentation
by: Wu, Fangyu, et al.
Published: (2025)
STRAP-ViT: Segregated Tokens with Randomized -- Transformations for Defense against Adversarial Patches in ViTs
by: Chattopadhyay, Nandish, et al.
Published: (2026)
DCDB: Dynamic Conditional Dual Diffusion Bridge for Ill-posed Multi-Tasks
by: Huang, Chengjie, et al.
Published: (2025)
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model
by: Norouzi, Narges, et al.
Published: (2026)
S2R-ViT for Multi-Agent Cooperative Perception: Bridging the Gap from Simulation to Reality
by: Li, Jinlong, et al.
Published: (2023)
Intrapartum Ultrasound Image Segmentation of Pubic Symphysis and Fetal Head Using Dual Student-Teacher Framework with CNN-ViT Collaborative Learning
by: Jiang, Jianmei, et al.
Published: (2024)
Rethinking Random Masking in Self-Distillation on ViT
by: Seong, Jihyeon, et al.
Published: (2025)
YOLO-Former: YOLO Shakes Hand With ViT
by: Khoramdel, Javad, et al.
Published: (2024)
ViT-5: Vision Transformers for The Mid-2020s
by: Wang, Feng, et al.
Published: (2026)
ViTCAE: ViT-based Class-conditioned Autoencoder
by: Jebraeeli, Vahid, et al.
Published: (2025)
PointDiffuse: A Dual-Conditional Diffusion Model for Enhanced Point Cloud Semantic Segmentation
by: He, Yong, et al.
Published: (2025)
Frequency-Adaptive Discrete Cosine-ViT-ResNet Architecture for Sparse-Data Vision
by: Kang, Ziyue, et al.
Published: (2025)
OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery
by: Inkawhich, Matthew, et al.
Published: (2024)
DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation
by: Chen, Renqi, et al.
Published: (2024)
ViT$^3$: Unlocking Test-Time Training in Vision
by: Han, Dongchen, et al.
Published: (2025)
EA-ViT: Efficient Adaptation for Elastic Vision Transformer
by: Zhu, Chen, et al.
Published: (2025)
LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image
by: Madan, Chetan, et al.
Published: (2024)
One-Shot Multilingual Font Generation Via ViT
by: Wang, Zhiheng, et al.
Published: (2024)
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
by: Ibtehaz, Nabil, et al.
Published: (2024)
Purrturbed but Stable: Human-Cat Invariant Representations Across CNNs, ViTs and Self-Supervised ViTs
by: Shah, Arya, et al.
Published: (2025)
DeS3: Adaptive Attention-driven Self and Soft Shadow Removal using ViT Similarity
by: Jin, Yeying, et al.
Published: (2022)
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
by: Zheng, Yunling, et al.
Published: (2024)
ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval
by: Dong, Le, et al.
Published: (2024)
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation
by: Tang, Fenghe, et al.
Published: (2025)
Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking
by: Kang, Ben, et al.
Published: (2025)
Hybrid CNN-ViT Framework for Motion-Blurred Scene Text Restoration
by: Rashid, Umar, et al.
Published: (2025)
Bridging Topology and Deep Representation Learning: A TDA-ViT Fusion Model for Four-Class Brain Tumor Classification
by: Ahmed, Faisal
Published: (2026)