| Main Authors: | Bellaj, Ali El, Cheddadi, Mohammed-Amine, Berber, Rhassan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.11260 |
Similar Items
Uncertainty-Aware Ordinal Deep Learning for cross-Dataset Diabetic Retinopathy Grading
by: Bellaj, Ali El, et al.
Published: (2026)
When Do We Not Need Larger Vision Models?
by: Shi, Baifeng, et al.
Published: (2024)
MambaOut: Do We Really Need Mamba for Vision?
by: Yu, Weihao, et al.
Published: (2024)
Vision Transformers Need Registers
by: Darcet, Timothée, et al.
Published: (2023)
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders
by: Kuo, Shang-Jui Ray, et al.
Published: (2026)
Vision Transformers Need More Than Registers
by: Shi, Cheng, et al.
Published: (2026)
Do Vision Language Models Need to Process Image Tokens?
by: Ghosh, Sambit, et al.
Published: (2026)
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
by: Hatamizadeh, Ali, et al.
Published: (2024)
Do Vision and Language Encoders Represent the World Similarly?
by: Maniparambil, Mayug, et al.
Published: (2024)
Vision Transformers Don't Need Trained Registers
by: Jiang, Nick, et al.
Published: (2025)
Hybrid Vision Transformer_GAN Attribute Neutralizer for Mitigating Bias in Chest X_Ray Diagnosis
by: Solomon, Jobeal, et al.
Published: (2026)
You Only Need Less Attention at Each Stage in Vision Transformers
by: Zhang, Shuoxi, et al.
Published: (2024)
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
by: Peng, Jierui, et al.
Published: (2025)
Do Vision Transformers See Like Humans? Evaluating their Perceptual Alignment
by: Hernández-Cámara, Pablo, et al.
Published: (2025)
Do We Need Perfect Data? Leveraging Noise for Domain Generalized Segmentation
by: Kim, Taeyeong, et al.
Published: (2025)
DeepHistoViT: An Interpretable Vision Transformer Framework for Histopathological Cancer Classification
by: Mosalpuri, Ravi, et al.
Published: (2026)
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
by: Li, Zhenyang, et al.
Published: (2024)
Concept-Based Explanations in Computer Vision: Where Are We and Where Could We Go?
by: Lee, Jae Hee, et al.
Published: (2024)
ThinkingViT: Matryoshka Thinking Vision Transformer for Elastic Inference
by: Hojjat, Ali, et al.
Published: (2025)
Denoising Vision Transformers
by: Yang, Jiawei, et al.
Published: (2024)
Do Less, Achieve More: Do We Need Every-Step Optimization for RL Fine-tuning of Diffusion Models?
by: Yan, Renye, et al.
Published: (2026)
Retina Vision Transformer (RetinaViT): Introducing Scaled Patches into Vision Transformers
by: Shu, Yuyang, et al.
Published: (2024)
Dynamic Mask-Based Backdoor Attack Against Vision AI Models: A Case Study on Mushroom Detection
by: Dridi, Zeineb, et al.
Published: (2026)
Do We Really Need a Large Number of Visual Prompts?
by: Kim, Youngeun, et al.
Published: (2023)
Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC
by: Tao, Ming, et al.
Published: (2024)
Toward Semantic-Agnostic and Shape-Aware Vision-Language Segmentation Models
by: Seutin, Corentin, et al.
Published: (2026)
Adaptive Additive Parameter Updates of Vision Transformers for Few-Shot Continual Learning
by: Stein, Kyle, et al.
Published: (2025)
Are We on the Right Way for Evaluating Large Vision-Language Models?
by: Chen, Lin, et al.
Published: (2024)
Wildfire Detection Using Vision Transformer with the Wildfire Dataset
by: Vuppari, Gowtham Raj, et al.
Published: (2025)
SeTformer is What You Need for Vision and Language
by: Shamsolmoali, Pourya, et al.
Published: (2024)
Mamba-R: Vision Mamba ALSO Needs Registers
by: Wang, Feng, et al.
Published: (2024)
Shuffle Vision Transformer: Lightweight, Fast and Efficient Recognition of Driver Facial Expression
by: Saadi, Ibtissam, et al.
Published: (2024)
KANs for Computer Vision: An Experimental Study
by: Mohan, Karthik, et al.
Published: (2024)
Structured Initialization for Vision Transformers
by: Zheng, Jianqiao, et al.
Published: (2025)
On the Faithfulness of Vision Transformer Explanations
by: Wu, Junyi, et al.
Published: (2024)
Representative Attention For Vision Transformers
by: Li, Yuntong, et al.
Published: (2026)
Vision Transformers with Hierarchical Attention
by: Liu, Yun, et al.
Published: (2021)
Face Pyramid Vision Transformer
by: Islam, Khawar, et al.
Published: (2022)
Interpretability-Aware Vision Transformer
by: Qiang, Yao, et al.
Published: (2023)
Locality-Attending Vision Transformer
by: Hajimiri, Sina, et al.
Published: (2026)