| Main Authors: | Soroka, Emi; Arzyn, Artem |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.03046 |
Similar Items
DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained Self-supervised Vision Transformer
by: Kumar, Sonal, et al.
Published: (2024)
Domain-Specific Self-Supervised Pre-training for Agricultural Disease Classification: A Hierarchical Vision Transformer Study
by: Sonavane, Arnav S.
Published: (2026)
SynGen-Vision: Synthetic Data Generation for training industrial vision models
by: Dubey, Alpana, et al.
Published: (2025)
DIET-CP: Lightweight and Data Efficient Self Supervised Continued Pretraining
by: Rodas, Bryan, et al.
Published: (2025)
Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment
by: Baxevanakis, Spiros, et al.
Published: (2026)
Massively Multi-Person 3D Human Motion Forecasting with Scene Context
by: Mueller, Felix B, et al.
Published: (2024)
PyCAT4: A Hierarchical Vision Transformer-based Framework for 3D Human Pose Estimation
by: Yang, Zongyou, et al.
Published: (2025)
On the Domain Robustness of Contrastive Vision-Language Models
by: Koddenbrock, Mario, et al.
Published: (2025)
Unified Local and Global Attention Interaction Modeling for Vision Transformers
by: Nguyen, Tan, et al.
Published: (2024)
Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes
by: Perez, Joan, et al.
Published: (2025)
SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification
by: Ashraf, S. M. Nabil, et al.
Published: (2023)
A Review of Pseudo-Labeling for Computer Vision
by: Kage, Patrick, et al.
Published: (2024)
Exploring Visual Embedding Spaces Induced by Vision Transformers for Online Auto Parts Marketplaces
by: Armijo, Cameron, et al.
Published: (2025)
VT-FSL: Bridging Vision and Text with LLMs for Few-Shot Learning
by: Li, Wenhao, et al.
Published: (2025)
In Context Learning with Vision Transformers: Case Study
by: Zhao, Antony, et al.
Published: (2025)
Attention-Aware Transformer-Based Aggregation Network for Video Periocular Recognition
by: Carreira, Luiz G F, et al.
Published: (2026)
Application of Generative Adversarial Network (GAN) for Synthetic Training Data Creation to improve performance of ANN Classifier for extracting Built-Up pixels from Landsat Satellite Imagery
by: Mukherjee, Amritendu, et al.
Published: (2025)
Residual Vision Transformer (ResViT) Based Self-Supervised Learning Model for Brain Tumor Classification
by: Karagoz, Meryem Altin, et al.
Published: (2024)
Vision Transformer-based Model for Severity Quantification of Lung Pneumonia Using Chest X-ray Images
by: Slika, Bouthaina, et al.
Published: (2023)
High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models
by: He, Mengqi, et al.
Published: (2025)
Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration
by: Liu, Yi, et al.
Published: (2026)
Fusion and Grouping Strategies in Deep Learning for Local Climate Zone Classification of Multimodal Remote Sensing Data
by: Thomas, Ancymol, et al.
Published: (2026)
Comparative Analysis of Vision Transformers and Convolutional Neural Networks for Medical Image Classification
by: Kawadkar, Kunal
Published: (2025)
Data-driven Super-Resolution of Flood Inundation Maps using Synthetic Simulations
by: Aravamudan, Akshay, et al.
Published: (2025)
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
by: Maity, Subhajit, et al.
Published: (2025)
DesertFormer: Transformer-Based Semantic Segmentation for Off-Road Desert Terrain Classification in Autonomous Navigation Systems
by: Chebolu, Yasaswini
Published: (2026)
Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2024)
Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing
by: Ciranni, Massimiliano, et al.
Published: (2025)
LeDiFlow: Learned Distribution-guided Flow Matching to Accelerate Image Generation
by: Zwick, Pascal, et al.
Published: (2025)
Looking at Model Debiasing through the Lens of Anomaly Detection
by: Pastore, Vito Paolo, et al.
Published: (2024)
Global-Local Similarity for Efficient Fine-Grained Image Recognition with Vision Transformers
by: Rios, Edwin Arkel, et al.
Published: (2024)
FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations
by: Diller, Christian, et al.
Published: (2022)
A Spitting Image: Modular Superpixel Tokenization in Vision Transformers
by: Aasan, Marius, et al.
Published: (2024)
Boosting Model Resilience via Implicit Adversarial Data Augmentation
by: Zhou, Xiaoling, et al.
Published: (2024)
A Genealogy of Foundation Models in Remote Sensing
by: Lane, Kevin, et al.
Published: (2025)
Comparison Study: Glacier Calving Front Delineation in Synthetic Aperture Radar Images With Deep Learning
by: Gourmelon, Nora, et al.
Published: (2025)
Label Delay in Online Continual Learning
by: Csaba, Botos, et al.
Published: (2023)
An Immersive Multi-Elevation Multi-Seasonal Dataset for 3D Reconstruction and Visualization
by: Liu, Xijun, et al.
Published: (2024)
Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
by: Hao, Yan, et al.
Published: (2024)
From Misclassifications to Outliers: Joint Reliability Assessment in Classification
by: Li, Yang, et al.
Published: (2026)