Library Catalog

Bibliographic Details
Main Authors:	Salman, Shaeke; Shams, Md Montasir Bin; Liu, Xiuwen
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2401.15568

Similar Items

Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models
by: Salman, Shaeke, et al.
Published: (2024)

Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)

Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems
by: Islam, Chashi Mahiul, et al.
Published: (2024)

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
by: Shams, Montasir, et al.
Published: (2025)

Intriguing Properties of Data Attribution on Diffusion Models
by: Zheng, Xiaosen, et al.
Published: (2023)

Intriguing properties of generative classifiers
by: Jaini, Priyank, et al.
Published: (2023)

Topological Alignment of Shared Vision-Language Embedding Space
by: You, Junwon, et al.
Published: (2025)

Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
by: Pantazopoulos, Georgios, et al.
Published: (2024)

Data-Driven Fairness Generalization for Deepfake Detection
by: Ezeakunne, Uzoamaka, et al.
Published: (2024)

Robust Asymmetric Heterogeneous Federated Learning with Corrupted Clients
by: Fang, Xiuwen, et al.
Published: (2025)

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
by: Kim, Bum Jun, et al.
Published: (2024)

AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures
by: Urmi, Sifatullah Sheikh, et al.
Published: (2026)

Improving Interpretation Faithfulness for Vision Transformers
by: Hu, Lijie, et al.
Published: (2023)

ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data
by: Habib, Al Zadid Sultan Bin, et al.
Published: (2026)

Robust Multimodal Learning via Cross-Modal Proxy Tokens
by: Reza, Md Kaykobad, et al.
Published: (2025)

DiffiT: Diffusion Vision Transformers for Image Generation
by: Hatamizadeh, Ali, et al.
Published: (2023)

Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
by: Trager, Matthew, et al.
Published: (2023)

Discovering Influential Neuron Path in Vision Transformers
by: Wang, Yifan, et al.
Published: (2025)

ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
by: Fan, Jiawei, et al.
Published: (2024)

Convolutional Neural Nets vs Vision Transformers: A SpaceNet Case Study with Balanced vs Imbalanced Regimes
by: Gothi, Akshar
Published: (2025)

Block-Recurrent Dynamics in Vision Transformers
by: Jacobs, Mozes, et al.
Published: (2025)

Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
by: Schlarmann, Christian, et al.
Published: (2024)

On Background Bias of Post-Hoc Concept Embeddings in Computer Vision DNNs
by: Schwalbe, Gesina, et al.
Published: (2025)

ADAPT to Robustify Prompt Tuning Vision Transformers
by: Eskandar, Masih, et al.
Published: (2024)

Continual Adaptation of Vision Transformers for Federated Learning
by: Halbe, Shaunak, et al.
Published: (2023)

Mechanisms of Non-Monotonic Scaling in Vision Transformers
by: Kumar, Anantha Padmanaban Krishna
Published: (2025)

Accelerating Vision Transformers with Adaptive Patch Sizes
by: Choudhury, Rohan, et al.
Published: (2025)

Class-Discriminative Attention Maps for Vision Transformers
by: Brocki, Lennart, et al.
Published: (2023)

Exploring Token Pruning in Vision State Space Models
by: Zhan, Zheng, et al.
Published: (2024)

AdaptViG: Adaptive Vision GNN with Exponential Decay Gating
by: Munir, Mustafa, et al.
Published: (2025)

SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
by: Guimard, Quentin, et al.
Published: (2026)

SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection
by: Ataiefard, Foozhan, et al.
Published: (2024)

Vision-Based Localization and LLM-based Navigation for Indoor Environments
by: Rahimi, Keyan, et al.
Published: (2025)

Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)

Enhancing Vision Transformer Explainability Using Artificial Astrocytes
by: Echevarrieta-Catalan, Nicolas, et al.
Published: (2025)

FasterViT: Fast Vision Transformers with Hierarchical Attention
by: Hatamizadeh, Ali, et al.
Published: (2023)

Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning
by: Islam, A. K. M. Shoriful, et al.
Published: (2025)

Structure-Guided Adversarial Training of Diffusion Models
by: Yang, Ling, et al.
Published: (2024)

GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
by: Munir, Mustafa, et al.
Published: (2024)

RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation
by: Cao, Yuefan, et al.
Published: (2025)