| Main Authors: | Salman, Shaeke; Shams, Md Montasir Bin; Liu, Xiuwen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.15568 |
Similar Items
Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models
by: Salman, Shaeke, et al.
Published: (2024)
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)
Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems
by: Islam, Chashi Mahiul, et al.
Published: (2024)
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
by: Shams, Montasir, et al.
Published: (2025)
Intriguing Properties of Data Attribution on Diffusion Models
by: Zheng, Xiaosen, et al.
Published: (2023)
Intriguing properties of generative classifiers
by: Jaini, Priyank, et al.
Published: (2023)
Topological Alignment of Shared Vision-Language Embedding Space
by: You, Junwon, et al.
Published: (2025)
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
by: Pantazopoulos, Georgios, et al.
Published: (2024)
Data-Driven Fairness Generalization for Deepfake Detection
by: Ezeakunne, Uzoamaka, et al.
Published: (2024)
Robust Asymmetric Heterogeneous Federated Learning with Corrupted Clients
by: Fang, Xiuwen, et al.
Published: (2025)
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
by: Kim, Bum Jun, et al.
Published: (2024)
AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures
by: Urmi, Sifatullah Sheikh, et al.
Published: (2026)
Improving Interpretation Faithfulness for Vision Transformers
by: Hu, Lijie, et al.
Published: (2023)
ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data
by: Habib, Al Zadid Sultan Bin, et al.
Published: (2026)
Robust Multimodal Learning via Cross-Modal Proxy Tokens
by: Reza, Md Kaykobad, et al.
Published: (2025)
DiffiT: Diffusion Vision Transformers for Image Generation
by: Hatamizadeh, Ali, et al.
Published: (2023)
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
by: Trager, Matthew, et al.
Published: (2023)
Discovering Influential Neuron Path in Vision Transformers
by: Wang, Yifan, et al.
Published: (2025)
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers
by: Fan, Jiawei, et al.
Published: (2024)
Convolutional Neural Nets vs Vision Transformers: A SpaceNet Case Study with Balanced vs Imbalanced Regimes
by: Gothi, Akshar
Published: (2025)
Block-Recurrent Dynamics in Vision Transformers
by: Jacobs, Mozes, et al.
Published: (2025)
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
by: Schlarmann, Christian, et al.
Published: (2024)
On Background Bias of Post-Hoc Concept Embeddings in Computer Vision DNNs
by: Schwalbe, Gesina, et al.
Published: (2025)
ADAPT to Robustify Prompt Tuning Vision Transformers
by: Eskandar, Masih, et al.
Published: (2024)
Continual Adaptation of Vision Transformers for Federated Learning
by: Halbe, Shaunak, et al.
Published: (2023)
Mechanisms of Non-Monotonic Scaling in Vision Transformers
by: Kumar, Anantha Padmanaban Krishna
Published: (2025)
Accelerating Vision Transformers with Adaptive Patch Sizes
by: Choudhury, Rohan, et al.
Published: (2025)
Class-Discriminative Attention Maps for Vision Transformers
by: Brocki, Lennart, et al.
Published: (2023)
Exploring Token Pruning in Vision State Space Models
by: Zhan, Zheng, et al.
Published: (2024)
AdaptViG: Adaptive Vision GNN with Exponential Decay Gating
by: Munir, Mustafa, et al.
Published: (2025)
SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
by: Guimard, Quentin, et al.
Published: (2026)
SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection
by: Ataiefard, Foozhan, et al.
Published: (2024)
Vision-Based Localization and LLM-based Navigation for Indoor Environments
by: Rahimi, Keyan, et al.
Published: (2025)
Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)
Enhancing Vision Transformer Explainability Using Artificial Astrocytes
by: Echevarrieta-Catalan, Nicolas, et al.
Published: (2025)
FasterViT: Fast Vision Transformers with Hierarchical Attention
by: Hatamizadeh, Ali, et al.
Published: (2023)
Lightweight Model for Poultry Disease Detection from Fecal Images Using Multi-Color Space Feature Optimization and Machine Learning
by: Islam, A. K. M. Shoriful, et al.
Published: (2025)
Structure-Guided Adversarial Training of Diffusion Models
by: Yang, Ling, et al.
Published: (2024)
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
by: Munir, Mustafa, et al.
Published: (2024)
RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation
by: Cao, Yuefan, et al.
Published: (2025)