:: Library Catalog

Bibliographic Details
Main Authors:	Salman, Shaeke; Shams, Md Montasir Bin; Liu, Xiuwen; Zhu, Lingjiong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2402.08473

Similar Items

Intriguing Equivalence Structures of the Embedding Space of Vision Transformers
by: Salman, Shaeke, et al.
Published: (2024)

Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)

Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
by: Shams, Montasir, et al.
Published: (2025)

Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems
by: Islam, Chashi Mahiul, et al.
Published: (2024)

VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
by: Sajib, Rakib Hossain, et al.
Published: (2026)

Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
by: Xu, Zhenlin, et al.
Published: (2023)

Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets
by: Choi, Lucas, et al.
Published: (2024)

A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
by: Chen, Yongfan, et al.
Published: (2025)

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
by: Chen, Shiming, et al.
Published: (2025)

Binary Verification for Zero-Shot Vision
by: Hu, Rongbin, et al.
Published: (2025)

ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models
by: Zhang, Pu, et al.
Published: (2025)

LLM meets Vision-Language Models for Zero-Shot One-Class Classification
by: Bendou, Yassir, et al.
Published: (2024)

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
by: Chen, Shiming, et al.
Published: (2024)

Intriguing Properties of Large Language and Vision Models
by: Lee, Young-Jun, et al.
Published: (2024)

Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment
by: Jeong, Seongjun, et al.
Published: (2024)

Vision Transformers for Zero-Shot Clustering of Animal Images: A Comparative Benchmarking Study
by: Markoff, Hugo, et al.
Published: (2026)

Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models
by: Yu, Lu, et al.
Published: (2024)

LightZeroNav: Zero-Shot Vision Language Navigation in Continuous Environments Based on Lightweight VLMs
by: Luo, Kun, et al.
Published: (2026)

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
by: Zhang, Zicheng, et al.
Published: (2024)

TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation
by: Li, Dingbang, et al.
Published: (2024)

ReHARK: Refined Hybrid Adaptive RBF Kernels for Robust One-Shot Vision-Language Adaptation
by: Islam, Md Jahidul
Published: (2026)

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
by: Metzen, Jan Hendrik, et al.
Published: (2023)

ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain
by: Mia, Md Sohag, et al.
Published: (2023)

HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images
by: Thota, Kundan, et al.
Published: (2026)

Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models
by: Atabuzzaman, Md., et al.
Published: (2025)

Intriguing Properties of Data Attribution on Diffusion Models
by: Zheng, Xiaosen, et al.
Published: (2023)

Coherent Zero-Shot Visual Instruction Generation
by: Phung, Quynh, et al.
Published: (2024)

Rethinking Plant Disease Diagnosis: Bridging the Academic-Practical Gap with Vision Transformers and Zero-Shot Learning
by: Benabbas, Wassim, et al.
Published: (2025)

Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
by: Rau, Anita, et al.
Published: (2025)

Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models
by: Seth, Ashish, et al.
Published: (2024)

Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
by: Wang, Haibo, et al.
Published: (2026)

Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models
by: Xu, Zane, et al.
Published: (2025)

Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis
by: Nagar, Aishik, et al.
Published: (2024)

MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context
by: Gu, Zishan, et al.
Published: (2024)

Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection
by: Aqeel, Muhammad, et al.
Published: (2026)

Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images
by: Rahaman, Md Mamunur, et al.
Published: (2025)

Towards Efficient and General-Purpose Few-Shot Misclassification Detection for Vision-Language Models
by: Zeng, Fanhu, et al.
Published: (2025)

Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
by: Sui, Elaine, et al.
Published: (2024)

Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
by: Dafnis, Konstantinos M., et al.
Published: (2025)

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)