| Main Authors: | Salman, Shaeke, Shams, Md Montasir Bin, Liu, Xiuwen, Zhu, Lingjiong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.08473 |
Similar Items
Intriguing Equivalence Structures of the Embedding Space of Vision Transformers
by: Salman, Shaeke, et al.
Published: (2024)
Unaligning Everything: Or Aligning Any Text to Any Image in Multimodal Models
by: Salman, Shaeke, et al.
Published: (2024)
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
by: Shams, Montasir, et al.
Published: (2025)
Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems
by: Islam, Chashi Mahiul, et al.
Published: (2024)
VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
by: Sajib, Rakib Hossain, et al.
Published: (2026)
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
by: Xu, Zhenlin, et al.
Published: (2023)
Evaluating Vision-Language Models for Zero-Shot Detection, Classification, and Association of Motorcycles, Passengers, and Helmets
by: Choi, Lucas, et al.
Published: (2024)
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
by: Chen, Yongfan, et al.
Published: (2025)
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
by: Chen, Shiming, et al.
Published: (2025)
Binary Verification for Zero-Shot Vision
by: Hu, Rongbin, et al.
Published: (2025)
ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models
by: Zhang, Pu, et al.
Published: (2025)
LLM meets Vision-Language Models for Zero-Shot One-Class Classification
by: Bendou, Yassir, et al.
Published: (2024)
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
by: Chen, Shiming, et al.
Published: (2024)
Intriguing Properties of Large Language and Vision Models
by: Lee, Young-Jun, et al.
Published: (2024)
Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment
by: Jeong, Seongjun, et al.
Published: (2024)
Vision Transformers for Zero-Shot Clustering of Animal Images: A Comparative Benchmarking Study
by: Markoff, Hugo, et al.
Published: (2026)
Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models
by: Yu, Lu, et al.
Published: (2024)
LightZeroNav: Zero-Shot Vision Language Navigation in Continuous Environments Based on Lightweight VLMs
by: Luo, Kun, et al.
Published: (2026)
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
by: Zhang, Zicheng, et al.
Published: (2024)
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation
by: Li, Dingbang, et al.
Published: (2024)
ReHARK: Refined Hybrid Adaptive RBF Kernels for Robust One-Shot Vision-Language Adaptation
by: Islam, Md Jahidul
Published: (2026)
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
by: Metzen, Jan Hendrik, et al.
Published: (2023)
ViTs are Everywhere: A Comprehensive Study Showcasing Vision Transformers in Different Domain
by: Mia, Md Sohag, et al.
Published: (2023)
HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images
by: Thota, Kundan, et al.
Published: (2026)
Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models
by: Atabuzzaman, Md., et al.
Published: (2025)
Intriguing Properties of Data Attribution on Diffusion Models
by: Zheng, Xiaosen, et al.
Published: (2023)
Coherent Zero-Shot Visual Instruction Generation
by: Phung, Quynh, et al.
Published: (2024)
Rethinking Plant Disease Diagnosis: Bridging the Academic-Practical Gap with Vision Transformers and Zero-Shot Learning
by: Benabbas, Wassim, et al.
Published: (2025)
Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
by: Rau, Anita, et al.
Published: (2025)
Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models
by: Seth, Ashish, et al.
Published: (2024)
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
by: Wang, Haibo, et al.
Published: (2026)
Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models
by: Xu, Zane, et al.
Published: (2025)
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis
by: Nagar, Aishik, et al.
Published: (2024)
MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context
by: Gu, Zishan, et al.
Published: (2024)
Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection
by: Aqeel, Muhammad, et al.
Published: (2026)
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images
by: Rahaman, Md Mamunur, et al.
Published: (2025)
Towards Efficient and General-Purpose Few-Shot Misclassification Detection for Vision-Language Models
by: Zeng, Fanhu, et al.
Published: (2025)
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
by: Sui, Elaine, et al.
Published: (2024)
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
by: Dafnis, Konstantinos M., et al.
Published: (2025)
SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)