:: Library Catalog

Bibliographic Details
Main Authors:	Lu, Haoming, Zhong, Feifei
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.09416

Similar Items

Attention IoU: Examining Biases in CelebA using Attention Maps
by: Serianni, Aaron, et al.
Published: (2025)

Beyond Performance Disparities: A Three-Level Audit of Representational Harm in CelebA
by: Park, Sieun, et al.
Published: (2026)

Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
by: Chatzoudis, Gerasimos, et al.
Published: (2026)

Fine-tuning Pre-trained Vision-Language Models in a Human-Annotation-Free Manner
by: Wang, Qian-Wei, et al.
Published: (2026)

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
by: Nandy, Abhilash, et al.
Published: (2024)

OCR-Quality: A Human-Annotated Dataset for OCR Quality Assessment
by: Zhang, Yulong
Published: (2025)

Can Vision-Language Models Understand Construction Workers? An Exploratory Study
by: Bui, Hieu, et al.
Published: (2026)

Pre-Trained Vision-Language Models as Partial Annotators
by: Wang, Qian-Wei, et al.
Published: (2024)

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset
by: Blankemeier, Louis, et al.
Published: (2024)

Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
by: Zhang, Shengxuming, et al.
Published: (2024)

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
by: Lu, Yujie, et al.
Published: (2024)

Longitudinal Vestibular Schwannoma Dataset with Consensus-based Human-in-the-loop Annotations
by: Wijethilake, Navodini, et al.
Published: (2025)

MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
by: Li, Guanzhen, et al.
Published: (2024)

VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models
by: Wu, Kui, et al.
Published: (2025)

Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis
by: Lu, Sheng, et al.
Published: (2026)

doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation
by: Roy, Parthib, et al.
Published: (2024)

WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
by: Fan, Sicheng, et al.
Published: (2026)

Time Blindness: Why Video-Language Models Can't See What Humans Can?
by: Upadhyay, Ujjwal, et al.
Published: (2025)

ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models
by: Zhong, Jing, et al.
Published: (2025)

UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models
by: Yin, Jun, et al.
Published: (2025)

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
by: Zhang, Yongting, et al.
Published: (2024)

Sanitizing Manufacturing Dataset Labels Using Vision-Language Models
by: Mahjourian, Nazanin, et al.
Published: (2025)

ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)

Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)

A-VL: Adaptive Attention for Large Vision-Language Models
by: Zhang, Junyang, et al.
Published: (2024)

Privacy-Preserving Computer Vision for Industry: Three Case Studies in Human-Centric Manufacturing
by: De Coninck, Sander, et al.
Published: (2025)

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
by: Bao, Han, et al.
Published: (2024)

Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap
by: Zhang, Mengmi, et al.
Published: (2022)

Can Vision-Language Models Solve Visual Math Equations?
by: Choudhury, Monjoy Narayan, et al.
Published: (2025)

Conformal Predictions for Human Action Recognition with Vision-Language Models
by: Tim, Bary, et al.
Published: (2025)

Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
by: Heo, Jaehyuk, et al.
Published: (2024)

PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
by: Liu, Jinlong, et al.
Published: (2026)

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)

ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis
by: Chen, Jian, et al.
Published: (2024)

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)

CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT
by: Du, Chengyi, et al.
Published: (2026)

GameVerse: Can Vision-Language Models Learn from Video-based Reflection?
by: Zhang, Kuan, et al.
Published: (2026)

Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery
by: Ma, Sai, et al.
Published: (2025)

Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
by: Jang, Jonggyu, et al.
Published: (2024)

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
by: Tao, Xijia, et al.
Published: (2024)