| Main Authors: | Lu, Haoming, Zhong, Feifei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.09416 |
Similar Items
Attention IoU: Examining Biases in CelebA using Attention Maps
by: Serianni, Aaron, et al.
Published: (2025)
Beyond Performance Disparities: A Three-Level Audit of Representational Harm in CelebA
by: Park, Sieun, et al.
Published: (2026)
Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
by: Chatzoudis, Gerasimos, et al.
Published: (2026)
Fine-tuning Pre-trained Vision-Language Models in a Human-Annotation-Free Manner
by: Wang, Qian-Wei, et al.
Published: (2026)
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
by: Nandy, Abhilash, et al.
Published: (2024)
OCR-Quality: A Human-Annotated Dataset for OCR Quality Assessment
by: Zhang, Yulong
Published: (2025)
Can Vision-Language Models Understand Construction Workers? An Exploratory Study
by: Bui, Hieu, et al.
Published: (2026)
Pre-Trained Vision-Language Models as Partial Annotators
by: Wang, Qian-Wei, et al.
Published: (2024)
Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset
by: Blankemeier, Louis, et al.
Published: (2024)
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
by: Zhang, Shengxuming, et al.
Published: (2024)
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
by: Lu, Yujie, et al.
Published: (2024)
Longitudinal Vestibular Schwannoma Dataset with Consensus-based Human-in-the-loop Annotations
by: Wijethilake, Navodini, et al.
Published: (2025)
MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
by: Li, Guanzhen, et al.
Published: (2024)
VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models
by: Wu, Kui, et al.
Published: (2025)
Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis
by: Lu, Sheng, et al.
Published: (2026)
doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation
by: Roy, Parthib, et al.
Published: (2024)
WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
by: Fan, Sicheng, et al.
Published: (2026)
Time Blindness: Why Video-Language Models Can't See What Humans Can?
by: Upadhyay, Ujjwal, et al.
Published: (2025)
ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models
by: Zhong, Jing, et al.
Published: (2025)
UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models
by: Yin, Jun, et al.
Published: (2025)
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
by: Zhang, Yongting, et al.
Published: (2024)
Sanitizing Manufacturing Dataset Labels Using Vision-Language Models
by: Mahjourian, Nazanin, et al.
Published: (2025)
ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
by: Zhou, Yuhao, et al.
Published: (2026)
Can Vision Language Models Understand Mimed Actions?
by: Cho, Hyundong, et al.
Published: (2025)
A-VL: Adaptive Attention for Large Vision-Language Models
by: Zhang, Junyang, et al.
Published: (2024)
Privacy-Preserving Computer Vision for Industry: Three Case Studies in Human-Centric Manufacturing
by: De Coninck, Sander, et al.
Published: (2025)
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
by: Bao, Han, et al.
Published: (2024)
Can Machines Imitate Humans? Integrative Turing-like tests for Language and Vision Demonstrate a Narrowing Gap
by: Zhang, Mengmi, et al.
Published: (2022)
Can Vision-Language Models Solve Visual Math Equations?
by: Choudhury, Monjoy Narayan, et al.
Published: (2025)
Conformal Predictions for Human Action Recognition with Vision-Language Models
by: Tim, Bary, et al.
Published: (2025)
Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model
by: Heo, Jaehyuk, et al.
Published: (2024)
PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
by: Liu, Jinlong, et al.
Published: (2026)
VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
by: Zhao, Hongbo, et al.
Published: (2025)
ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis
by: Chen, Jian, et al.
Published: (2024)
Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset
by: Chen, Qian, et al.
Published: (2026)
CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT
by: Du, Chengyi, et al.
Published: (2026)
GameVerse: Can Vision-Language Models Learn from Video-based Reflection?
by: Zhang, Kuan, et al.
Published: (2026)
Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery
by: Ma, Sai, et al.
Published: (2025)
Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models
by: Jang, Jonggyu, et al.
Published: (2024)
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
by: Tao, Xijia, et al.
Published: (2024)