Library Catalog

Bibliographic Details
Main Authors:	Castro, Santiago; Ziai, Amir; Saluja, Avneesh; Yuan, Zhuoning; Mihalcea, Rada
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2402.15021

Similar Items

Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction
by: Ignat, Oana, et al.
Published: (2023)

The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning
by: Bai, Longju, et al.
Published: (2024)

Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
by: Ignat, Oana, et al.
Published: (2024)

Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision
by: Natalie, Rosiana, et al.
Published: (2025)

Natural Language Inference Improves Compositionality in Vision-Language Models
by: Cascante-Bonilla, Paola, et al.
Published: (2024)

What Do Vision-Language Models Encode for Personalized Image Aesthetics Assessment?
by: Ryu, Koki, et al.
Published: (2026)

Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models
by: Nwatu, Joan, et al.
Published: (2024)

Refining Skewed Perceptions in Vision-Language Contrastive Models through Visual Representations
by: Dai, Haocheng, et al.
Published: (2024)

HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)

Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)

Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
by: Nwatu, Joan, et al.
Published: (2025)

The Hard Positive Truth about Vision-Language Compositionality
by: Kamath, Amita, et al.
Published: (2024)

TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions
by: He, Xingwei, et al.
Published: (2024)

An Examination of the Compositionality of Large Generative Vision-Language Models
by: Ma, Teli, et al.
Published: (2023)

Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)
by: Manevich, Avshalom, et al.
Published: (2024)

Inference-Time Structural Reasoning for Compositional Vision-Language Understanding
by: Bhattacharya, Amartya
Published: (2026)

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning
by: Ziai, Amir, et al.
Published: (2024)

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
by: Lee, Seongyun, et al.
Published: (2024)

WAON: A Large-Scale Japanese Image-Text Dataset for Cultural Adaptation in Contrastive Vision-Language Models
by: Sugiura, Issa, et al.
Published: (2025)

Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
by: Ye, Jiacheng, et al.
Published: (2025)

Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models
by: Lee, Yi-Lun, et al.
Published: (2024)

Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)

Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
by: Bitton-Guetta, Nitzan, et al.
Published: (2024)

Texture or Semantics? Vision-Language Models Get Lost in Font Recognition
by: Li, Zhecheng, et al.
Published: (2025)

Causal Graphical Models for Vision-Language Compositional Understanding
by: Parascandolo, Fiorenzo, et al.
Published: (2024)

Do Vision-Language Models Really Understand Visual Language?
by: Hou, Yifan, et al.
Published: (2024)

Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning
by: Fuller, Harrison, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
by: Wang, Xintong, et al.
Published: (2024)

Conflict Adaptation in Vision-Language Models
by: Hu, Xiaoyang
Published: (2025)

Vision Language Models are Confused Tourists
by: Irawan, Patrick Amadeus, et al.
Published: (2025)

FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
by: Lu, Yifan, et al.
Published: (2025)

Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
by: Miranda, Imanol, et al.
Published: (2026)

VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
by: Huang, Jen-tse, et al.
Published: (2025)

Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
by: Wu, Shengguang, et al.
Published: (2025)

Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
by: Wu, Xuyang, et al.
Published: (2024)

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)

Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
by: Liao, Yuan-Hong, et al.
Published: (2024)