| Main Authors: | Castro, Santiago, Ziai, Amir, Saluja, Avneesh, Yuan, Zhuoning, Mihalcea, Rada |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.15021 |
Similar Items
Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction
by: Ignat, Oana, et al.
Published: (2023)
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning
by: Bai, Longju, et al.
Published: (2024)
Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
by: Ignat, Oana, et al.
Published: (2024)
Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision
by: Natalie, Rosiana, et al.
Published: (2025)
Natural Language Inference Improves Compositionality in Vision-Language Models
by: Cascante-Bonilla, Paola, et al.
Published: (2024)
What Do Vision-Language Models Encode for Personalized Image Aesthetics Assessment?
by: Ryu, Koki, et al.
Published: (2026)
Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models
by: Nwatu, Joan, et al.
Published: (2024)
Refining Skewed Perceptions in Vision-Language Contrastive Models through Visual Representations
by: Dai, Haocheng, et al.
Published: (2024)
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)
Vision-Language Models Create Cross-Modal Task Representations
by: Luo, Grace, et al.
Published: (2024)
Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
by: Nwatu, Joan, et al.
Published: (2025)
The Hard Positive Truth about Vision-Language Compositionality
by: Kamath, Amita, et al.
Published: (2024)
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions
by: He, Xingwei, et al.
Published: (2024)
An Examination of the Compositionality of Large Generative Vision-Language Models
by: Ma, Teli, et al.
Published: (2023)
Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)
by: Manevich, Avshalom, et al.
Published: (2024)
Inference-Time Structural Reasoning for Compositional Vision-Language Understanding
by: Bhattacharya, Amartya
Published: (2026)
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning
by: Ziai, Amir, et al.
Published: (2024)
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
by: Lee, Seongyun, et al.
Published: (2024)
WAON: A Large-Scale Japanese Image-Text Dataset for Cultural Adaptation in Contrastive Vision-Language Models
by: Sugiura, Issa, et al.
Published: (2025)
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
by: Ye, Jiacheng, et al.
Published: (2025)
Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models
by: Lee, Yi-Lun, et al.
Published: (2024)
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
by: Bitton-Guetta, Nitzan, et al.
Published: (2024)
Texture or Semantics? Vision-Language Models Get Lost in Font Recognition
by: Li, Zhecheng, et al.
Published: (2025)
Causal Graphical Models for Vision-Language Compositional Understanding
by: Parascandolo, Fiorenzo, et al.
Published: (2024)
Do Vision-Language Models Really Understand Visual Language?
by: Hou, Yifan, et al.
Published: (2024)
Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning
by: Fuller, Harrison, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
by: Wang, Xintong, et al.
Published: (2024)
Conflict Adaptation in Vision-Language Models
by: Hu, Xiaoyang
Published: (2025)
Vision Language Models are Confused Tourists
by: Irawan, Patrick Amadeus, et al.
Published: (2025)
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
by: Lu, Yifan, et al.
Published: (2025)
Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
by: Miranda, Imanol, et al.
Published: (2026)
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
by: Wu, Shengguang, et al.
Published: (2025)
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
by: Wu, Xuyang, et al.
Published: (2024)
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
by: Patel, Maitreya, et al.
Published: (2024)
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
by: Liao, Yuan-Hong, et al.
Published: (2024)