| Main Authors: | Burapacheep, Jirayu, Gaur, Ishan, Bhatia, Agam, Thrush, Tristan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.04492 |
Similar Items
Nearest Neighbor Normalization Improves Multimodal Retrieval
by: Chowdhury, Neil, et al.
Published: (2024)
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
by: Samin, Ahnaf Mozib, et al.
Published: (2024)
ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
by: Ruan, Chenxi, et al.
Published: (2026)
ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests?
by: Ling, Zijian, et al.
Published: (2025)
ARGS: Alignment as Reward-Guided Search
by: Khanov, Maxim, et al.
Published: (2024)
ERIT Lightweight Multimodal Dataset for Elderly Emotion Recognition and Multimodal Fusion Evaluation
by: Frieske, Rita, et al.
Published: (2024)
VideoConviction: A Multimodal Benchmark for Human Conviction and Stock Market Recommendations
by: Galarnyk, Michael, et al.
Published: (2025)
Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations
by: Moll, Johannes, et al.
Published: (2025)
Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images
by: Mahanta, Cristina, et al.
Published: (2025)
The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models
by: Paruchuri, Akshay, et al.
Published: (2026)
Hummus: A Dataset of Humorous Multimodal Metaphor Use
by: Tong, Xiaoyu, et al.
Published: (2025)
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
by: Bhatia, Gagan, et al.
Published: (2024)
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues
by: Li, Shuang, et al.
Published: (2024)
Beyond Words: Multimodal LLM Knows When to Speak
by: Liao, Zikai, et al.
Published: (2025)
What Color Scheme is More Effective in Assisting Readers to Locate Information in a Color-Coded Article?
by: Ng, Ho Yin, et al.
Published: (2024)
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
by: Liang, Yijun, et al.
Published: (2025)
A Grounded Typology of Word Classes
by: Haley, Coleman, et al.
Published: (2024)
A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs
by: Broomfield, Julius, et al.
Published: (2025)
Multimodal Human-AI Synergy for Medical Imaging Quality Control: A Hybrid Intelligence Framework with Adaptive Dataset Curation and Closed-Loop Evaluation
by: Qin, Zhi, et al.
Published: (2025)
Decoding Emotions in Abstract Art: Cognitive Plausibility of CLIP in Recognizing Color-Emotion Associations
by: Widhoelzl, Hanna-Sophia, et al.
Published: (2024)
Towards Patronizing and Condescending Language in Chinese Videos: A Multimodal Dataset and Detector
by: Wang, Hongbo, et al.
Published: (2024)
Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies
by: Hayashi, Kazuki, et al.
Published: (2025)
ProMQA-Assembly: Multimodal Procedural QA Dataset on Assembly
by: Hasegawa, Kimihiro, et al.
Published: (2025)
WordVIS: A Color Worth A Thousand Words
by: Khan, Umar, et al.
Published: (2024)
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
by: Yun, Sukmin, et al.
Published: (2024)
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
by: Li, Lei, et al.
Published: (2024)
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
by: Ding, Yihao, et al.
Published: (2024)
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
by: Sood, Ekta, et al.
Published: (2021)
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
by: Bhatia, Mehar, et al.
Published: (2024)
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)
LLaVA-Critic: Learning to Evaluate Multimodal Models
by: Xiong, Tianyi, et al.
Published: (2024)
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents
by: Cheng, Zhili, et al.
Published: (2025)
Control Color: Multimodal Diffusion-based Interactive Image Colorization
by: Liang, Zhexin, et al.
Published: (2024)
Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation
by: Wang, Xintong, et al.
Published: (2025)
From Instructions to Assistance: a Dataset Aligning Instruction Manuals with Assembly Videos for Evaluating Multimodal LLMs
by: Toschi, Federico, et al.
Published: (2026)
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
by: Pennec, Galann, et al.
Published: (2025)
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
by: Song, Yingjin, et al.
Published: (2025)
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
by: Garg, Roopal, et al.
Published: (2024)
Checkup2Action: A Multimodal Clinical Check-up Report Dataset for Patient-Oriented Action Card Generation
by: Xiang, Sike, et al.
Published: (2026)
VisText-Mosquito: A Unified Multimodal Dataset for Visual Detection, Segmentation, and Textual Explanation on Mosquito Breeding Sites
by: Islam, Md. Adnanul, et al.
Published: (2025)