| Main Authors: | Abdullah, Hasnat Md, Liu, Tian, Wei, Kangda, Kong, Shu, Huang, Ruihong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.01180 |
Similar Items
Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs
by: Wei, Kangda, et al.
Published: (2025)
CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)
by: Borah, Abhilekh, et al.
Published: (2025)
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos
by: Wei, Kangda, et al.
Published: (2025)
MER-Bench: A Comprehensive Benchmark for Multimodal Meme Reappraisal
by: Nie, Yiqi, et al.
Published: (2026)
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
by: Wang, Fei, et al.
Published: (2024)
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
by: Li, Lei, et al.
Published: (2024)
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants
by: Qin, Lixiong, et al.
Published: (2025)
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
by: Liu, Xuannan, et al.
Published: (2024)
MedRepBench: A Comprehensive Benchmark for Medical Report Interpretation
by: Shang, Fangxin, et al.
Published: (2025)
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
by: Liu, Xuannan, et al.
Published: (2025)
Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone
by: Malik, Shaivi, et al.
Published: (2025)
Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing
by: Yang, Minglai, et al.
Published: (2026)
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
by: Chen, Lei, et al.
Published: (2024)
DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions
by: Wang, Xinran, et al.
Published: (2026)
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
by: Fu, Chaoyou, et al.
Published: (2024)
EMemBench: Interactive Benchmarking of Episodic Memory for VLM Agents
by: Li, Xinze, et al.
Published: (2026)
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
by: Ma, David, et al.
Published: (2025)
SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning
by: Xu, Mengya, et al.
Published: (2025)
HomeSafeBench: A Benchmark for Embodied Vision-Language Models in Free-Exploration Home Safety Inspection
by: Gao, Siyuan, et al.
Published: (2025)
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
by: Heakl, Ahmed, et al.
Published: (2025)
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
by: Yuan, Shenghai, et al.
Published: (2024)
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
by: Chen, Kaijie, et al.
Published: (2025)
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
by: Ghaboura, Sara, et al.
Published: (2024)
ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
by: Ruan, Chenxi, et al.
Published: (2026)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
by: Ma, Yubo, et al.
Published: (2024)
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
by: Faure, Gueter Josmy, et al.
Published: (2026)
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
by: Yang, Rui, et al.
Published: (2025)
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
by: Fang, I-Sheng, et al.
Published: (2025)
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
by: Yang, Sihan, et al.
Published: (2025)
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
by: Li, Siqi, et al.
Published: (2025)
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
by: Kil, Jihyung, et al.
Published: (2024)
Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese
by: Inoue, Yuichi, et al.
Published: (2024)
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
by: Zhang, Chenkai, et al.
Published: (2025)
VisText-Mosquito: A Unified Multimodal Dataset for Visual Detection, Segmentation, and Textual Explanation on Mosquito Breeding Sites
by: Islam, Md. Adnanul, et al.
Published: (2025)
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)
VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding
by: Yu, Haorui, et al.
Published: (2026)
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
by: Zhou, Li, et al.
Published: (2025)
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
by: Wang, Zhaowei, et al.
Published: (2025)