Library Catalog

Bibliographic Details
Main Authors:	Cai, Huanqia, Yang, Yijun, Hu, Winston
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.00698

Similar Items

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
by: Zhang, Jusheng, et al.
Published: (2025)

GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
by: Zong, Yi, et al.
Published: (2024)

EmoMM: Benchmarking and Steering MLLM for Multimodal Emotion Recognition under Conflict and Missingness
by: Sun, Yueru, et al.
Published: (2026)

GLaMM: Pixel Grounding Large Multimodal Model
by: Rasheed, Hanoona, et al.
Published: (2023)

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
by: Li, Shilong, et al.
Published: (2025)

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
by: Du, Lingxiao, et al.
Published: (2025)

MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis
by: Guo, Feng, et al.
Published: (2026)

Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
by: Sinha, Rohit, et al.
Published: (2026)

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2024)

CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
by: Li, Jingyao, et al.
Published: (2025)

A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
by: Li, Tianle, et al.
Published: (2025)

System-2 Mathematical Reasoning via Enriched Instruction Tuning
by: Cai, Huanqia, et al.
Published: (2024)

FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
by: He, Zheqi, et al.
Published: (2025)

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)

MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline
by: Yao, Huanjin, et al.
Published: (2026)

Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Discern Causal Links Across Modalities
by: Li, Zhiyuan, et al.
Published: (2024)

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
by: Faure, Gueter Josmy, et al.
Published: (2026)

MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
by: Kumar, Anshul, et al.
Published: (2025)

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
by: Li, Yan, et al.
Published: (2026)

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2023)

MM-UNet: Morph Mamba U-shaped Convolutional Networks for Retinal Vessel Segmentation
by: Liu, Jiawen, et al.
Published: (2025)

MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)

MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
by: Yang, Zijiang, et al.
Published: (2023)

FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
by: Pyo, Jiyoon, et al.
Published: (2025)

MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)

ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
by: Chaubey, Ashutosh, et al.
Published: (2024)

EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
by: Hu, He, et al.
Published: (2026)

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
by: Xue, Le, et al.
Published: (2024)

Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration
by: Zhou, Yue, et al.
Published: (2025)

MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models
by: Wang, Kangkang, et al.
Published: (2026)

CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
by: Cai, Jie, et al.
Published: (2025)

Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
by: Zhang, Jiaming, et al.
Published: (2025)

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
by: Zhou, Pengfei, et al.
Published: (2025)

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
by: Fu, Ling, et al.
Published: (2024)

MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
by: Xu, Mingjun, et al.
Published: (2025)

Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
by: Wang, Xingrui, et al.
Published: (2025)

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
by: Yeo, Woongyeong, et al.
Published: (2025)

A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
by: Liu, Xiang, et al.
Published: (2025)