| Main Authors: | Cai, Huanqia, Yang, Yijun, Hu, Winston |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.00698 |
Similar Items
MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models
by: Zhang, Jusheng, et al.
Published: (2025)
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation
by: Zong, Yi, et al.
Published: (2024)
EmoMM: Benchmarking and Steering MLLM for Multimodal Emotion Recognition under Conflict and Missingness
by: Sun, Yueru, et al.
Published: (2026)
GLaMM: Pixel Grounding Large Multimodal Model
by: Rasheed, Hanoona, et al.
Published: (2023)
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
by: Li, Shilong, et al.
Published: (2025)
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
by: Du, Lingxiao, et al.
Published: (2025)
MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis
by: Guo, Feng, et al.
Published: (2026)
Mind's Eye: A Benchmark of Visual Abstraction, Transformation and Composition for Multimodal LLMs
by: Sinha, Rohit, et al.
Published: (2026)
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2024)
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
by: Li, Jingyao, et al.
Published: (2025)
A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
by: Li, Tianle, et al.
Published: (2025)
System-2 Mathematical Reasoning via Enriched Instruction Tuning
by: Cai, Huanqia, et al.
Published: (2024)
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
by: He, Zheqi, et al.
Published: (2025)
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)
MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)
MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline
by: Yao, Huanjin, et al.
Published: (2026)
Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Discern Causal Links Across Modalities
by: Li, Zhiyuan, et al.
Published: (2024)
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding
by: Faure, Gueter Josmy, et al.
Published: (2026)
MM-Telco: Benchmarks and Multimodal Large Language Models for Telecom Applications
by: Kumar, Anshul, et al.
Published: (2025)
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
by: Li, Yan, et al.
Published: (2026)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
by: Yu, Weihao, et al.
Published: (2023)
MM-UNet: Morph Mamba U-shaped Convolutional Networks for Retinal Vessel Segmentation
by: Liu, Jiawen, et al.
Published: (2025)
MM-ISTS: Cooperating Irregularly Sampled Time Series Forecasting with Multimodal Vision-Text LLMs
by: Lei, Zhi, et al.
Published: (2026)
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
by: Yang, Zijiang, et al.
Published: (2023)
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
by: Pyo, Jiyoon, et al.
Published: (2025)
MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising
by: Chaubey, Ashutosh, et al.
Published: (2024)
EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
by: Hu, He, et al.
Published: (2026)
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
by: Xue, Le, et al.
Published: (2024)
Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration
by: Zhou, Yue, et al.
Published: (2025)
MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models
by: Wang, Kangkang, et al.
Published: (2026)
CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
by: Cai, Jie, et al.
Published: (2025)
Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
by: Zhang, Jiaming, et al.
Published: (2025)
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
by: Zhou, Pengfei, et al.
Published: (2025)
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
by: Fu, Ling, et al.
Published: (2024)
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
by: Xu, Mingjun, et al.
Published: (2025)
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
by: Wang, Xingrui, et al.
Published: (2025)
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
by: Yeo, Woongyeong, et al.
Published: (2025)
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
by: Liu, Xiang, et al.
Published: (2025)