Bibliographic Details
Main Authors:	Gan, Ziliang; Lu, Yu; Zhang, Dong; Li, Haohan; Liu, Che; Liu, Jian; Liu, Ji; Wu, Haipang; Fu, Chaoyou; Xu, Zenglin; Zhang, Rongjunchen; Dai, Yong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2411.03314
Similar Items

FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
by: Zhang, Chenxi, et al.
Published: (2026)

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
by: Xie, Wulin, et al.
Published: (2025)

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)

HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)

Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
by: Liu, Che, et al.
Published: (2025)

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
by: Liu, Yuansen, et al.
Published: (2025)

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
by: Jiang, Dongzhi, et al.
Published: (2025)

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
by: Fu, Chaoyou, et al.
Published: (2023)

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
by: Fu, Chaoyou, et al.
Published: (2024)

BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
by: Lu, Guilong, et al.
Published: (2025)

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
by: Fu, Chaoyou, et al.
Published: (2026)

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
by: Zhu, Wenqiao, et al.
Published: (2025)

FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains
by: Zhao, Yilun, et al.
Published: (2023)

MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
by: Zhang, Kaiyuan, et al.
Published: (2025)

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
by: Luo, Junyu, et al.
Published: (2025)

FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
by: Tang, Zichen, et al.
Published: (2025)

MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
by: Ruan, Jiacheng, et al.
Published: (2025)

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
by: Fu, Chaoyou, et al.
Published: (2024)

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)

MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark
by: Yi, Dongyi, et al.
Published: (2025)

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
by: Zhang, Yi-Fan, et al.
Published: (2024)

A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)

Baichuan4-Finance Technical Report
by: Zhang, Hanyu, et al.
Published: (2024)

Understanding LLM Reasoning for Abstractive Summarization
by: Yuan, Haohan, et al.
Published: (2025)

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
by: Yue, Xiang, et al.
Published: (2023)

BizFinBench.v2: A Unified Dual-Mode Bilingual Benchmark for Expert-Level Financial Capability Alignment
by: Guo, Xin, et al.
Published: (2026)

TRIDENT: Benchmarking LLM Safety in Finance, Medicine, and Law
by: Hui, Zheng, et al.
Published: (2025)

Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning
by: Yang, Bohao, et al.
Published: (2025)

BizBench: A Quantitative Reasoning Benchmark for Business and Finance
by: Koncel-Kedziorski, Rik, et al.
Published: (2023)

Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
by: Guo, Ziyu, et al.
Published: (2025)

Revolutionizing Finance with LLMs: An Overview of Applications and Insights
by: Zhao, Huaqin, et al.
Published: (2024)

HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
by: Wei, Yanbin, et al.
Published: (2026)

MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
by: Zhang, Fan, et al.
Published: (2025)

Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
by: Dong, Haoyu, et al.
Published: (2025)

PersonaVLM: Long-Term Personalized Multimodal LLMs
by: Nie, Chang, et al.
Published: (2026)

Ebisu: Benchmarking Large Language Models in Japanese Finance
by: Peng, Xueqing, et al.
Published: (2026)

PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)

MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues
by: Xue, Liang, et al.
Published: (2025)

Understanding and Mitigating Network Latency Effect on Teleoperated-Robot with Extended Reality
by: Zhang, Ziliang, et al.
Published: (2025)

PuzzleClone: A DSL-Powered Framework for Synthesizing Verifiable Data
by: Xiong, Kai, et al.
Published: (2025)