| Main Authors: | Gan, Ziliang, Lu, Yu, Zhang, Dong, Li, Haohan, Liu, Che, Liu, Jian, Liu, Ji, Wu, Haipang, Fu, Chaoyou, Xu, Zenglin, Zhang, Rongjunchen, Dai, Yong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.03314 |
Similar Items
FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
by: Zhang, Chenxi, et al.
Published: (2026)
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
by: Xie, Wulin, et al.
Published: (2025)
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
by: Yuan, Jiakang, et al.
Published: (2025)
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)
Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
by: Liu, Che, et al.
Published: (2025)
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
by: Liu, Yuansen, et al.
Published: (2025)
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
by: Jiang, Dongzhi, et al.
Published: (2025)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
by: Fu, Chaoyou, et al.
Published: (2023)
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
by: Fu, Chaoyou, et al.
Published: (2024)
BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
by: Lu, Guilong, et al.
Published: (2025)
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
by: Fu, Chaoyou, et al.
Published: (2026)
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
by: Zhu, Wenqiao, et al.
Published: (2025)
FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains
by: Zhao, Yilun, et al.
Published: (2023)
MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
by: Zhang, Kaiyuan, et al.
Published: (2025)
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
by: Luo, Junyu, et al.
Published: (2025)
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging
by: Tang, Zichen, et al.
Published: (2025)
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
by: Ruan, Jiacheng, et al.
Published: (2025)
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
by: Fu, Chaoyou, et al.
Published: (2024)
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
by: Shi, Yang, et al.
Published: (2025)
MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark
by: Yi, Dongyi, et al.
Published: (2025)
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
by: Zhang, Yi-Fan, et al.
Published: (2024)
A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)
Baichuan4-Finance Technical Report
by: Zhang, Hanyu, et al.
Published: (2024)
Understanding LLM Reasoning for Abstractive Summarization
by: Yuan, Haohan, et al.
Published: (2025)
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
by: Yue, Xiang, et al.
Published: (2023)
BizFinBench.v2: A Unified Dual-Mode Bilingual Benchmark for Expert-Level Financial Capability Alignment
by: Guo, Xin, et al.
Published: (2026)
TRIDENT: Benchmarking LLM Safety in Finance, Medicine, and Law
by: Hui, Zheng, et al.
Published: (2025)
Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning
by: Yang, Bohao, et al.
Published: (2025)
BizBench: A Quantitative Reasoning Benchmark for Business and Finance
by: Koncel-Kedziorski, Rik, et al.
Published: (2023)
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark
by: Guo, Ziyu, et al.
Published: (2025)
Revolutionizing Finance with LLMs: An Overview of Applications and Insights
by: Zhao, Huaqin, et al.
Published: (2024)
HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning
by: Wei, Yanbin, et al.
Published: (2026)
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
by: Zhang, Fan, et al.
Published: (2025)
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
by: Dong, Haoyu, et al.
Published: (2025)
PersonaVLM: Long-Term Personalized Multimodal LLMs
by: Nie, Chang, et al.
Published: (2026)
Ebisu: Benchmarking Large Language Models in Japanese Finance
by: Peng, Xueqing, et al.
Published: (2026)
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)
MME-RAG: Multi-Manager-Expert Retrieval-Augmented Generation for Fine-Grained Entity Recognition in Task-Oriented Dialogues
by: Xue, Liang, et al.
Published: (2025)
Understanding and Mitigating Network Latency Effect on Teleoperated-Robot with Extended Reality
by: Zhang, Ziliang, et al.
Published: (2025)
PuzzleClone: A DSL-Powered Framework for Synthesizing Verifiable Data
by: Xiong, Kai, et al.
Published: (2025)