| Main Authors: | Ma, Denghao, Liu, Qing, Chen, Zulong, Xu, Chuanfei, Xu, Jia, Yang, Zhibo, Shao, Wei, Li, Zhao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.10550 |
Similar Items
DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
by: Man, Zhibo, et al.
Published: (2025)
CC-OCR V2: Benchmarking Large Multimodal Models for Literacy in Real-world Document Processing
by: Xu, Zhipeng, et al.
Published: (2026)
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
by: Ji, Yifan, et al.
Published: (2026)
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
by: Wang, Yan, et al.
Published: (2025)
M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
by: Chen, Qiguang, et al.
Published: (2024)
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
by: Li, Zhaowei, et al.
Published: (2024)
Revisiting Classification Taxonomy for Grammatical Errors
by: Zou, Deqing, et al.
Published: (2025)
Multi-modal Stance Detection: New Datasets and Model
by: Liang, Bin, et al.
Published: (2024)
Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
by: Liu, Rui, et al.
Published: (2024)
VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning
by: Jiang, Zhihuan, et al.
Published: (2024)
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
by: Wu, Siwei, et al.
Published: (2024)
GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization
by: Ye, Yangfan, et al.
Published: (2024)
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
by: He, Zheqi, et al.
Published: (2024)
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models
by: Li, Zhong-Zhi, et al.
Published: (2024)
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
by: Jiang, Yue, et al.
Published: (2025)
CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation
by: Li, Haitao, et al.
Published: (2025)
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
by: Yu, Shi, et al.
Published: (2024)
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
by: Zhang, Kaichen, et al.
Published: (2024)
Instruct-Imagen: Image Generation with Multi-modal Instruction
by: Hu, Hexiang, et al.
Published: (2024)
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants
by: Qin, Lixiong, et al.
Published: (2025)
A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges
by: Shen, Huangjun, et al.
Published: (2024)
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
by: Zhao, Shitian, et al.
Published: (2024)
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
by: Fu, Chaoyou, et al.
Published: (2024)
Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits
by: He, Xiangheng, et al.
Published: (2023)
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents
by: Wu, Shiwei, et al.
Published: (2024)
MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction
by: Dai, Xiang, et al.
Published: (2024)
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
by: Wang, Yueqian, et al.
Published: (2024)
Multi-modal Retrieval Augmented Multi-modal Generation: Datasets, Evaluation Metrics and Strong Baselines
by: Ma, Zi-Ao, et al.
Published: (2024)
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
by: Madaan, Divyam, et al.
Published: (2025)
Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification
by: Chen, Huiyao, et al.
Published: (2024)
FaiMA: Feature-aware In-context Learning for Multi-domain Aspect-based Sentiment Analysis
by: Yang, Songhua, et al.
Published: (2024)
MMMOS: Multi-domain Multi-axis Audio Quality Assessment
by: Lin, Yi-Cheng, et al.
Published: (2025)
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
by: Li, Chuhan, et al.
Published: (2024)
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
by: Jiang, Dongzhi, et al.
Published: (2024)
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following
by: He, Yun, et al.
Published: (2024)
Improving LLM-based Document-level Machine Translation with Multi-Knowledge Fusion
by: Liu, Bin, et al.
Published: (2025)
Visual Language Tracking with Multi-modal Interaction: A Robust Benchmark
by: Li, Xuchen, et al.
Published: (2024)
Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning
by: Yang, Juncheng, et al.
Published: (2024)
An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue
by: Inoue, Koji, et al.
Published: (2025)
MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection
by: Lu, Weihai, et al.
Published: (2026)