| Main Authors: | Zhao, Yi; Zhang, Yilin; Xiang, Rong; Li, Jing; Li, Hillming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.01735 |
Similar Items
A Visually Impaired Assistance Benchmark for VLM-as-a-Judge Evaluation
by: Zhao, Yi, et al.
Published: (2026)
A Survey on Benchmarks of Multimodal Large Language Models
by: Li, Jian, et al.
Published: (2024)
Large Multimodal Agents: A Survey
by: Xie, Junlin, et al.
Published: (2024)
A Survey on Agentic Multimodal Large Language Models
by: Yao, Huanjin, et al.
Published: (2025)
An Examination of the Compositionality of Large Generative Vision-Language Models
by: Ma, Teli, et al.
Published: (2023)
CoViPAL: Layer-wise Contextualized Visual Token Pruning for Large Vision-Language Models
by: Tang, Zicong, et al.
Published: (2025)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
by: Song, Shezheng, et al.
Published: (2023)
A Survey on Evaluation of Multimodal Large Language Models
by: Huang, Jiaxing, et al.
Published: (2024)
AI for Service: Proactive Assistance with AI Glasses
by: Wen, Zichen, et al.
Published: (2025)
InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models
by: Nguyen, Pha, et al.
Published: (2025)
A Survey on Multimodal Large Language Models
by: Yin, Shukang, et al.
Published: (2023)
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
by: Gao, Silin, et al.
Published: (2025)
Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models
by: Lee, Yi-Lun, et al.
Published: (2024)
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
by: Fu, Chaoyou, et al.
Published: (2024)
OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model
by: Chen, Qiguang, et al.
Published: (2026)
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
by: Yang, Rui, et al.
Published: (2025)
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
by: Liu, Ziqiang, et al.
Published: (2024)
Benchmarking and Enhancing Text-to-Image Models for Generating Visual Representations in Early Arithmetic Education
by: Wang, Junling, et al.
Published: (2026)
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
by: Sun, Weigao, et al.
Published: (2025)
SUDER: Self-Improving Unified Large Multimodal Models for Understanding and Generation with Dual Self-Rewards
by: Hong, Jixiang, et al.
Published: (2025)
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
by: Li, Yunxin, et al.
Published: (2024)
A Survey of Multimodal Large Language Model from A Data-centric Perspective
by: Bai, Tianyi, et al.
Published: (2024)
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
by: Jiang, Dongzhi, et al.
Published: (2025)
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
by: Zhang, YiFan, et al.
Published: (2024)
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
by: Liu, Xiao, et al.
Published: (2024)
MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models
by: Lin, Xiao, et al.
Published: (2025)
Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework
by: Jin, Jiandong, et al.
Published: (2024)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
by: Jia, Mengdi, et al.
Published: (2025)
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)
Robust Multimodal Large Language Models Against Modality Conflict
by: Zhang, Zongmeng, et al.
Published: (2025)
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
by: Zhang, Wenqiao, et al.
Published: (2024)
LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
by: Zhao, Zihui, et al.
Published: (2025)
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
by: Li, Zejun, et al.
Published: (2024)
AviationLMM: A Large Multimodal Foundation Model for Civil Aviation
by: Li, Wenbin, et al.
Published: (2026)
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
by: Zou, Chengke, et al.
Published: (2024)
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
by: Huang, Kung-Hsiang, et al.
Published: (2024)
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
by: Zhao, Haoyu, et al.
Published: (2024)
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges
by: Li, Zongxia, et al.
Published: (2025)
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
by: Ma, Xingjun, et al.
Published: (2025)