:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yuan; Duan, Haodong; Zhang, Yuanhan; Li, Bo; Zhang, Songyang; Zhao, Wangbo; Yuan, Yike; Wang, Jiaqi; He, Conghui; Liu, Ziwei; Chen, Kai; Lin, Dahua
Format:	Preprint
Published:	2023
Subjects:	Computer Vision and Pattern Recognition; Computation and Language
Online Access:	https://arxiv.org/abs/2307.06281

Similar Items

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
by: Wang, Chonghua, et al.
Published: (2024)

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
by: Fang, Xinyu, et al.
Published: (2025)

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
by: Zhuo, Jingming, et al.
Published: (2024)

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
by: Qiao, Yuxuan, et al.
Published: (2024)

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
by: Liu, Ziyu, et al.
Published: (2024)

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
by: Cao, Maosong, et al.
Published: (2024)

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
by: Peng, Haosong, et al.
Published: (2026)

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
by: Zhang, Kaichen, et al.
Published: (2024)

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
by: Zhang, Chuyu, et al.
Published: (2024)

NeedleBench: Evaluating LLM Retrieval and Reasoning Across Varying Information Densities
by: Li, Mo, et al.
Published: (2024)

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)

Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
by: Cao, Yuhang, et al.
Published: (2024)

Is Your Driving World Model an All-Around Player?
by: Kong, Lingdong, et al.
Published: (2026)

Think Visually, Reason Textually: Vision-Language Synergy in ARC
by: Zhang, Beichen, et al.
Published: (2025)

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
by: Zang, Yuhang, et al.
Published: (2025)

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
by: Zhou, Baichuan, et al.
Published: (2024)

Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
by: Hu, Kairui, et al.
Published: (2025)

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
by: Huang, Qidong, et al.
Published: (2023)

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
by: Cao, Maosong, et al.
Published: (2025)

Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)

LongWanjuan: Towards Systematic Measurement for Long Text Quality
by: Lv, Kai, et al.
Published: (2024)

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
by: Zhang, Yuanhan, et al.
Published: (2025)

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
by: Zhang, Beichen, et al.
Published: (2025)

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)

LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
by: Dong, Xiaoyi, et al.
Published: (2024)

Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
by: Zhu, Runchuan, et al.
Published: (2024)

Visual-RFT: Visual Reinforcement Fine-Tuning
by: Liu, Ziyu, et al.
Published: (2025)

PM4Bench: Benchmarking Large Vision-Language Models with Parallel Multilingual Multi-Modal Multi-task Corpus
by: Gao, Junyuan, et al.
Published: (2025)

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models
by: Li, Wei, et al.
Published: (2024)

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
by: Dong, Xiaoyi, et al.
Published: (2024)

Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
by: Liu, Yuhong, et al.
Published: (2025)

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
by: Hua, Zhouqi, et al.
Published: (2025)

SPARK: Synergistic Policy And Reward Co-Evolving Framework
by: Liu, Ziyu, et al.
Published: (2025)

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
by: Chen, Pengcheng, et al.
Published: (2024)