| Main Authors: | Liu, Yuan, Duan, Haodong, Zhang, Yuanhan, Li, Bo, Zhang, Songyang, Zhao, Wangbo, Yuan, Yike, Wang, Jiaqi, He, Conghui, Liu, Ziwei, Chen, Kai, Lin, Dahua |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2307.06281 |
Similar Items
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
by: Wang, Chonghua, et al.
Published: (2024)
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
by: Fang, Xinyu, et al.
Published: (2025)
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
by: Zhuo, Jingming, et al.
Published: (2024)
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
by: Qiao, Yuxuan, et al.
Published: (2024)
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
by: Liu, Ziyu, et al.
Published: (2024)
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
by: Cao, Maosong, et al.
Published: (2024)
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)
SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
by: Peng, Haosong, et al.
Published: (2026)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
by: Zhang, Kaichen, et al.
Published: (2024)
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
by: Zhang, Chuyu, et al.
Published: (2024)
NeedleBench: Evaluating LLM Retrieval and Reasoning Across Varying Information Densities
by: Li, Mo, et al.
Published: (2024)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)
Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models
by: Cao, Yuhang, et al.
Published: (2024)
Is Your Driving World Model an All-Around Player?
by: Kong, Lingdong, et al.
Published: (2026)
Think Visually, Reason Textually: Vision-Language Synergy in ARC
by: Zhang, Beichen, et al.
Published: (2025)
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
by: Xing, Long, et al.
Published: (2024)
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
by: Zang, Yuhang, et al.
Published: (2025)
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
by: Zhou, Baichuan, et al.
Published: (2024)
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
by: Hu, Kairui, et al.
Published: (2025)
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
by: Huang, Qidong, et al.
Published: (2023)
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
by: Cao, Maosong, et al.
Published: (2025)
Otter: A Multi-Modal Model with In-Context Instruction Tuning
by: Li, Bo, et al.
Published: (2023)
LongWanjuan: Towards Systematic Measurement for Long Text Quality
by: Lv, Kai, et al.
Published: (2024)
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
by: Zhang, Yuanhan, et al.
Published: (2025)
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
by: Zhang, Beichen, et al.
Published: (2025)
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)
LLaVA-Video: Video Instruction Tuning With Synthetic Data
by: Zhang, Yuanhan, et al.
Published: (2024)
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
by: Dong, Xiaoyi, et al.
Published: (2024)
Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
by: Zhu, Runchuan, et al.
Published: (2024)
Visual-RFT: Visual Reinforcement Fine-Tuning
by: Liu, Ziyu, et al.
Published: (2025)
PM4Bench: Benchmarking Large Vision-Language Models with Parallel Multilingual Multi-Modal Multi-task Corpus
by: Gao, Junyuan, et al.
Published: (2025)
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models
by: Li, Wei, et al.
Published: (2024)
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
by: Dong, Xiaoyi, et al.
Published: (2024)
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
by: Liu, Yuhong, et al.
Published: (2025)
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
by: Hua, Zhouqi, et al.
Published: (2025)
SPARK: Synergistic Policy And Reward Co-Evolving Framework
by: Liu, Ziyu, et al.
Published: (2025)
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
by: Chen, Pengcheng, et al.
Published: (2024)