| Main Authors: | Zhou, Haokun, Hong, Yipeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.04470 |
Similar Items
PPU-Bench:Real World Benchmark for Personalized Partial Unlearning in Vision Language Models
by: Guang, Jiahui, et al.
Published: (2026)
SynBench: A Synthetic Benchmark for Non-rigid 3D Point Cloud Registration
by: Monji-Azad, Sara, et al.
Published: (2024)
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model
by: Wang, Sibo, et al.
Published: (2024)
EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models
by: Yuan, Botai, et al.
Published: (2025)
CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
by: Chen, Kesheng, et al.
Published: (2026)
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
by: Bao, Han, et al.
Published: (2024)
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
by: Qiu, Lu, et al.
Published: (2024)
MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
by: Yan, Bei, et al.
Published: (2024)
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
by: Yu, Keunwoo Peter, et al.
Published: (2025)
LocateBench: Evaluating the Locating Ability of Vision Language Models
by: Chiang, Ting-Rui, et al.
Published: (2024)
SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection
by: Lenhard, Tamara R., et al.
Published: (2024)
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models
by: Ukai, Mahiro, et al.
Published: (2025)
CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
by: Cai, Jie, et al.
Published: (2025)
VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
by: Saxena, Rohit, et al.
Published: (2026)
DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models
by: Wang, JiYang, et al.
Published: (2026)
How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
by: Yang, Luyu, et al.
Published: (2026)
SynCDR : Training Cross Domain Retrieval Models with Synthetic Data
by: Mishra, Samarth, et al.
Published: (2023)
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
by: Lozano, Alejandro, et al.
Published: (2024)
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
by: Chow, Wei, et al.
Published: (2025)
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
by: Yasunaga, Michihiro, et al.
Published: (2025)
iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework
by: Fang, Jianjie, et al.
Published: (2026)
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation
by: Lim, Hyeonseok, et al.
Published: (2024)
SDGBiasBench: Benchmarking and Mitigating Vision-Language Models' Biases in Sustainable Development Goals
by: Lin, Zihang, et al.
Published: (2026)
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
by: Wang, Zhecan, et al.
Published: (2024)
VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing
by: Luo, Zhiming, et al.
Published: (2026)
WorldScore: A Unified Evaluation Benchmark for World Generation
by: Duan, Haoyi, et al.
Published: (2025)
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
by: Gu, Jing, et al.
Published: (2025)
AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
by: Zhou, Yutong, et al.
Published: (2024)
OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)
VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
by: Sajib, Rakib Hossain, et al.
Published: (2026)
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
by: Zhu, Fengbin, et al.
Published: (2024)
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
by: Lin, Fenfen, et al.
Published: (2025)
WorldModelBench: Judging Video Generation Models As World Models
by: Li, Dacheng, et al.
Published: (2025)
Diffusion Curriculum: Synthetic-to-Real Data Curriculum via Image-Guided Diffusion
by: Liang, Yijun, et al.
Published: (2024)
VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation
by: Jiang, Longteng, et al.
Published: (2026)
GeoWorld-VLM: Geometry from World Models for Vision-Language Models
by: Gu, Renjie, et al.
Published: (2026)
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
by: Zhao, Baining, et al.
Published: (2025)
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
by: Kundu, Souvik, et al.
Published: (2025)
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
by: Zhou, Pengfei, et al.
Published: (2025)
SynMorph: Generating Synthetic Face Morphing Dataset with Mated Samples
by: Zhang, Haoyu, et al.
Published: (2024)