| Main Authors: | Zhou, Shizhe, Jia, Bohan, Wu, Kai, Shen, Yan, Li, Tongyun, Wu, Yuyang, Lin, Shaohui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29579 |
Similar Items
DF-LLaVA: Unlocking MLLMs for Synthetic Image Detection via Knowledge Injection and Conflict-Driven Self-Reflection
by: Shen, Zhuokang, et al.
Published: (2025)
ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams
by: Xu, Qiang, et al.
Published: (2026)
CompBench: Benchmarking Complex Instruction-guided Image Editing
by: Jia, Bohan, et al.
Published: (2025)
CDH-Bench: A Commonsense-Driven Hallucination Benchmark for Evaluating Visual Fidelity in Vision-Language Models
by: Chen, Kesheng, et al.
Published: (2026)
What Color Is It? A Text-Interference Multimodal Hallucination Benchmark
by: Zhao, Jinkun, et al.
Published: (2025)
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies
by: Wang, Chenglin, et al.
Published: (2025)
VirtueBench: Evaluating Trustworthiness under Uncertainty in Long Video Understanding
by: Yu, Xueqing, et al.
Published: (2026)
HallE-Control: Controlling Object Hallucination in Large Multimodal Models
by: Zhai, Bohan, et al.
Published: (2023)
Devling into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
by: Wang, Xiaosen, et al.
Published: (2026)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
by: Wu, Shengqiong, et al.
Published: (2024)
M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models
by: Weng, Ju-Hsuan, et al.
Published: (2025)
Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
by: Wu, Sijing, et al.
Published: (2026)
HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2026)
HalDec-Bench: Benchmarking Hallucination Detector in Image Captioning
by: Saito, Kuniaki, et al.
Published: (2025)
A Survey of Multimodal Hallucination Evaluation and Detection
by: Chen, Zhiyuan, et al.
Published: (2025)
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
by: Chen, Jiacheng, et al.
Published: (2024)
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks
by: Wu, Zonglin, et al.
Published: (2025)
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
by: Zhang, Zicheng, et al.
Published: (2024)
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation
by: Jia, Xiaojun, et al.
Published: (2025)
MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models
by: Li, Jiale, et al.
Published: (2025)
GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models
by: Butt, Muhammad Atif, et al.
Published: (2025)
DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions
by: Wang, Xinran, et al.
Published: (2026)
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)
SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis
by: Zhang, Chenghanyu, et al.
Published: (2025)
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
by: Zhou, Baichuan, et al.
Published: (2024)
Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs
by: Nguyen, Dung, et al.
Published: (2025)
Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding
by: Li, Jinlin, et al.
Published: (2025)
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
by: Huang, Peizhou, et al.
Published: (2026)
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
by: Chen, Kaijie, et al.
Published: (2025)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)
MICON-Bench: Benchmarking and Enhancing Multi-Image Context Image Generation in Unified Multimodal Models
by: Wu, Mingrui, et al.
Published: (2026)
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
by: Rawte, Vipula, et al.
Published: (2024)
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
by: Wang, Junyang, et al.
Published: (2023)
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
by: Roberts, Jonathan, et al.
Published: (2025)
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
by: Feng, Yuan, et al.
Published: (2025)
E-React: Towards Emotionally Controlled Synthesis of Human Reactions
by: Zhu, Chen, et al.
Published: (2025)
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
by: Wu, Junjie, et al.
Published: (2024)