Library Catalog

Bibliographic Details
Main Authors:	Zhao, Jinkun; Huang, Lei; Ge, Haixin; Wu, Wenjun
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.13400

Similar Items

ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models
by: Rawte, Vipula, et al.
Published: (2024)

ReactBench: A Cause-Driven Benchmark for Multimodal Hallucination via Systematic Evaluation
by: Zhou, Shizhe, et al.
Published: (2026)

Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
by: Gao, Hongcheng, et al.
Published: (2025)

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models
by: Li, Jiale, et al.
Published: (2025)

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
by: Li, Bohao, et al.
Published: (2024)

GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models
by: Butt, Muhammad Atif, et al.
Published: (2025)

MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark
by: Shan, Bin, et al.
Published: (2024)

Towards Efficient and Effective Deep Clustering with Dynamic Grouping and Prototype Aggregation
by: Zhang, Haixin, et al.
Published: (2024)

Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs
by: Nguyen, Dung, et al.
Published: (2025)

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
by: Dong, Bowen, et al.
Published: (2025)

VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery
by: Ge, Jinchao, et al.
Published: (2025)

ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians
by: Liu, Yufei, et al.
Published: (2024)

From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs
by: Zhang, Le, et al.
Published: (2026)

Towards Generalized Multimodal Homography Estimation
by: You, Jinkun, et al.
Published: (2026)

Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs
by: Lin, Chenchen, et al.
Published: (2026)

Lyapunov Probes for Hallucination Detection in Large Foundation Models
by: Luan, Bozhi, et al.
Published: (2026)

Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift
by: Qiu, Jielin, et al.
Published: (2022)

Dual-Level Cross-Modal Contrastive Clustering
by: Zhang, Haixin, et al.
Published: (2024)

What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes
by: Ross, Candace, et al.
Published: (2025)

Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
by: He, Zhentao, et al.
Published: (2025)

TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
by: Zhang, Xingjian, et al.
Published: (2025)

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
by: Zhong, Weihong, et al.
Published: (2024)

Evaluating Durability: Benchmark Insights into Multimodal Watermarking
by: Qiu, Jielin, et al.
Published: (2024)

FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
by: Yin, Zhihan, et al.
Published: (2026)

A Survey of Multimodal Hallucination Evaluation and Detection
by: Chen, Zhiyuan, et al.
Published: (2025)

ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models
by: Ruan, Chenxi, et al.
Published: (2026)

MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
by: Zhu, Kejian, et al.
Published: (2025)

Hallucination Benchmark in Medical Visual Question Answering
by: Wu, Jinge, et al.
Published: (2024)

AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
by: Wu, Xiyang, et al.
Published: (2024)

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
by: Shu, Yan, et al.
Published: (2025)

Hallucination of Multimodal Large Language Models: A Survey
by: Bai, Zechen, et al.
Published: (2024)

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
by: Jiang, Chaoya, et al.
Published: (2023)

When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models
by: Yakun, Cui, et al.
Published: (2026)

SELECT: Detecting Label Errors in Real-world Scene Text Data
by: Liu, Wenjun, et al.
Published: (2025)

FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
by: Ruan, Jiacheng, et al.
Published: (2024)

ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
by: Yan, Dingkun, et al.
Published: (2024)

CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base
by: Nguyen, Cong-Duy, et al.
Published: (2025)

Steering the Verifiability of Multimodal AI Hallucinations
by: Pang, Jianhong, et al.
Published: (2026)

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text
by: Rahman, Mizanur, et al.
Published: (2025)

Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
by: Wu, Shengqiong, et al.
Published: (2024)