Library Catalog

Bibliographic Details
Main Authors:	Titiya, Prasham; Trivedi, Jainil; Baral, Chitta; Gupta, Vivek
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.21771

Similar Items

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
by: Yilmaz, Nilay, et al.
Published: (2025)

Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
by: Parmar, Mihir, et al.
Published: (2022)

GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning
by: Siingh, Shikhhar, et al.
Published: (2025)

SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding
by: Radwan, Ahmed Y., et al.
Published: (2026)

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
by: Hong, Jack, et al.
Published: (2025)

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
by: Zhang, YiFan, et al.
Published: (2024)

The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
by: Anvekar, Tejas, et al.
Published: (2025)

EuraGovExam: A Multilingual Multimodal Benchmark from Real-World Civil Service Exams
by: Kim, Jaeseong, et al.
Published: (2026)

Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets
by: Greer, Ross, et al.
Published: (2024)

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
by: Qiu, Lu, et al.
Published: (2024)

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
by: Tian, Juanxi, et al.
Published: (2025)

CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography
by: Zhu, Qingqing, et al.
Published: (2026)

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
by: Li, Zhang, et al.
Published: (2026)

Opening Articulated Structures in the Real World
by: Gupta, Arjun, et al.
Published: (2024)

Physics-Based Benchmarking Metrics for Multimodal Synthetic Images
by: Gupta, Kishor Datta, et al.
Published: (2025)

MCDDPM: Multichannel Conditional Denoising Diffusion Model for Unsupervised Anomaly Detection in Brain MRI
by: Trivedi, Vivek Kumar, et al.
Published: (2024)

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
by: Gao, Shenyuan, et al.
Published: (2024)

Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind
by: Li, Qingmei, et al.
Published: (2025)

Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection
by: Yakun, Cui, et al.
Published: (2025)

MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding
by: Kou, Qian, et al.
Published: (2026)

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
by: Yang, Zhenyu, et al.
Published: (2025)

NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
by: Pandya, Pranshu, et al.
Published: (2024)

SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
by: Westerhoff, Justus, et al.
Published: (2025)

STAR: A Benchmark for Situated Reasoning in Real-World Videos
by: Wu, Bo, et al.
Published: (2024)

A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)

EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs
by: Hu, He, et al.
Published: (2026)

PPU-Bench: Real World Benchmark for Personalized Partial Unlearning in Vision Language Models
by: Guang, Jiahui, et al.
Published: (2026)

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)

Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
by: Ryan, Yuriel, et al.
Published: (2025)

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
by: Zhang, Ge, et al.
Published: (2024)

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
by: Li, Hengzhi, et al.
Published: (2025)

SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding
by: Choi, Tae-Min, et al.
Published: (2025)

Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding
by: Peng, Xiaojiang, et al.
Published: (2026)

Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
by: Im, Eun Woo, et al.
Published: (2025)

OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios
by: Gao, Hong, et al.
Published: (2025)

Analysis of Invasive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain Adaptation
by: Adhikari, Jayan, et al.
Published: (2025)

CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
by: Bhosale, Mahesh, et al.
Published: (2026)

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
by: Liu, Ziqiang, et al.
Published: (2024)

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
by: Li, Yifei, et al.
Published: (2025)