:: Library Catalog

Bibliographic Details
Main Author:	Zeng, Yunlin
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.01062

Similar Items

VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
by: Gado, Mohamed, et al.
Published: (2025)

ThermEval: A Structured Benchmark for Evaluation of Vision-Language Models on Thermal Imagery
by: Shrivastava, Ayush, et al.
Published: (2026)

SPoT: Subpixel Placement of Tokens in Vision Transformers
by: Hjelkrem-Tan, Martine, et al.
Published: (2025)

MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models
by: Shi, Yang, et al.
Published: (2026)

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
by: Kaul, Prannay, et al.
Published: (2024)

Adapting Vision-Language Models for Evaluating World Models
by: Hendriksen, Mariya, et al.
Published: (2025)

GeoRC: A Benchmark for Geolocation Reasoning Chains
by: Talreja, Mohit, et al.
Published: (2026)

A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges
by: Li, Zongxia, et al.
Published: (2025)

Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models
by: Wang, Hongjun, et al.
Published: (2026)

Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
by: Zeng, Yu, et al.
Published: (2026)

PlantVillageVQA: A Visual Question Answering Dataset for Benchmarking Vision-Language Models in Plant Science
by: Sakib, Syed Nazmus, et al.
Published: (2025)

DistortBench: Benchmarking Vision Language Models on Image Distortion Identification
by: Goyal, Divyanshu, et al.
Published: (2026)

Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis
by: Nagar, Aishik, et al.
Published: (2024)

Coordinated Robustness Evaluation Framework for Vision-Language Models
by: Babu, Ashwin Ramesh, et al.
Published: (2025)

A Survey on Efficient Vision-Language-Action Models
by: Yu, Zhaoshu, et al.
Published: (2025)

UniFusion: Vision-Language Model as Unified Encoder in Image Generation
by: Li, Kevin, et al.
Published: (2025)

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
by: Miyai, Atsuyuki, et al.
Published: (2024)

Predictive but Not Plannable: RC-aux for Latent World Models
by: Li, Wenyuan, et al.
Published: (2026)

Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities
by: Datta, Shounak, et al.
Published: (2025)

Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models
by: Salman, Shaeke, et al.
Published: (2024)

TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
by: Zhou, Wenhao, et al.
Published: (2025)

Transferable Model-agnostic Vision-Language Model Adaptation for Efficient Weak-to-Strong Generalization
by: Park, Jihwan, et al.
Published: (2025)

Beyond Generation: Multi-Hop Reasoning for Factual Accuracy in Vision-Language Models
by: Hossain, Shamima
Published: (2025)

Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
by: Cherian, Anoop, et al.
Published: (2024)

FastVLM: Efficient Vision Encoding for Vision Language Models
by: Vasu, Pavan Kumar Anasosalu, et al.
Published: (2024)

Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models
by: Nguyen-Truong, Hai, et al.
Published: (2026)

AVA-Bench: Atomic Visual Ability Benchmark for Vision Foundation Models
by: Mai, Zheda, et al.
Published: (2025)

TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation
by: Gong, Han, et al.
Published: (2026)

Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation
by: Zadeh, Danial Sadrian, et al.
Published: (2026)

NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders
by: Dineva, Katarina Trojachanec, et al.
Published: (2026)

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models
by: Wu, Junjie, et al.
Published: (2024)

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
by: Zou, Bocheng, et al.
Published: (2024)

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
by: Zhang, Yuhui, et al.
Published: (2025)

Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
by: Sui, Elaine, et al.
Published: (2024)

PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting
by: Maqbool, Danyal, et al.
Published: (2025)

Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
by: Dafnis, Konstantinos M., et al.
Published: (2025)

Prismer: A Vision-Language Model with Multi-Task Experts
by: Liu, Shikun, et al.
Published: (2023)

Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
by: Havrylov, Volodymyr, et al.
Published: (2025)

Multi-Modal Adapter for Vision-Language Models
by: Seputis, Dominykas, et al.
Published: (2024)

SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling
by: Wang, Eileen, et al.
Published: (2024)