:: Library Catalog

Bibliographic Details
Main Authors:	Hamilton, Brayden; Cashmore, Tim; Driscoll, Peter; Gee, Trevor; Williams, Henry
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.20196

Similar Items

Automated Building Heritage Assessment Using Street-Level Imagery
by: Dabrock, Kristina, et al.
Published: (2025)

Automated Assessment of Residual Plots with Computer Vision Models
by: Li, Weihao, et al.
Published: (2024)

Benchmarking PhD-Level Coding in 3D Geometric Computer Vision
by: Li, Wenyi, et al.
Published: (2026)

Vision Function Layer in Multimodal LLMs
by: Shi, Cheng, et al.
Published: (2025)

Pallet Detection And Localisation From Synthetic Data
by: Mueller, Henri, et al.
Published: (2025)

Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
by: Wang, Youze, et al.
Published: (2025)

Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
by: Williams, Evan M., et al.
Published: (2024)

LogicGaze: Benchmarking Causal Consistency in Visual Narratives via Counterfactual Verification
by: Driscoll, Rory, et al.
Published: (2026)

Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots
by: Polak, Maciej P., et al.
Published: (2025)

Skipping Computations in Multimodal LLMs
by: Shukor, Mustafa, et al.
Published: (2024)

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
by: Choong, Wey Yeh, et al.
Published: (2024)

Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
by: Bueno, Ivo, et al.
Published: (2025)

A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset
by: Yang, Haiyu, et al.
Published: (2025)

HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
by: Li, Keliang, et al.
Published: (2024)

SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis
by: Zhang, Chenghanyu, et al.
Published: (2025)

Advancing from Automated to Autonomous Beamline by Leveraging Computer Vision
by: Li, Baolu, et al.
Published: (2025)

Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment
by: Kyem, Blessing Agyei, et al.
Published: (2026)

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
by: Wu, Tsung-Han, et al.
Published: (2024)

Multimodal AI for Body Fat Estimation: Computer Vision and Anthropometry with DEXA Benchmarks
by: Aldajani, Rayan
Published: (2025)

CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
by: Zhou, Xingcheng, et al.
Published: (2026)

Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs
by: Sepehri, Mohammad Shahab, et al.
Published: (2025)

Recent Advances of Continual Learning in Computer Vision: An Overview
by: Qu, Haoxuan, et al.
Published: (2021)

Computer Vision based Automated Quantification of Agricultural Sprayers Boom Displacement
by: Dalal, Aryan Singh, et al.
Published: (2025)

FireANTs: Adaptive Riemannian Optimization for Multi-Scale Diffeomorphic Matching
by: Jena, Rohit, et al.
Published: (2024)

Scaling Vision Pre-Training to 4K Resolution
by: Shi, Baifeng, et al.
Published: (2025)

Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models
by: Ding, Xin, et al.
Published: (2025)

Robust Computer-Vision based Construction Site Detection for Assistive-Technology Applications
by: Feng, Junchi, et al.
Published: (2025)

From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs
by: Zhang, Le, et al.
Published: (2026)

DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
by: Tehrani, Sara, et al.
Published: (2026)

MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
by: Li, Renjie, et al.
Published: (2025)

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
by: Tong, Shengbang, et al.
Published: (2024)

Designing High-Performing Networks for Multi-Scale Computer Vision
by: Picron, Cédric
Published: (2024)

Visual-Word Tokenizer: Beyond Fixed Sets of Tokens in Vision Transformers
by: Gee, Leonidas, et al.
Published: (2024)

Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing
by: Tyo, Jacob, et al.
Published: (2024)

Weather-Robust Scene Semantics with Vision-Aligned 4D Radar
by: Hamilton, Kali, et al.
Published: (2026)

Foul prediction with estimated poses from soccer broadcast video
by: Fang, Jiale, et al.
Published: (2024)

PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)

SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning
by: Selvas-Sala, Cai, et al.
Published: (2026)

HUGE-Bench: A Benchmark for High-Level UAV Vision-Language-Action Tasks
by: Guo, Jingyu, et al.
Published: (2026)

Enhancing Computer Vision Model Generalization in Warehouse Facilities: A Case Study on Anomaly Detection in Vertical Material Handling Systems
by: Liu, Ruiliang, et al.
Published: (2026)