| Main Authors: | Hamilton, Brayden, Cashmore, Tim, Driscoll, Peter, Gee, Trevor, Williams, Henry |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20196 |
Similar Items
Automated Building Heritage Assessment Using Street-Level Imagery
by: Dabrock, Kristina, et al.
Published: (2025)
Automated Assessment of Residual Plots with Computer Vision Models
by: Li, Weihao, et al.
Published: (2024)
Benchmarking PhD-Level Coding in 3D Geometric Computer Vision
by: Li, Wenyi, et al.
Published: (2026)
Vision Function Layer in Multimodal LLMs
by: Shi, Cheng, et al.
Published: (2025)
Pallet Detection And Localisation From Synthetic Data
by: Mueller, Henri, et al.
Published: (2025)
Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
by: Wang, Youze, et al.
Published: (2025)
Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark
by: Williams, Evan M., et al.
Published: (2024)
LogicGaze: Benchmarking Causal Consistency in Visual Narratives via Counterfactual Verification
by: Driscoll, Rory, et al.
Published: (2026)
Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots
by: Polak, Maciej P., et al.
Published: (2025)
Skipping Computations in Multimodal LLMs
by: Shukor, Mustafa, et al.
Published: (2024)
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
by: Choong, Wey Yeh, et al.
Published: (2024)
Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
by: Bueno, Ivo, et al.
Published: (2025)
A Computer Vision Pipeline for Individual-Level Behavior Analysis: Benchmarking on the Edinburgh Pig Dataset
by: Yang, Haiyu, et al.
Published: (2025)
HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
by: Li, Keliang, et al.
Published: (2024)
SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis
by: Zhang, Chenghanyu, et al.
Published: (2025)
Advancing from Automated to Autonomous Beamline by Leveraging Computer Vision
by: Li, Baolu, et al.
Published: (2025)
Vision-Language Foundation Models for Comprehensive Automated Pavement Condition Assessment
by: Kyem, Blessing Agyei, et al.
Published: (2026)
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
by: Wu, Tsung-Han, et al.
Published: (2024)
Multimodal AI for Body Fat Estimation: Computer Vision and Anthropometry with DEXA Benchmarks
by: Aldajani, Rayan
Published: (2025)
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs
by: Zhou, Xingcheng, et al.
Published: (2026)
Hyperphantasia: A Benchmark for Evaluating the Mental Visualization Capabilities of Multimodal LLMs
by: Sepehri, Mohammad Shahab, et al.
Published: (2025)
Recent Advances of Continual Learning in Computer Vision: An Overview
by: Qu, Haoxuan, et al.
Published: (2021)
Computer Vision based Automated Quantification of Agricultural Sprayers Boom Displacement
by: Dalal, Aryan Singh, et al.
Published: (2025)
FireANTs: Adaptive Riemannian Optimization for Multi-Scale Diffeomorphic Matching
by: Jena, Rohit, et al.
Published: (2024)
Scaling Vision Pre-Training to 4K Resolution
by: Shi, Baifeng, et al.
Published: (2025)
Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models
by: Ding, Xin, et al.
Published: (2025)
Robust Computer-Vision based Construction Site Detection for Assistive-Technology Applications
by: Feng, Junchi, et al.
Published: (2025)
From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs
by: Zhang, Le, et al.
Published: (2026)
DisasterInsight: A Multimodal Benchmark for Function-Aware and Grounded Disaster Assessment
by: Tehrani, Sara, et al.
Published: (2026)
MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding
by: Li, Renjie, et al.
Published: (2025)
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
by: Tong, Shengbang, et al.
Published: (2024)
Designing High-Performing Networks for Multi-Scale Computer Vision
by: Picron, Cédric
Published: (2024)
Visual-Word Tokenizer: Beyond Fixed Sets of Tokens in Vision Transformers
by: Gee, Leonidas, et al.
Published: (2024)
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing
by: Tyo, Jacob, et al.
Published: (2024)
Weather-Robust Scene Semantics with Vision-Aligned 4D Radar
by: Hamilton, Kali, et al.
Published: (2026)
Foul prediction with estimated poses from soccer broadcast video
by: Fang, Jiale, et al.
Published: (2024)
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)
SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning
by: Selvas-Sala, Cai, et al.
Published: (2026)
HUGE-Bench: A Benchmark for High-Level UAV Vision-Language-Action Tasks
by: Guo, Jingyu, et al.
Published: (2026)
Enhancing Computer Vision Model Generalization in Warehouse Facilities: A Case Study on Anomaly Detection in Vertical Material Handling Systems
by: Liu, Ruiliang, et al.
Published: (2026)