author Roberts, Jonathan
Taesiri, Mohammad Reza
Sharma, Ansh
Gupta, Akash
Roberts, Samuel
Croitoru, Ioana
Bogolin, Simion-Vlad
Tang, Jialu
Langer, Florian
Raina, Vyas
Raina, Vatsal
Xiong, Hanyi
Udandarao, Vishaal
Lu, Jingyi
Chen, Shiyang
Purkis, Sam
Yan, Tianshuo
Lin, Wenye
Shin, Gyungin
Yang, Qiaochu
Nguyen, Anh Totti
Atkinson, David I.
Baranwal, Aaditya
Coca, Alexandru
Dang, Mikah
Dziadzio, Sebastian
Kunz, Jakob D.
Liang, Kaiqu
Lo, Alexander
Pulfer, Brian
Walton, Steven
Yang, Charig
Han, Kai
Albanie, Samuel
contents Large Multimodal Models (LMMs) exhibit major shortfalls when interpreting images and, by some measures, have poorer spatial cognition than small children or animals. Despite this, they attain high scores on many popular visual benchmarks, with headroom rapidly eroded by an ongoing surge of model progress. This creates a pressing need for difficult benchmarks that remain relevant for longer. We take this idea to its limit by introducing ZeroBench, a lightweight visual reasoning benchmark that is entirely impossible for contemporary frontier LMMs. Our benchmark consists of 100 manually curated questions and 334 less difficult subquestions. We evaluate 20 LMMs on ZeroBench, all of which score 0.0%, and rigorously analyse the errors. To encourage progress in visual understanding, we publicly release ZeroBench.
format Preprint
id arxiv_https___arxiv_org_abs_2502_09696
institution arXiv
publishDate 2025
record_format arxiv
title ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2502.09696