| Main authors: | Heap, Thomas; Aitchison, Laurence; Cahill, Emma; Rodriguez, Adriana Casado |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2602.18540 |
Similar items
APML: Adaptive Probabilistic Matching Loss for Robust 3D Point Cloud Reconstruction
by: Sharifipour, Sasan, et al.
Published: (2025)
PushupBench: Your VLM is not good at counting pushups
by: Li, Shengzhi, et al.
Published: (2026)
Video-Bench: Human-Aligned Video Generation Benchmark
by: Han, Hui, et al.
Published: (2025)
VEU-Bench: Towards Comprehensive Understanding of Video Editing
by: Li, Bozheng, et al.
Published: (2025)
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)
Seeing the Big Picture: Evaluating Multimodal LLMs' Ability to Interpret and Grade Handwritten Student Work
by: Henkel, Owen, et al.
Published: (2025)
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
by: Wang, Zhecan, et al.
Published: (2024)
AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects
by: Li, Danrui, et al.
Published: (2026)
HY3D-Bench: Generation of 3D Assets
by: Hunyuan3D, Team, et al.
Published: (2026)
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
by: Zhang, Zicheng, et al.
Published: (2024)
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
by: Rismanchian, Sina, et al.
Published: (2024)
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
by: Lozano, Alejandro, et al.
Published: (2024)
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
by: Tang, Tianqi, et al.
Published: (2024)
LocateBench: Evaluating the Locating Ability of Vision Language Models
by: Chiang, Ting-Rui, et al.
Published: (2024)
VideoGameBench: Can Vision-Language Models complete popular video games?
by: Zhang, Alex L., et al.
Published: (2025)
VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing
by: Luo, Zhiming, et al.
Published: (2026)
Omni IIE Bench: Benchmarking the Practical Capabilities of Image Editing Models
by: Yang, Yujia, et al.
Published: (2026)
CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography
by: Zhu, Qingqing, et al.
Published: (2026)
SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models
by: Tang, Zhengxu, et al.
Published: (2025)
Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments
by: Ali, Muhammad, et al.
Published: (2025)
EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models
by: Yuan, Botai, et al.
Published: (2025)
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)
SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning
by: Kong, Fanqi, et al.
Published: (2025)
Hydra-Bench: A Benchmark for Multi-Modal Leaf Wetness Sensing
by: Liu, Yimeng, et al.
Published: (2025)
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
by: Chowdhury, Sanjoy, et al.
Published: (2025)
WorldModelBench: Judging Video Generation Models As World Models
by: Li, Dacheng, et al.
Published: (2025)
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions
by: Ma, Sihan, et al.
Published: (2024)
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
by: Zhang, Yuyou, et al.
Published: (2025)
GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation
by: Zhai, Ziyu, et al.
Published: (2026)
VisBrowse-Bench: Benchmarking Visual-Native Search for Multimodal Browsing Agents
by: Zhang, Zhengbo, et al.
Published: (2026)
VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning
by: Jia, Zi-Yi, et al.
Published: (2026)
LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation
by: Wang, Jun, et al.
Published: (2026)
MMCL-Bench: Multimodal Context Learning from Visual Rules, Procedures, and Evidence
by: Chen, Yifan, et al.
Published: (2026)
EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
by: Ran, Dongchuan, et al.
Published: (2026)
VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
by: Saxena, Rohit, et al.
Published: (2026)
DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models
by: Wang, JiYang, et al.
Published: (2026)
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis
by: Wei, Jianhui, et al.
Published: (2025)
VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding
by: Zhang, Zhihong, et al.
Published: (2025)
GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI
by: Simumba, Naomi, et al.
Published: (2025)
RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension
by: Gao, Tianyi, et al.
Published: (2025)