Bibliographic Details
Main Authors: Dai, Shiqi; Ma, Zizhi; Luo, Zhicong; Yang, Xuesong; Huang, Yibin; Zhang, Wanyue; Chen, Chi; Guo, Zonghao; Xu, Wang; Sun, Yufei; Sun, Maosong
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2512.23219
Table of Contents:
  • While Multimodal Large Language Models (MLLMs) have exhibited remarkable general intelligence across diverse domains, their potential in low-altitude applications dominated by Unmanned Aerial Vehicles (UAVs) remains largely underexplored. Existing MLLM benchmarks rarely cover the unique challenges of low-altitude scenarios, while UAV-related evaluations mainly focus on specific tasks such as localization or navigation, without a unified evaluation of MLLMs' general intelligence. To bridge this gap, we present MM-UAVBench, a comprehensive benchmark that systematically evaluates MLLMs across three core capability dimensions (perception, cognition, and planning) in low-altitude UAV scenarios. MM-UAVBench comprises 19 sub-tasks with over 5.7K manually annotated questions, all derived from real-world UAV data collected from public datasets. Extensive experiments on 16 open-source and proprietary MLLMs reveal that current models struggle to adapt to the complex visual and cognitive demands of low-altitude scenarios. Our analyses further uncover critical bottlenecks, such as spatial bias and weak multi-view understanding, that hinder the effective deployment of MLLMs in UAV scenarios. We hope MM-UAVBench will foster future research on robust and reliable MLLMs for real-world UAV intelligence.