Staff View: :: Library Catalog


Bibliographic Details
Main Authors:	Dai, Shiqi; Ma, Zizhi; Luo, Zhicong; Yang, Xuesong; Huang, Yibin; Zhang, Wanyue; Chen, Chi; Guo, Zonghao; Xu, Wang; Sun, Yufei; Sun, Maosong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.23219

_version_	1866915697971429376
author	Dai, Shiqi; Ma, Zizhi; Luo, Zhicong; Yang, Xuesong; Huang, Yibin; Zhang, Wanyue; Chen, Chi; Guo, Zonghao; Xu, Wang; Sun, Yufei; Sun, Maosong
author_facet	Dai, Shiqi; Ma, Zizhi; Luo, Zhicong; Yang, Xuesong; Huang, Yibin; Zhang, Wanyue; Chen, Chi; Guo, Zonghao; Xu, Wang; Sun, Yufei; Sun, Maosong
contents	While Multimodal Large Language Models (MLLMs) have exhibited remarkable general intelligence across diverse domains, their potential in low-altitude applications dominated by Unmanned Aerial Vehicles (UAVs) remains largely underexplored. Existing MLLM benchmarks rarely cover the unique challenges of low-altitude scenarios, while UAV-related evaluations mainly focus on specific tasks such as localization or navigation, without a unified evaluation of MLLMs' general intelligence. To bridge this gap, we present MM-UAVBench, a comprehensive benchmark that systematically evaluates MLLMs across three core capability dimensions (perception, cognition, and planning) in low-altitude UAV scenarios. MM-UAVBench comprises 19 sub-tasks with over 5.7K manually annotated questions, all derived from real-world UAV data collected from public datasets. Extensive experiments on 16 open-source and proprietary MLLMs reveal that current models struggle to adapt to the complex visual and cognitive demands of low-altitude scenarios. Our analyses further uncover critical bottlenecks, such as spatial bias and weak multi-view understanding, that hinder the effective deployment of MLLMs in UAV scenarios. We hope MM-UAVBench will foster future research on robust and reliable MLLMs for real-world UAV intelligence.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_23219
institution	arXiv
publishDate	2025
record_format	arxiv
title	MM-UAVBench: How Well Do Multimodal Large Language Models See, Think, and Plan in Low-Altitude UAV Scenarios?
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.23219
