
Bibliographic Details
Main Authors:	Wang, Ziqin; Chen, Jinyu; Zheng, Xiangyi; Liao, Qinan; Huang, Linjiang; Liu, Si
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2507.04430

_version_	1866909677151846400
author	Wang, Ziqin; Chen, Jinyu; Zheng, Xiangyi; Liao, Qinan; Huang, Linjiang; Liu, Si
author_facet	Wang, Ziqin; Chen, Jinyu; Zheng, Xiangyi; Liao, Qinan; Huang, Linjiang; Liu, Si
contents	Unmanned aerial vehicles (UAVs), operating in environments with relatively few obstacles, offer high maneuverability and full three-dimensional mobility. This allows them to rapidly approach objects and perform a wide range of tasks that are often challenging for ground robots, making them ideal for exploration, inspection, aerial imaging, and everyday assistance. In this paper, we introduce AirStar, a UAV-centric embodied platform that turns a UAV into an intelligent aerial assistant: a large language model acts as the cognitive core for environmental understanding, contextual reasoning, and task planning. AirStar accepts natural interaction through voice commands and gestures, removing the need for a remote controller and significantly broadening its user base. It combines geospatial knowledge-driven long-distance navigation with contextual reasoning for fine-grained short-range control, resulting in an efficient and accurate vision-and-language navigation (VLN) capability. The system also offers built-in capabilities such as cross-modal question answering, intelligent filming, and target tracking. With a highly extensible framework, it supports seamless integration of new functionalities, paving the way toward a general-purpose, instruction-driven intelligent UAV agent. The supplementary PPT is available at https://buaa-colalab.github.io/airstar.github.io.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_04430
institution	arXiv
publishDate	2025
record_format	arxiv
title	"Hi AirStar, Guide Me to the Badminton Court."
topic	Robotics
url	https://arxiv.org/abs/2507.04430
