Table des matières: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Tehrani, Sara, Xu, Yonghao, Haglund, Leif, Berg, Amanda, Felsberg, Michael
Format:	Preprint
Publié:	2026
Sujets:	Computer Vision and Pattern Recognition
Accès en ligne:	https://arxiv.org/abs/2601.18493
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

Table des matières:

Timely interpretation of satellite imagery is critical for disaster response, yet existing vision-language benchmarks for remote sensing largely focus on coarse labels and image-level recognition, overlooking the functional understanding and instruction robustness required in real humanitarian workflows. We introduce DisasterInsight, a multimodal benchmark designed to evaluate vision-language models (VLMs) on realistic disaster analysis tasks. DisasterInsight restructures the xBD dataset into approximately 112K building-centered instances and supports instruction-diverse evaluation across multiple tasks, including building-function classification, damage-level and disaster-type classification, counting, and structured report generation aligned with humanitarian assessment guidelines. To establish domain-adapted baselines, we propose DI-Chat, obtained by fine-tuning existing VLM backbones on disaster-specific instruction data using parameter-efficient Low-Rank Adaptation (LoRA). Extensive experiments on state-of-the-art generic and remote-sensing VLMs reveal substantial performance gaps across tasks, particularly in damage understanding and structured report generation. DI-Chat achieves significant improvements on damage-level and disaster-type classification as well as report generation quality, while building-function classification remains challenging for all evaluated models. DisasterInsight provides a unified benchmark for studying grounded multimodal reasoning in disaster imagery.

Documents similaires