Bibliographic Details
Main Authors: Siva, Smriti, Cross-Zamirski, Jan
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.08117
author Siva, Smriti
Cross-Zamirski, Jan
contents Rapid building damage assessment is critical for post-disaster response. Damage classification models built on satellite imagery provide a scalable means of obtaining situational awareness. However, label noise and severe class imbalance in satellite data create major challenges. The xBD dataset offers a standardized benchmark for building-level damage across diverse geographic regions. In this study, we evaluate Vision Transformer (ViT) performance on the xBD dataset, investigating how these models distinguish between types of structural damage when trained on noisy, imbalanced data. Specifically, we evaluate DINOv2-small and DeiT for multi-class damage classification. We propose a targeted patch-based pre-processing pipeline to isolate structural features and minimize background noise in training. We adopt a frozen-head fine-tuning strategy to keep computational requirements manageable. Model performance is evaluated through accuracy, precision, recall, and macro-averaged F1 scores. We show that small ViT architectures trained with our novel method achieve competitive macro-averaged F1 relative to prior CNN baselines for disaster classification.
format Preprint
id arxiv_https___arxiv_org_abs_2602_08117
institution arXiv
publishDate 2026
record_format arxiv
title Building Damage Detection using Satellite Images and Patch-Based Transformer Methods
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.08117