Bibliographic Details
Main authors: Vanyan, Ani; Barseghyan, Alvard; Tamazyan, Hakob; Galstyan, Tigran; Huroyan, Vahan; Hovakimyan, Naira; Khachatrian, Hrant
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2510.17014
author Vanyan, Ani
Barseghyan, Alvard
Tamazyan, Hakob
Galstyan, Tigran
Huroyan, Vahan
Hovakimyan, Naira
Khachatrian, Hrant
contents Foundation models have advanced machine learning across various modalities, including images. Recently, multiple teams have trained foundation models specialized for remote sensing applications. This line of research is motivated by the distinct characteristics of remote sensing imagery, its specific applications, and the types of robustness useful for satellite image analysis. In this work, we systematically challenge the idea that specialized foundation models are more useful than general-purpose vision foundation models, at least at small scale. First, we design a simple benchmark that measures how well remote sensing models generalize to lower-resolution images on two downstream tasks. Second, we train iBOT, a self-supervised vision encoder, on MillionAID, an ImageNet-scale satellite imagery dataset, with several modifications specific to remote sensing. We show that none of these pretrained models brings consistent improvements over general-purpose baselines at the ViT-B scale.
format Preprint
id arxiv_https___arxiv_org_abs_2510_17014
institution arXiv
publishDate 2025
record_format arxiv
title Do Satellite Tasks Need Special Pretraining?
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2510.17014