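| Main Authors: | Vanyan, Ani; Barseghyan, Alvard; Tamazyan, Hakob; Galstyan, Tigran; Huroyan, Vahan; Hovakimyan, Naira; Khachatrian, Hrant |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |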
| Online Access: | https://arxiv.org/abs/2510.17014 |

| _version_ | 1866911220433420288 |
|---|---|
| author | Vanyan, Ani; Barseghyan, Alvard; Tamazyan, Hakob; Galstyan, Tigran; Huroyan, Vahan; Hovakimyan, Naira; Khachatrian, Hrant |
| author_facet | Vanyan, Ani; Barseghyan, Alvard; Tamazyan, Hakob; Galstyan, Tigran; Huroyan, Vahan; Hovakimyan, Naira; Khachatrian, Hrant |
| contents | Foundation models have advanced machine learning across various modalities, including images. Recently, multiple teams have trained foundation models specialized for remote sensing applications. This line of research is motivated by the distinct characteristics of remote sensing imagery, its specific applications, and the types of robustness useful for satellite image analysis. In this work, we systematically challenge the idea that specialized foundation models are more useful than general-purpose vision foundation models, at least at small scale. First, we design a simple benchmark that measures how well remote sensing models generalize to lower-resolution images on two downstream tasks. Second, we train iBOT, a self-supervised vision encoder, on MillionAID, an ImageNet-scale satellite imagery dataset, with several modifications specific to remote sensing. We show that none of these pretrained models brings consistent improvements over general-purpose baselines at the ViT-B scale. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2510_17014 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Do Satellite Tasks Need Special Pretraining? |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2510.17014 |