Staff View: OmniSat: Self-Supervised Modality Fusion for Earth Observation :: Library Catalog

Bibliographic Details
Main Authors:	Astruc, Guillaume; Gonthier, Nicolas; Mallet, Clement; Landrieu, Loic
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.08351

_version_	1866916327221886976
author	Astruc, Guillaume; Gonthier, Nicolas; Mallet, Clement; Landrieu, Loic
author_facet	Astruc, Guillaume; Gonthier, Nicolas; Mallet, Clement; Landrieu, Loic
contents	The diversity and complementarity of sensors available for Earth Observations (EO) call for developing bespoke self-supervised multimodal learning approaches. However, current multimodal EO datasets and models typically focus on a single data type, either mono-date images or time series, which limits their impact. To address this issue, we introduce OmniSat, a novel architecture able to merge diverse EO modalities into expressive features without labels by exploiting their alignment. To demonstrate the advantages of our approach, we create two new multimodal datasets by augmenting existing ones with new modalities. As demonstrated for three downstream tasks -- forestry, land cover classification, and crop mapping -- OmniSat can learn rich representations without supervision, leading to state-of-the-art performance in semi- and fully supervised settings. Furthermore, our multimodal pretraining scheme improves performance even when only one modality is available for inference. The code and dataset are available at https://github.com/gastruc/OmniSat.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_08351
institution	arXiv
publishDate	2024
record_format	arxiv
title	OmniSat: Self-Supervised Modality Fusion for Earth Observation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.08351

Similar Items