Bibliographic Details
Main Authors: Gupta, Debashis; Golder, Aditi; Zhu, Rongkhun; Cui, Kangning; Tang, Wei; Yang, Fan; Csillik, Ovidiu; Alaqahtani, Sarra; Pauca, V. Paul
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2507.08683
author Gupta, Debashis
Golder, Aditi
Zhu, Rongkhun
Cui, Kangning
Tang, Wei
Yang, Fan
Csillik, Ovidiu
Alaqahtani, Sarra
Pauca, V. Paul
contents Contrastive learning (CL) has emerged as a powerful paradigm for learning transferable representations without relying on large labeled datasets. Its ability to capture intrinsic similarities and differences among data samples has led to state-of-the-art results in computer vision tasks. These strengths make CL particularly well-suited for Earth System Observation (ESO), where diverse satellite modalities such as optical and SAR imagery offer naturally aligned views of the same geospatial regions. However, ESO presents unique challenges, including high inter-class similarity, scene clutter, and ambiguous boundaries, which complicate representation learning, especially in low-label, multi-label settings. Existing CL frameworks often focus on intra-modality self-supervision or lack mechanisms for multi-label alignment and semantic precision across modalities. In this work, we introduce MoSAiC, a unified framework that jointly optimizes intra- and inter-modality contrastive learning with a multi-label supervised contrastive loss. Designed specifically for multi-modal satellite imagery, MoSAiC enables finer semantic disentanglement and more robust representation learning across spectrally similar and spatially complex classes. Experiments on two benchmark datasets, BigEarthNet V2.0 and SEN12MS, show that MoSAiC consistently outperforms both fully supervised and self-supervised baselines in terms of accuracy, cluster coherence, and generalization in low-label and high-class-overlap scenarios.
format Preprint
id arxiv_https___arxiv_org_abs_2507_08683
institution arXiv
publishDate 2025
record_format arxiv
title MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2507.08683