Staff View: MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing :: Library Catalog

Bibliographic Details
Main Authors:	Gupta, Debashis; Golder, Aditi; Zhu, Rongkhun; Cui, Kangning; Tang, Wei; Yang, Fan; Csillik, Ovidiu; Alaqahtani, Sarra; Pauca, V. Paul
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.08683

_version_	1866911051459592192
author	Gupta, Debashis; Golder, Aditi; Zhu, Rongkhun; Cui, Kangning; Tang, Wei; Yang, Fan; Csillik, Ovidiu; Alaqahtani, Sarra; Pauca, V. Paul
contents	Contrastive learning (CL) has emerged as a powerful paradigm for learning transferable representations without relying on large labeled datasets. Its ability to capture intrinsic similarities and differences among data samples has led to state-of-the-art results in computer vision tasks. These strengths make CL particularly well-suited for Earth System Observation (ESO), where diverse satellite modalities such as optical and SAR imagery offer naturally aligned views of the same geospatial regions. However, ESO presents unique challenges, including high inter-class similarity, scene clutter, and ambiguous boundaries, which complicate representation learning, especially in low-label, multi-label settings. Existing CL frameworks often focus on intra-modality self-supervision or lack mechanisms for multi-label alignment and semantic precision across modalities. In this work, we introduce MoSAiC, a unified framework that jointly optimizes intra- and inter-modality contrastive learning with a multi-label supervised contrastive loss. Designed specifically for multi-modal satellite imagery, MoSAiC enables finer semantic disentanglement and more robust representation learning across spectrally similar and spatially complex classes. Experiments on two benchmark datasets, BigEarthNet V2.0 and SEN12MS, show that MoSAiC consistently outperforms both fully supervised and self-supervised baselines in terms of accuracy, cluster coherence, and generalization in low-label and high-class-overlap scenarios.
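The contents field describes the objective only at a high level. As a hedged illustration, and not the authors' implementation, the following sketch pairs a standard InfoNCE term that pulls together optical and SAR embeddings of the same patch (inter-modality) with a multi-label supervised contrastive term inside each modality (intra-modality), in which two samples count as positives when their label sets overlap. The Jaccard weighting of positives, the temperature `tau`, the balance weight `lam`, and every function name here are assumptions chosen for illustration; the paper's actual loss may be formulated differently.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) over a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cross_modal_info_nce(src, dst, tau=0.1):
    """Inter-modality InfoNCE: src[i] (e.g. optical) must pick out dst[i]
    (e.g. SAR of the same patch) among all dst embeddings in the batch."""
    assert len(src) == len(dst), "modalities must be paired row by row"
    n = len(src)
    loss = 0.0
    for i in range(n):
        logits = [cosine(src[i], dst[j]) / tau for j in range(n)]
        loss += log_sum_exp(logits) - logits[i]  # -log softmax of the true pair
    return loss / n


def multilabel_supcon(z, labels, tau=0.1):
    """Intra-modality multi-label supervised contrastive term (assumed form).
    A pair (i, j) is positive if the label sets overlap; each positive is
    weighted by the Jaccard similarity of the two label sets."""
    n = len(z)
    if n < 2:
        return 0.0
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = {j: cosine(z[i], z[j]) / tau for j in others}
        log_denom = log_sum_exp(list(logits.values()))
        weights = {}
        for j in others:
            shared = len(labels[i] & labels[j])
            if shared:
                weights[j] = shared / len(labels[i] | labels[j])
        if not weights:  # anchor has no positive in the batch: skip it
            continue
        w_sum = sum(weights.values())
        total += sum(w * (log_denom - logits[j]) for j, w in weights.items()) / w_sum
        anchors += 1
    return total / anchors if anchors else 0.0


def joint_loss(optical, sar, labels, tau=0.1, lam=1.0):
    """Hypothetical joint objective: symmetric inter-modality InfoNCE plus
    lam times the multi-label term averaged over both modalities."""
    inter = 0.5 * (cross_modal_info_nce(optical, sar, tau)
                   + cross_modal_info_nce(sar, optical, tau))
    intra = 0.5 * (multilabel_supcon(optical, labels, tau)
                   + multilabel_supcon(sar, labels, tau))
    return inter + lam * intra


if __name__ == "__main__":
    # Toy batch: three paired 2-D embeddings with multi-label annotations.
    optical = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    sar = [[0.9, 0.2], [0.1, 1.0], [1.0, 0.0]]
    labels = [{"forest"}, {"water"}, {"forest", "pasture"}]
    print(joint_loss(optical, sar, labels))
```

Weighting positives by label overlap is one simple way to let partially matching multi-label scenes attract each other less strongly than exact matches; in practice the embeddings would come from separate optical and SAR encoders rather than hand-written vectors.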
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_08683
institution	arXiv
publishDate	2025
record_format	arxiv
title	MoSAiC: Multi-Modal Multi-Label Supervision-Aware Contrastive Learning for Remote Sensing
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2507.08683