Saved in:
Bibliographic Details
Main Authors: Sun, Enzhe, Cui, Yongchuan, Liu, Peng, Yan, Jining
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2504.00901
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866913937532911616
author Sun, Enzhe
Cui, Yongchuan
Liu, Peng
Yan, Jining
author_facet Sun, Enzhe
Cui, Yongchuan
Liu, Peng
Yan, Jining
contents Remote sensing spatiotemporal fusion (STF) addresses the fundamental trade-off between temporal and spatial resolution by combining high temporal-low spatial and high spatial-low temporal imagery. This paper presents the first comprehensive survey of deep learning advances in remote sensing STF over the past decade. We establish a systematic taxonomy of deep learning architectures including Convolutional Neural Networks (CNNs), Transformers, Generative Adversarial Networks (GANs), diffusion models, and sequence models, revealing significant growth in deep learning adoption for STF tasks. Our analysis reveals that CNN-based methods dominate spatial feature extraction, while Transformer architectures show superior performance in capturing long-range temporal dependencies. GAN and diffusion models demonstrate exceptional capability in detail reconstruction, substantially outperforming traditional methods in structural similarity and spectral fidelity. Through comprehensive experiments on seven benchmark datasets comparing ten representative methods, we validate these findings and quantify the performance trade-offs between different approaches. We identify five critical challenges: time-space conflicts, limited generalization across datasets, computational efficiency for large-scale processing, multi-source heterogeneous fusion, and insufficient benchmark diversity. The survey highlights promising opportunities in foundation models, hybrid architectures, and self-supervised learning approaches that could address current limitations and enable multimodal applications. The specific models, datasets, and other information mentioned in this article have been collected in: https://github.com/yc-cui/Deep-Learning-Spatiotemporal-Fusion-Survey.
format Preprint
id arxiv_https___arxiv_org_abs_2504_00901
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities
Sun, Enzhe
Cui, Yongchuan
Liu, Peng
Yan, Jining
Computer Vision and Pattern Recognition
Remote sensing spatiotemporal fusion (STF) addresses the fundamental trade-off between temporal and spatial resolution by combining high temporal-low spatial and high spatial-low temporal imagery. This paper presents the first comprehensive survey of deep learning advances in remote sensing STF over the past decade. We establish a systematic taxonomy of deep learning architectures including Convolutional Neural Networks (CNNs), Transformers, Generative Adversarial Networks (GANs), diffusion models, and sequence models, revealing significant growth in deep learning adoption for STF tasks. Our analysis reveals that CNN-based methods dominate spatial feature extraction, while Transformer architectures show superior performance in capturing long-range temporal dependencies. GAN and diffusion models demonstrate exceptional capability in detail reconstruction, substantially outperforming traditional methods in structural similarity and spectral fidelity. Through comprehensive experiments on seven benchmark datasets comparing ten representative methods, we validate these findings and quantify the performance trade-offs between different approaches. We identify five critical challenges: time-space conflicts, limited generalization across datasets, computational efficiency for large-scale processing, multi-source heterogeneous fusion, and insufficient benchmark diversity. The survey highlights promising opportunities in foundation models, hybrid architectures, and self-supervised learning approaches that could address current limitations and enable multimodal applications. The specific models, datasets, and other information mentioned in this article have been collected in: https://github.com/yc-cui/Deep-Learning-Spatiotemporal-Fusion-Survey.
title A Decade of Deep Learning for Remote Sensing Spatiotemporal Fusion: Advances, Challenges, and Opportunities
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2504.00901