:: Library Catalog

Bibliographic Details
Main Authors:	Yu, Zhenyu; Idris, Mohd. Yamani Idna; Wang, Pei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.13442

Similar Items

From Physics to Foundation Models: A Review of AI-Driven Quantitative Remote Sensing Inversion
by: Yu, Zhenyu, et al.
Published: (2025)

DC4CR: When Cloud Removal Meets Diffusion Control in Remote Sensing
by: Yu, Zhenyu, et al.
Published: (2025)

Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies
by: Sarkar, Ayushman, et al.
Published: (2025)

SatelliteFormula: Multi-Modal Symbolic Regression from Remote Sensing Imagery for Physics Discovery
by: Yu, Zhenyu, et al.
Published: (2025)

Rainy: Unlocking Satellite Calibration for Deep Learning in Precipitation
by: Yu, Zhenyu, et al.
Published: (2025)

DeCorStory: Gram-Schmidt Prompt Embedding Decorrelation for Consistent Storytelling
by: Sarkar, Ayushman, et al.
Published: (2026)

A Diffusion-Based Framework for Terrain-Aware Remote Sensing Image Reconstruction
by: Yu, Zhenyu, et al.
Published: (2025)

Improved implicit diffusion model with knowledge distillation to estimate the spatial distribution density of carbon stock in remote sensing imagery
by: Yu, Zhenyu, et al.
Published: (2024)

ForgetMe: Evaluating Selective Forgetting in Generative Models
by: Yu, Zhenyu, et al.
Published: (2025)

StoryState: Agent-Based State Control for Consistent and Editable Storybooks
by: Sarkar, Ayushman, et al.
Published: (2026)

ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation
by: Sarkar, Ayushman, et al.
Published: (2026)

DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images
by: Yu, Zhenyu, et al.
Published: (2025)

A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge
by: Dahri, Tarique, et al.
Published: (2025)

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
by: Wang, Di, et al.
Published: (2024)

RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
by: Liu, Fan, et al.
Published: (2023)

UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models
by: Li, Yujie, et al.
Published: (2024)

Vision Foundation Models in Remote Sensing: A Survey
by: Lu, Siqi, et al.
Published: (2024)

A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
by: Huang, Ziyue, et al.
Published: (2025)

RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks
by: Wang, Zhechao, et al.
Published: (2024)

Falcon: A Remote Sensing Vision-Language Foundation Model (Technical Report)
by: Yao, Kelu, et al.
Published: (2025)

FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring
by: Bountos, Nikolaos Ioannis, et al.
Published: (2023)

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
by: Zhang, Yingying, et al.
Published: (2025)

Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images
by: Zou, Xuechao, et al.
Published: (2024)

Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing
by: Adler, Mathis Jürgen, et al.
Published: (2025)

DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes
by: Guo, Junlin, et al.
Published: (2025)

CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation
by: Yi, Zhiwei, et al.
Published: (2025)

RSDehamba: Lightweight Vision Mamba for Remote Sensing Satellite Image Dehazing
by: Zhou, Huiling, et al.
Published: (2024)

Towards Remote Sensing Change Detection with Neural Memory
by: Yang, Zhenyu, et al.
Published: (2026)

Co-Training Vision Language Models for Remote Sensing Multi-task Learning
by: Li, Qingyun, et al.
Published: (2025)

An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain
by: Silva, João Daniel, et al.
Published: (2025)

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
by: Guo, Xin, et al.
Published: (2023)

Foundation Models for Remote Sensing and Earth Observation: A Survey
by: Xiao, Aoran, et al.
Published: (2024)

CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation
by: Gong, Ziyang, et al.
Published: (2024)

FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing
by: Corley, Isaac, et al.
Published: (2025)

VFM-ISRefiner: Towards Better Adapting Vision Foundation Models for Interactive Segmentation of Remote Sensing Images
by: Wang, Deliang, et al.
Published: (2025)

RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning
by: Hu, Huiyang, et al.
Published: (2025)

Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models
by: Blumenstiel, Benedikt, et al.
Published: (2024)

Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
by: Zheng, Zhuo, et al.
Published: (2024)

SIGMAE: A Spectral-Index-Guided Foundation Model for Multispectral Remote Sensing
by: Zhang, Xiaokang, et al.
Published: (2026)

GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing
by: Ma, Xianzhi, et al.
Published: (2025)