Bibliographic Details
Main Authors: Miranda, Miro, Pathak, Deepak, Helber, Patrick, Bischke, Benjamin, Najjar, Hiba, Mena, Francisco, Sanchez, Cristhian, Pai, Akshay, Arenas, Diego, Valdenegro-Toro, Matias, Charfuelan, Marcela, Nuske, Marlon, Dengel, Andreas
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2604.00940
_version_ 1866914437440471040
author Miranda, Miro
Pathak, Deepak
Helber, Patrick
Bischke, Benjamin
Najjar, Hiba
Mena, Francisco
Sanchez, Cristhian
Pai, Akshay
Arenas, Diego
Valdenegro-Toro, Matias
Charfuelan, Marcela
Nuske, Marlon
Dengel, Andreas
contents Crop yield prediction requires substantial data to train scalable models. However, creating yield prediction datasets is constrained by high acquisition costs, heterogeneous data quality, and data privacy regulations. Consequently, existing datasets are scarce, low in quality, or limited to regional levels or single crop types, hindering the development of scalable data-driven solutions. In this work, we release YieldSAT, a large, high-quality, multimodal dataset for high-resolution crop yield prediction. YieldSAT spans various climate zones across multiple countries, including Argentina, Brazil, Uruguay, and Germany, and covers four major crop types (corn, rapeseed, soybeans, and wheat) across 2,173 expert-curated fields. In total, over 12.2 million yield samples are available, each with a spatial resolution of 10 m. Each field is paired with multispectral satellite imagery, resulting in 113,555 labeled satellite images, complemented by auxiliary environmental data. We demonstrate the potential of large-scale, high-resolution crop yield prediction as a pixel regression task by comparing various deep learning models and data fusion architectures. Furthermore, we highlight open challenges arising from severe distribution shifts in the ground truth data under real-world conditions. To mitigate these shifts, we explore a domain-informed Deep Ensemble approach that exhibits significant performance gains. The dataset is available at https://yieldsat.github.io/.
format Preprint
id arxiv_https___arxiv_org_abs_2604_00940
institution arXiv
publishDate 2026
record_format arxiv
title YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2604.00940