Bibliographic Details
Main Authors: Fan, Hang; Pei, Haoran; Liang, Runze; Liu, Weican; Cheng, Long; Wei, Wei
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2604.04145
_version_ 1866911568473620480
author Fan, Hang
Pei, Haoran
Liang, Runze
Liu, Weican
Cheng, Long
Wei, Wei
contents Photovoltaic (PV) power forecasting plays a critical role in power system dispatch and market participation. Because PV generation is highly sensitive to weather conditions and cloud motion, accurate forecasting requires effective modeling of complex spatiotemporal dependencies across multiple information sources. Although recent studies have advanced AI-based forecasting methods, most fail to fuse temporal observations, satellite imagery, and textual weather information in a unified framework. This paper proposes Solar-VLM, a large-language-model-driven framework for multimodal PV power forecasting. First, modality-specific encoders are developed to extract complementary features from heterogeneous inputs. The time-series encoder adopts a patch-based design to capture temporal patterns from multivariate observations at each site. The visual encoder, built upon a Qwen-based vision backbone, extracts cloud-cover information from satellite images. The text encoder distills historical weather characteristics from textual descriptions. Second, to capture spatial dependencies across geographically distributed PV stations, a cross-site feature fusion mechanism is introduced. Specifically, a Graph Learner models inter-station correlations through a graph attention network constructed over a K-nearest-neighbor (KNN) graph, while a cross-site attention module further facilitates adaptive information exchange among sites. Finally, experiments conducted on data from eight PV stations in a northern province of China demonstrate the effectiveness of the proposed framework. Our proposed model is publicly available at https://github.com/rhp413/Solar-VLM.
format Preprint
id arxiv_https___arxiv_org_abs_2604_04145
institution arXiv
publishDate 2026
record_format arxiv
title Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting
topic Artificial Intelligence
url https://arxiv.org/abs/2604.04145
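
The abstract mentions a patch-based time-series encoder and a K-nearest-neighbor (KNN) graph over PV stations. The following is a minimal sketch of those two preprocessing ideas only, not the authors' implementation: the function names, the patch length, the stride, the value of k, and the use of Euclidean distance on station coordinates are all assumptions made for illustration. The graph attention network and the cross-site attention module are not shown.

```python
import numpy as np


def make_patches(series, patch_len, stride):
    """Split a (T, C) multivariate series into patches of shape (N, patch_len, C).

    Hypothetical helper: patch_len and stride are illustrative, not from the paper.
    """
    series = np.asarray(series, dtype=float)
    num_steps = series.shape[0]
    if num_steps < patch_len:
        raise ValueError("series is shorter than patch_len")
    starts = range(0, num_steps - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])


def knn_adjacency(coords, k):
    """Return a boolean (S, S) matrix linking each site to its k nearest sites.

    Hypothetical helper: distance is Euclidean on the given coordinates, which
    is an assumption; the record does not state how neighbors are measured.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a site is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape, dtype=bool)
    rows = np.repeat(np.arange(coords.shape[0]), k)
    adj[rows, nearest.ravel()] = True
    return adj
```

For example, with eight sites (the number of stations in the abstract) and made-up coordinates, `knn_adjacency(coords, k=3)` yields an 8 x 8 matrix with three neighbors per row. The result is not necessarily symmetric, because KNN relations are directed.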