This issue aims to consolidate and clarify the current set of satellite-derived datasets relevant for solar radiation nowcasting within the MLCast community.
The goal is to:
- Provide a clear overview of available and relevant satellite-derived surface solar radiation datasets
- Clarify differences in data source, processing level, and intended use
- Serve as a hub linking to dataset-specific sub-issues
This follows recent discussions in the MLCast community about avoiding duplication of preprocessing efforts and aligning on a shared AI-ready data foundation.
🧭 Motivation
There is currently some ambiguity across the community regarding:
- Which satellite-derived solar radiation datasets exist
- How they differ in terms of:
  - satellite source (MSG vs MTG)
  - processing method
  - spatial/temporal resolution
  - intended use (climate product vs AI-ready training data)
This issue is intended to resolve that by collecting and structuring the landscape.
Summary tables
🟦 1. Operational input datasets
These datasets represent the direct satellite observation space, closest to the operational data that flows into met services through EUMETCast.
They are intended to be used as:
- inputs (X) to ML models
| Dataset | Satellite | Role |
|---|---|---|
| MSG SEVIRI L1C | MSG (operational) | Raw multispectral radiances |
| MTG FCI L1C (proposed) | MTG (operational) | Operational successor to MSG |
Related issue on obtaining an ML-ready Zarr dataset from MTG FCI Level 1C data: #43
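Before L1C channels can serve as inputs (X), they usually need per-channel scaling, because visible and infrared channels span very different value ranges. The snippet below is a minimal sketch, not taken from #43: it assumes the channels have already been regridded into a NumPy array of shape (time, channel, y, x) and that a boolean mask marks the training period. `standardize_channels` is a hypothetical helper name.

```python
import numpy as np


def standardize_channels(x: np.ndarray, train_mask: np.ndarray):
    """Per-channel z-score normalization of an input stack (hypothetical helper).

    x: array of shape (time, channel, y, x), e.g. L1C channels already
       regridded to a common grid (assumption).
    train_mask: boolean array of shape (time,) selecting training timesteps,
       so statistics are never computed from validation/test data.
    Returns the normalized array plus per-channel mean and std,
    each of shape (1, channel, 1, 1).
    """
    train = x[train_mask]
    # nanmean/nanstd ignore missing pixels (e.g. outside the satellite disk)
    mean = np.nanmean(train, axis=(0, 2, 3), keepdims=True)
    std = np.nanstd(train, axis=(0, 2, 3), keepdims=True)
    # avoid division by zero for constant channels
    std = np.where(std > 0, std, 1.0)
    return (x - mean) / std, mean, std
```

Computing the statistics on the training period only keeps validation and test data from leaking into the normalization; the returned `mean` and `std` can then be reused unchanged at inference time.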
🟨 2. Derived geophysical products (target variables) for surface solar radiation (SSR)
These datasets represent physically interpreted variables, derived from satellite observations via retrieval algorithms or radiative transfer models.
They are typically used as:
- training targets
- evaluation / benchmarking datasets
- physical reference products
| Dataset | Satellite/Instrument | Variable | Type |
|---|---|---|---|
| MSGCPP | MSG SEVIRI | GHI | Operational surface solar radiation + clear-sky radiation product |
| SARAH-3 | MSG SEVIRI | GHI | Climate data record of surface solar radiation, 5-day latency |
| HANNA | MSG SEVIRI | GHI | High-resolution surface solar radiation demonstrator |
| DWD SSR | MSG SEVIRI / MTG FCI (emerging) | GHI | Operational surface solar radiation product |
| LSA-SAF MDSSFTD | MSG SEVIRI | GHI? | Surface solar radiation (land only) |
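The split between satellite inputs (X) and SSR training targets can be made concrete with a small windowing step that turns co-registered time series into nowcasting samples. This is a minimal sketch assuming both datasets have already been regridded onto a common grid and matched to common timestamps (the products above differ in resolution and latency, so that alignment is the real work); `make_samples`, `n_in` and `n_out` are hypothetical names, not part of any listed product.

```python
import numpy as np


def make_samples(x: np.ndarray, y: np.ndarray, n_in: int, n_out: int):
    """Pair past input frames with future SSR target frames (hypothetical helper).

    x: inputs of shape (time, channel, H, W), e.g. L1C channels (X).
    y: targets of shape (time, H, W), e.g. GHI from a derived SSR product.
    Assumes x and y are already on the same grid and the same time axis.
    Returns X of shape (samples, n_in, channel, H, W) and
    Y of shape (samples, n_out, H, W).
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must share the same time axis")
    n_samples = x.shape[0] - n_in - n_out + 1
    if n_samples < 1:
        raise ValueError("time series too short for n_in + n_out frames")
    xs, ys = [], []
    for t in range(n_samples):
        xs.append(x[t : t + n_in])                 # past frames t .. t+n_in-1
        ys.append(y[t + n_in : t + n_in + n_out])  # the n_out frames that follow
    return np.stack(xs), np.stack(ys)
```

For example, with `n_in=4` and `n_out=2` on a 10-step series this yields 5 samples, each using 4 input frames to predict the following 2 target frames. Swapping in a different SSR product only changes `y`, which is what makes a shared target choice worth agreeing on.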
🛰️ Candidate Surface Solar Radiation datasets
1. MSGCPP (MSG Cloud Physical Properties) (KNMI)
   - Source: Meteosat Second Generation (MSG)
   - Type: Derived product (radiative transfer / physical retrievals)
   - Variables: Surface solar radiation, clear-sky radiation, etc.
   - Used in solar applications, e.g. retraining IrradianceNet (@olah-soma), operational use at @geosphere with IrradPhyDNet and DE_330 on the EURO-CORDEX domain, and post-processing for PV production nowcasts
   - Limited by land-only coverage
2. SARAH-3 (CM SAF)
3. HANNA (CM SAF demonstrator)
4. DWD SSR product (MSG + MTG)
5. LSA-SAF MDSSFTD (EUMETSAT SAF)
❓ Open questions for the community
- Which derived SSR product(s) should be used as:
  - training targets?
  - evaluation benchmarks?
- Do we want a community benchmark comparison paper/study across SSR products?
👥 Contributors / interested parties
Tagging MLCasters possibly interested in this discussion:
@pdebuyl @ladc @leifdenby @franchg
A quick look at some of the datasets by @irenelivia:
