T4.2.1: code and data to predict carbon pools from observable variables for the North-West European Shelf
Data source: the model was trained on a 5-year (2016-2020) NEMO-FABM-ERSEM free run for the North-West European Shelf, which is stored on the Met Office MASS storage system and can be obtained upon request. The original model outputs (daily, 7 km resolution) were coarsened to 35 km and 10-day resolution, and these training data are stored in the ./data folder. The model was then tested on Met Office reanalysis data for the same 2016-2020 period, taken from the bi-decadal Met Office reanalysis provided for Copernicus: https://data.marine.copernicus.eu/product/NWSHELF_MULTIYEAR_BGC_004_011/description. The reanalysis assimilated physics data and plankton functional type (PFT) chlorophyll; the coarsened reanalysis data are also stored in the ./data folder. The original daily, 7 km reanalysis data can be downloaded from Copernicus, and variables not offered by Copernicus can be obtained from the Met Office MASS storage system upon request.
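As an illustration of the coarsening step, here is a minimal sketch that assumes it was a plain block average: 5 x 5 grid cells (7 km to 35 km) and 10 time steps (daily to 10-day). The actual method used to produce the files in ./data is not described here and may differ. The function name `block_mean` and the (time, y, x) array layout are hypothetical.

```python
import numpy as np

def block_mean(field, t_factor=10, y_factor=5, x_factor=5):
    """Coarsen a (time, y, x) array by averaging non-overlapping blocks.

    Assumption: simple block averaging; NaNs (e.g. land points) are ignored
    and a block that is entirely NaN stays NaN. Trailing points that do not
    fill a whole block are dropped.
    """
    nt, ny, nx = field.shape
    nt_c, ny_c, nx_c = nt // t_factor, ny // y_factor, nx // x_factor
    trimmed = field[:nt_c * t_factor, :ny_c * y_factor, :nx_c * x_factor]
    # Split each axis into (coarse index, position inside the block).
    blocks = trimmed.reshape(nt_c, t_factor, ny_c, y_factor, nx_c, x_factor)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=(1, 3, 5))
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3, 5))
    out = np.full(counts.shape, np.nan)
    # Divide only where at least one valid point contributed to the block.
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

# Synthetic example: 20 daily fields on a 10 x 10 grid with one masked corner.
daily = np.ones((20, 10, 10))
daily[:10, :5, :5] = np.nan
coarse = block_mean(daily)
print(coarse.shape)
```

Counting the valid points explicitly avoids the warning that `np.nanmean` raises on all-NaN blocks.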
Metrics: R^2 was used, among other metrics, for validation during hyperparameter tuning. Performance on the reanalysis test data was evaluated using total spatial bias, the bias-corrected Root-Mean-Square-Difference (RMSD), and R^2. Details can be found in Skakala (2025), preprint at https://arxiv.org/abs/2508.10178.
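The three test metrics can be sketched with NumPy as follows. This uses common textbook definitions, which are an assumption here: bias as the mean difference over all valid points, bias-corrected RMSD as the RMSD after removing each field's mean, and R^2 as the coefficient of determination. The exact definitions in Skakala (2025) may differ, and the function name `skill_metrics` is hypothetical.

```python
import numpy as np

def skill_metrics(pred, ref):
    """Return bias, bias-corrected RMSD and R^2 of pred against ref.

    Assumption: standard definitions, computed over all points where
    both arrays are finite (non-NaN).
    """
    pred = np.asarray(pred, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    ok = ~(np.isnan(pred) | np.isnan(ref))
    pred, ref = pred[ok], ref[ok]
    bias = pred.mean() - ref.mean()
    # Remove each field's own mean before taking the RMS difference.
    anomalies = (pred - pred.mean()) - (ref - ref.mean())
    bc_rmsd = np.sqrt(np.mean(anomalies ** 2))
    ss_res = np.sum((ref - pred) ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"bias": bias, "bc_rmsd": bc_rmsd, "r2": r2}

# Synthetic example: a prediction offset from the reference by a constant 0.5.
ref = np.array([1.0, 2.0, 3.0, 4.0])
metrics = skill_metrics(ref + 0.5, ref)
print(metrics)
```

A constant offset shows up entirely in the bias and leaves the bias-corrected RMSD at zero, which is why the two are reported separately.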
Baseline: there is no trivial baseline to choose. From one perspective the reanalysis provides the baseline for comparison; from another, the model free run does.
The core code is in class_ML_carbon.py and can be run using the two Jupyter notebooks provided. The Skakala (2025) study ran a deep ensemble of 15 models; to save space, only the weights of one member are uploaded here, in the ./model directory. The model takes as inputs SST, surface salinity and surface PFT chlorophyll, together with atmospheric data (SWR, wind speed), riverine discharge data and structural variables (latitude, longitude, day of the year, bathymetry). It predicts 5 surface carbon pools (detritus, DOC, total zooplankton, heterotrophic bacteria and DIC), as well as vertically averaged pools for the same variables except zooplankton. The model generalizes very well to the reanalysis for surface variables, but fails to generalize well for vertically averaged variables. It is therefore recommended to use it for surface variables only.
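For readers who retrain several members themselves, the sketch below shows one common way to combine deep-ensemble predictions: the ensemble mean as the estimate and the standard deviation across members as a spread. How the study combined its 15 members is not stated here, so this is an assumption. The function name `ensemble_summary` is hypothetical and the member predictions are synthetic arrays, not output of class_ML_carbon.py.

```python
import numpy as np

def ensemble_summary(member_predictions):
    """Combine per-member predictions into an ensemble mean and spread.

    member_predictions: sequence of arrays with identical shape, one per
    ensemble member, e.g. (n_samples, n_pools).
    Assumption: mean and standard deviation across members.
    """
    stacked = np.stack(member_predictions, axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0)

# Synthetic example: 3 members, 2 samples, 5 surface carbon pools.
members = [np.full((2, 5), value) for value in (1.0, 2.0, 3.0)]
ens_mean, ens_spread = ensemble_summary(members)
print(ens_mean.shape, ens_spread.shape)
```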
Citations: Skakala (2025), Estimating carbon pools in the shelf sea environment: reanalysis or model-informed machine learning?, submitted to JGR-Machine Learning and Computation, preprint on https://arxiv.org/abs/2508.10178.
Dependencies: Python 3.9.21, NumPy 1.26.4, TensorFlow 2.16.1, netCDF4 1.5.8