Refactor forecast #39

isaacaka · 2025-03-04T15:26:33Z

The aim is to refactor the forecast.py code to make it easier to swap between different dimensionality reduction and forecasting techniques.

Refactored jumper.ipynb, main_forecast.py and main to reflect changes to the refactored libraries. The jumper notebook is specific to using normal PCA (not Kernal PCA)
Separated code into forecasting technique classes and dimensionality reduction classes
forecast.py calls the methods defined in the above two classes
The DR and forecasting techniques created are imported and stored in a dictionary
The DR and forecasting techniques the user wants to use can be specified by setting the name within techniques_config.yaml
ocean_terms.yaml defines the names of the terms stored in the NEMO grid files, e.g. different configurations of nemo use different names for temperature, ssh and salinity. You can specify which names are used in the yaml file. and restart.py will then be able to read the correct keys.

Tested that the refactored code runs end-to-end

Test written against the non-refactored code need to be written.

Closes #37 #52 #9 #44 #48

…e code

…ing number of components, predict with skforecast

…gnature for get_component Creates a setter method to pass neccessary variables from simulation class to the decompostion class Corrects method signature for get_component in forecast.py and dimensionality_reduction.py

…eviation

review-notebook-app · 2025-04-22T13:59:04Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Copilot

Pull Request Overview

This PR refactors the forecasting code by separating dimensionality reduction and forecasting techniques into dedicated classes, updating configuration files, and modifying related scripts for consistency. Key changes include updating function signatures (e.g. adding a filename parameter to prepare), refactoring PCA/decomposition routines to leverage strategy classes, and adjusting forecast prediction reconstruction and error evaluation.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
techniques_config.yaml	Added configuration entries to specify DR and forecast techniques.
ocean_terms.yaml	Introduced a YAML file to map ocean term names for flexible usage.
main_forecast.py	Updated function signatures, logging, and file handling logic.
lib/restart.py	Added YAML loading for ocean terms in restart functions.
lib/forecast_technique.py	Refactored forecast techniques including direct and recursive approaches.
lib/forecast.py	Modified simulation loading, decomposition, reconstruction and forecasting logic.
lib/dimensionality_reduction.py	Updated PCA and KernelPCA implementations with new methods for reconstruction.

Comments suppressed due to low confidence (3)

main_forecast.py:557

The revised logic for generating x_pred may not cover the full intended time range; please verify that the prediction array handles the simulation length and forecast steps as expected.

x_pred = np.arange(1 + pas, 1 + steps * pas, pas).reshape(-1, 1)

lib/forecast.py:156

The comparison 'if self.len < self.end' may lead to errors if self.end is None; ensure that self.end is always set to a valid integer value or introduce a conditional branch to handle the None case.

array = [self.loadFile(file) for file in self.files if self.len < self.end]

main_forecast.py:85

[nitpick] Minor typo: 'forcasted' should be corrected to 'forecasted'.

print(f"{term} time series forcasted")

lib/forecast_technique.py

ma595

Running through Jumper.ipynb. Some small nitpicking issues (will look over main structural changes next).

I deployed with python==3.11 and ran the Jumper.ipynb:

I modified the path to point to Grid_T files.
Error is returned with KPCA due to the setting of 0.9 for components
I modified the DR_technique in techniques_config.yaml to "DimensionalityReductionPCA"
- Could we alias these names to make them a bit more readable?
- I modified "DimensionalityReductionPCA" to "DimensionalityReductionABC" and I get a KeyError 'DimensionalityReductionABC' - could a more helpful error message be provided?
Word "Compilated" is not correct (remnant of Maud's notebook).
Error bars look a little weird (using jupyter notebook defaults) - I loaded the data in the google drive.

simu_predicted needs to be manually created otherwise predictions will fail (again a remnant from Maud's code). Consider adding to cell 12 - where it was previously.
When plotting evaluation AttributeError: 'Simulation' object has no attribute 'getPC' - does the getPC function exist anymore? Perhaps this isn't important?

ma595 · 2025-05-02T09:56:49Z

Notebooks/Jumper.ipynb

A few initial comments.

Make sure that ruff check <file.py passes. We need to add this as a check in the CI as well. I can add the linting rules I've used for Spinup-Evaluation as a new PR.

The getPC method was refactored to be the get_component method

isaacaka · 2025-05-09T15:01:32Z

Running through Jumper.ipynb. Some small nitpicking issues (will look over main structural changes next).

I deployed with python==3.11 and ran the Jumper.ipynb:

I modified the path to point to Grid_T files.

Error is returned with KPCA due to the setting of 0.9 for components

I modified the DR_technique in techniques_config.yaml to "DimensionalityReductionPCA"

Could we alias these names to make them a bit more readable?

I modified "DimensionalityReductionPCA" to "DimensionalityReductionABC" and I get a KeyError 'DimensionalityReductionABC' - could a more helpful error message be provided?

Word "Compilated" is not correct (remnant of Maud's notebook).

Error bars look a little weird (using jupyter notebook defaults) - I loaded the data in the google drive.

* `simu_predicted` needs to be manually created otherwise predictions will fail (again a remnant from Maud's code). Consider adding to cell 12 - where it was previously. * When plotting evaluation `AttributeError: 'Simulation' object has no attribute 'getPC'` - does the `getPC` function exist anymore? Perhaps this isn't important?

Thanks, I renamed them to just KernelPCA and PCA and raised an exception with a message when an invalid technique is specified.
The error bars are a bit off set. They should start at t=20 rather than t=0
getPC is now get_component

ma595

This PR introduces skforecast Forecasters, sklearn regressors and a dimensionality reduction class.

Within forecast_technique.py we create instances of the combined regressor and forecaster. So we essentially create all available options here.

These are then imported in the forecast.py

At present, all forecasters, regressors and dimensionality reduction algorithms are imported to the user in the forecast.py

Added new files:

dimensionality_reduction.py
forecast_technique.py

Configuration files

ocean_terms.yaml
techniques_config.yaml

ocean_terms.yaml now generalises the inputs quite nicely. Used in restart.py.

General Issues

Add a quick sketch of an updated README.md (already mentioned). Generally we need more comments and docs.
Testing - We need this project wide. I'll help with this.
- Testing (separate PR coming). Avoid assertions in favour of Exceptions.
Formatting / linting issues need addressing - To fix some automatically: Run ruff check --fix . Others will need manually fixed. Make sure local install of Ruff is up to date ruff check
STYLE: Inconsistent formatting in docstrings. Consider adopting numpy style https://numpydoc.readthedocs.io/en/latest/format.html#parameters. I've added a PR that imposes this style project wide using github actions. create_gp_regressor in lib/forecast_technique.py is a good example of how to do it.
STYLE: Could you add some static type hinting where you think it'd be useful? This is to get us ready for mypy in future.
STYLE: Consider renaming forecast_technique.py to forecast_method.py?
FUNCTIONALITY: I don't think the parallel jobs in Forecast works. (MINOR)
STYLE: snake_case for function and method names - class names have CamelCase / PascalCase. Again, these are remnants from Maud's code.
Remove optimisation from lib/forecast.py move to a different class? I'm not sure what this is currently used for?
STYLE: Needs unifying (Run ruff check with or without --fix.
Possibly need to refactor forecast_technique again (and rename).
Could we add a simple regressor like linear or tree based?
We need to think about how to generalise this further so we do not hardcode the regressor and Forecaster?

ma595 · 2025-05-14T09:26:05Z

Notebooks/Jumper.ipynb

I think I reviewed this previously. Will run through again and check if it runs without error.

ma595 · 2025-05-14T09:28:19Z

lib/dimensionality_reduction.py

Nicely modularised now. getPC() -> get_component().

General comments

Need some tests

Avoid asserts - replace with ValueError or whatever's more relevant.

Probably better as a PR

ma595 · 2025-05-14T09:29:52Z

lib/forecast.py

+        # TODO: Change to DR_technique
+        self.int_mask, ts_array = DimensionalityReductionPCA.reconstruct_predictions(
+            predictions, n, self.info, begin
+        )


Would be great to generalise this now to DimensionalityReductionMethod or similar

ma595 · 2025-05-14T09:31:31Z

lib/forecast.py

-        x_pred = np.arange(0, (len(self) + steps) * pas, pas).reshape(-1, 1)
+        # x_pred = np.arange(0, (len(self) + steps) * pas, pas).reshape(-1, 1) #TODO: Check this len(self) + steps logic


Has this defintely been checked? I recall discussing this - but I can't remember what we decided on. I suspect the old implementation was correct?

ma595 · 2025-05-14T09:32:27Z

lib/forecast.py

+from lib.forecast_technique import (
+    GaussianProcessForecaster,
+    GaussianProcessRecursiveForecaster,
+)
+from lib.dimensionality_reduction import (
+    DimensionalityReductionPCA,
+    DimensionalityReductionKernelPCA,
+)
+
+
+sys.path.insert(0, "../")
+
+path_to_nemo_directory = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+path_to_nemo_directory = Path(path_to_nemo_directory)

 warnings.filterwarnings("ignore")

+Forecast_techniques = {
+    "GaussianProcessForecaster": GaussianProcessForecaster,
+    "GaussianProcessRecursiveForecaster": GaussianProcessRecursiveForecaster,
+}
+Dimensionality_reduction_techniques = {
+    "PCA": DimensionalityReductionPCA,
+    "KernelPCA": DimensionalityReductionKernelPCA,
+}
+
+
+with open(path_to_nemo_directory / "techniques_config.yaml", "r") as f:
+    config = yaml.safe_load(f)
+
+
+if config["DR_technique"]["name"] not in Dimensionality_reduction_techniques:
+    raise KeyError(
+        f"DR_technique {config['DR_technique']['name']} not found. Have you specified a valid dimensionality reduction technique in the config file?"
+    )
+else:
+    DR_technique = Dimensionality_reduction_techniques[config["DR_technique"]["name"]]
+
+if config["Forecast_technique"]["name"] not in Forecast_techniques:
+    raise KeyError(
+        f"Forecast_technique {config['Forecast_technique']['name']} not found. Have you specified a valid forecasting technique in the config file?"
+    )
+else:
+    Forecast_technique = Forecast_techniques[config["Forecast_technique"]["name"]]
+


Would it be best to move all of this to somewhere else like a main_forecast.py? See file comments.

ma595 · 2025-05-14T09:33:06Z

lib/forecast.py

STYLE: Lower case Forecast function

STYLE: There's a prepare method in Simulation class as well as main_forecast and Predictions class. It's quite confusing (I realise this is a remnant of Maud's code)

There are similarly named functions in Simulation and Prediction too, which makes the whole code quite hard to navigate.

ma595 · 2025-05-14T09:35:19Z

lib/forecast_technique.py

This defines the forecasting strategy i.e. recursive, direct and the regressor i.e. GP or regression.

Regressor is defined here.

GP, regression etc.

Do all regressors have predict(return_std=True)?

suggests that all regressors have uncertainty outputs

Lower case these i.e.: GaussianProcessRecursiveForecaster to gp_recursive_forecaster?

They are instances of a class (i.e a Forecaster that takes a regressor as parameter).

But they are treated in much the same way as dimensionality reduction classes - this is potentially confusing.

Rename ForcastTechnique to something like ForecastMethod or ForecastStrategy.

Could this code be generalised further?

Drawbacks: Hardcoded forecast steps / lags in GP and quite tight coupling to skforecast

I could imagine a situation where instead of hardcoding combinations like you have done. You could 'register' different regressors and 'forecasters'. And then use a function to combine them.

And use a configuration yaml file which specifies the regressor and forecast method you want a long with their parameters - so they aren't hardcoded.

We can leave this to a future PR if you want.

Happy to help with a PR here or provide an example.

OR we could just define all of the available options a little more cleanly - this is better if we only expect to have a few regressor + forecaster combos.

We could move instantiation of regressors and forecasters into a separate module.

Testing

ma595 · 2025-05-14T09:37:27Z

lib/restart.py

This is nicely abstracted now with the get_ocean_term function. The idea, I suppose, is to generalise this "toolbox" to work for other restart files i.e. Oceananigans?

This get_ocean_term() function needs a docstring. I understand it gets the name of the generated [so,toce,soce].npy file.

STYLE: Think about the naming of the variables in this file. I would avoid thetao, so and zos / ssh. The restart convention of NEMO / DINO is heavily embedded in here. (restart["tn"], restart["sshn"] etc.)

ma595 · 2025-05-14T09:39:40Z

main_forecast.py

Would be nice to have the config file that you're hinting at in main_forecast.py. I.e. abstract out the DINO and ssh/soce/toce mapping. You are already using this in the restart.py - could it also be used here as well?

ma595 · 2025-05-20T12:37:35Z

ruff check gives the following:

Found 111 errors.
[*] 43 fixable with the `--fix` option (16 hidden fixes can be enabled with the `--unsafe-fixes` option).

I can open a PR to start addressing some of these, if that helps?

ma595 · 2025-05-20T13:49:16Z

Have issues #9, #49, #48 and #44 been addressed with this PR @isaacaka? If so, can you update the PR description accordingly.

isaacaka added 21 commits March 4, 2025 15:16

Initial refactor of dimensionality reduction and forecasting techniqu…

e8583b9

…e code

Move component reconstruction to Simulation class, functionalise gett…

362a936

…ing number of components, predict with skforecast

Refactored forecasting code to implement an abstract base

5d3ba45

Adds non-skforecast based gaussian process

0f33543

Typo in name of forecaster class

6d4398f

Adds skforecast to requirements.txt

b66cdd5

Changes rmse function to more general error function in ABC class

fad57b2

Add assert on the required dimensionality of the data

514e2d5

Fixed bug where number of years forecasted were incorrect

11fa2f2

change to append whole map instead of first value in map

67c9cac

Calls the correct names of files for forecasted components

2e67c54

Changes parameters from tuple to seperate named variables

40a3351

Reorganises specifying DR and forecasting techniques

cfbe148

Adds a config file for specifying DR/forecaster techniques

a74b54b

Adds config file for specifying techniques

993f3d1

Fix import error

376ee6e

Add config file for names of forecasted ocean terms

1565562

Add additional dimensionality check

de32224

Changed config file structure

4ffb7ab

Change naming from rmse to error

71167d5

Removed TODO comments

a6d35ff

ma595 marked this pull request as draft April 9, 2025 19:39

ma595 changed the title ~~Refactor forecast~~ [DRAFT] Refactor forecast Apr 9, 2025

ma595 changed the title ~~[DRAFT] Refactor forecast~~ [WIP] Refactor forecast Apr 9, 2025

ma595 mentioned this pull request Apr 17, 2025

Refactor class hierarchy #52

Open

isaacaka added 5 commits April 17, 2025 16:50

Changes Forecaster parameters, adds placeholder values for standard d…

7ba9cef

…eviation

Adds filepaths

ddb9ba9

renames config.yaml -> techniques_config.yaml

56c92e8

Updated notebook to reflect refactor of libraries

00909ea

Parameterised classes, forecaster can now take a regressor as input

f2dbdfa

isaacaka added 3 commits April 22, 2025 15:12

Formats notebook

254ee7b

Parameterises create_gp_regressor

c5f83aa

Remove unused args

0d1b519

isaacaka requested review from ma595 and surbhigoel77 April 23, 2025 15:11

isaacaka marked this pull request as ready for review April 24, 2025 09:06

Copilot AI review requested due to automatic review settings April 24, 2025 09:06

Copilot AI reviewed Apr 24, 2025

View reviewed changes

lib/forecast_technique.py Outdated Show resolved Hide resolved

ma595 assigned isaacaka Apr 29, 2025

ma595 added the iccs label Apr 29, 2025

ma595 changed the title ~~[WIP] Refactor forecast~~ Refactor forecast Apr 29, 2025

Remove conditional handling of predict arguments

c5c2485

ma595 requested changes May 2, 2025

View reviewed changes

ma595 self-requested a review May 8, 2025 14:12

isaacaka added 4 commits May 9, 2025 14:47

Raise an error when an invalid technique is specified

15ec979

Fix typo in jumpper notebook, rename dictionary key in forecast.py

b134891

Change the method call from getPC to get_component

0fda991

The getPC method was refactored to be the get_component method

Changed name of technique

3280eb9

ma595 requested changes May 14, 2025

View reviewed changes

		x_pred = np.arange(0, (len(self) + steps) * pas, pas).reshape(-1, 1)
		# x_pred = np.arange(0, (len(self) + steps) * pas, pas).reshape(-1, 1) #TODO: Check this len(self) + steps logic

Refactor forecast #39

Are you sure you want to change the base?

Refactor forecast #39

Uh oh!

Conversation

isaacaka commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app bot commented Apr 22, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

ma595 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ma595 May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

isaacaka commented May 9, 2025

Uh oh!

ma595 left a comment

Choose a reason for hiding this comment

General Issues

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ma595 commented May 20, 2025

Uh oh!

ma595 commented May 20, 2025

Uh oh!

Uh oh!

isaacaka commented Mar 4, 2025 •

edited

Loading

ma595 left a comment •

edited

Loading

ma595 May 6, 2025 •

edited

Loading