-
Notifications
You must be signed in to change notification settings - Fork 0
Refactor forecast #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ing number of components, predict with skforecast
…gnature for get_component Creates a setter method to pass neccessary variables from simulation class to the decompostion class Corrects method signature for get_component in forecast.py and dimensionality_reduction.py
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the forecasting code by separating dimensionality reduction and forecasting techniques into dedicated classes, updating configuration files, and modifying related scripts for consistency. Key changes include updating function signatures (e.g. adding a filename parameter to prepare), refactoring PCA/decomposition routines to leverage strategy classes, and adjusting forecast prediction reconstruction and error evaluation.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
techniques_config.yaml | Added configuration entries to specify DR and forecast techniques. |
ocean_terms.yaml | Introduced a YAML file to map ocean term names for flexible usage. |
main_forecast.py | Updated function signatures, logging, and file handling logic. |
lib/restart.py | Added YAML loading for ocean terms in restart functions. |
lib/forecast_technique.py | Refactored forecast techniques including direct and recursive approaches. |
lib/forecast.py | Modified simulation loading, decomposition, reconstruction and forecasting logic. |
lib/dimensionality_reduction.py | Updated PCA and KernelPCA implementations with new methods for reconstruction. |
Comments suppressed due to low confidence (3)
main_forecast.py:557
- The revised logic for generating x_pred may not cover the full intended time range; please verify that the prediction array handles the simulation length and forecast steps as expected.
x_pred = np.arange(1 + pas, 1 + steps * pas, pas).reshape(-1, 1)
lib/forecast.py:156
- The comparison 'if self.len < self.end' may lead to errors if self.end is None; ensure that self.end is always set to a valid integer value or introduce a conditional branch to handle the None case.
array = [self.loadFile(file) for file in self.files if self.len < self.end]
main_forecast.py:85
- [nitpick] Minor typo: 'forcasted' should be corrected to 'forecasted'.
print(f"{term} time series forcasted")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running through Jumper.ipynb. Some small nitpicking issues (will look over main structural changes next).
I deployed with python==3.11 and ran the Jumper.ipynb:
- I modified the path to point to
Grid_T
files. - Error is returned with KPCA due to the setting of 0.9 for components
- I modified the
DR_technique
intechniques_config.yaml
to "DimensionalityReductionPCA
"- Could we alias these names to make them a bit more readable?
- I modified "
DimensionalityReductionPCA
" to "DimensionalityReductionABC
" and I get aKeyError 'DimensionalityReductionABC'
- could a more helpful error message be provided?
- Word "Compilated" is not correct (remnant of Maud's notebook).
- Error bars look a little weird (using jupyter notebook defaults) - I loaded the data in the google drive.

simu_predicted
needs to be manually created otherwise predictions will fail (again a remnant from Maud's code). Consider adding to cell 12 - where it was previously.- When plotting evaluation
AttributeError: 'Simulation' object has no attribute 'getPC'
- does thegetPC
function exist anymore? Perhaps this isn't important?
Notebooks/Jumper.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that ruff check <file.py
passes. We need to add this as a check in the CI as well. I can add the linting rules I've used for Spinup-Evaluation
as a new PR.
The getPC method was refactored to be the get_component method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR introduces skforecast Forecasters, sklearn regressors and a dimensionality reduction class.
Within forecast_technique.py we create instances of the combined regressor and forecaster. So we essentially create all available options here.
These are then imported in the forecast.py
At present, all forecasters, regressors and dimensionality reduction algorithms are imported to the user in the forecast.py
Added new files:
dimensionality_reduction.py
forecast_technique.py
Configuration files
ocean_terms.yaml
techniques_config.yaml
ocean_terms.yaml
now generalises the inputs quite nicely. Used in restart.py
.
General Issues
- Add a quick sketch of an updated
README.md
(already mentioned). Generally we need more comments and docs. - Testing - We need this project wide. I'll help with this.
- Testing (separate PR coming). Avoid assertions in favour of Exceptions.
- Formatting / linting issues need addressing - To fix some automatically: Run
ruff check --fix
. Others will need manually fixed. Make sure local install of Ruff is up to dateruff check
- STYLE: Inconsistent formatting in docstrings. Consider adopting
numpy
style https://numpydoc.readthedocs.io/en/latest/format.html#parameters. I've added a PR that imposes this style project wide using github actions.create_gp_regressor
inlib/forecast_technique.py
is a good example of how to do it. - STYLE: Could you add some static type hinting where you think it'd be useful? This is to get us ready for
mypy
in future. - STYLE: Consider renaming
forecast_technique.py
toforecast_method.py
? - FUNCTIONALITY: I don't think the parallel jobs in Forecast works. (MINOR)
- STYLE:
snake_case
for function and method names - class names have CamelCase / PascalCase. Again, these are remnants from Maud's code. - Remove optimisation from
lib/forecast.py
move to a different class? I'm not sure what this is currently used for? - STYLE: Needs unifying (Run
ruff check
with or without--fix
. - Possibly need to refactor forecast_technique again (and rename).
- Could we add a simple regressor like linear or tree based?
- We need to think about how to generalise this further so we do not hardcode the regressor and Forecaster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I reviewed this previously. Will run through again and check if it runs without error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely modularised now. getPC()
-> get_component()
.
General comments
- Need some tests
- Avoid asserts - replace with ValueError or whatever's more relevant.
- Probably better as a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
# TODO: Change to DR_technique | ||
self.int_mask, ts_array = DimensionalityReductionPCA.reconstruct_predictions( | ||
predictions, n, self.info, begin | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to generalise this now to DimensionalityReductionMethod
or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
x_pred = np.arange(0, (len(self) + steps) * pas, pas).reshape(-1, 1) | ||
# x_pred = np.arange(0, (len(self) + steps) * pas, pas).reshape(-1, 1) #TODO: Check this len(self) + steps logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this defintely been checked? I recall discussing this - but I can't remember what we decided on. I suspect the old implementation was correct?
from lib.forecast_technique import ( | ||
GaussianProcessForecaster, | ||
GaussianProcessRecursiveForecaster, | ||
) | ||
from lib.dimensionality_reduction import ( | ||
DimensionalityReductionPCA, | ||
DimensionalityReductionKernelPCA, | ||
) | ||
|
||
|
||
sys.path.insert(0, "../") | ||
|
||
path_to_nemo_directory = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) | ||
path_to_nemo_directory = Path(path_to_nemo_directory) | ||
|
||
warnings.filterwarnings("ignore") | ||
|
||
Forecast_techniques = { | ||
"GaussianProcessForecaster": GaussianProcessForecaster, | ||
"GaussianProcessRecursiveForecaster": GaussianProcessRecursiveForecaster, | ||
} | ||
Dimensionality_reduction_techniques = { | ||
"PCA": DimensionalityReductionPCA, | ||
"KernelPCA": DimensionalityReductionKernelPCA, | ||
} | ||
|
||
|
||
with open(path_to_nemo_directory / "techniques_config.yaml", "r") as f: | ||
config = yaml.safe_load(f) | ||
|
||
|
||
if config["DR_technique"]["name"] not in Dimensionality_reduction_techniques: | ||
raise KeyError( | ||
f"DR_technique {config['DR_technique']['name']} not found. Have you specified a valid dimensionality reduction technique in the config file?" | ||
) | ||
else: | ||
DR_technique = Dimensionality_reduction_techniques[config["DR_technique"]["name"]] | ||
|
||
if config["Forecast_technique"]["name"] not in Forecast_techniques: | ||
raise KeyError( | ||
f"Forecast_technique {config['Forecast_technique']['name']} not found. Have you specified a valid forecasting technique in the config file?" | ||
) | ||
else: | ||
Forecast_technique = Forecast_techniques[config["Forecast_technique"]["name"]] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be best to move all of this to somewhere else like a main_forecast.py
? See file comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- STYLE: Lower case
Forecast
function - STYLE: There's a
prepare
method inSimulation
class as well asmain_forecast
andPredictions
class. It's quite confusing (I realise this is a remnant of Maud's code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are similarly named functions in Simulation
and Prediction
too, which makes the whole code quite hard to navigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This defines the forecasting strategy i.e. recursive, direct and the regressor i.e. GP or regression.
- Regressor is defined here.
- GP, regression etc.
- Do all regressors have
predict(return_std=True)
?- suggests that all regressors have uncertainty outputs
- Lower case these i.e.:
GaussianProcessRecursiveForecaster
togp_recursive_forecaster
?- They are instances of a class (i.e a Forecaster that takes a regressor as parameter).
- But they are treated in much the same way as dimensionality reduction classes - this is potentially confusing.
- Rename
ForcastTechnique
to something likeForecastMethod
orForecastStrategy
. - Could this code be generalised further?
- Drawbacks: Hardcoded forecast steps / lags in GP and quite tight coupling to skforecast
- I could imagine a situation where instead of hardcoding combinations like you have done. You could 'register' different regressors and 'forecasters'. And then use a function to combine them.
- And use a configuration yaml file which specifies the regressor and forecast method you want a long with their parameters - so they aren't hardcoded.
- We can leave this to a future PR if you want.
- Happy to help with a PR here or provide an example.
- OR we could just define all of the available options a little more cleanly - this is better if we only expect to have a few regressor + forecaster combos.
- We could move instantiation of regressors and forecasters into a separate module.
- Testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This is nicely abstracted now with the
get_ocean_term
function. The idea, I suppose, is to generalise this "toolbox" to work for other restart files i.e. Oceananigans?- This
get_ocean_term()
function needs a docstring. I understand it gets the name of the generated[so,toce,soce].npy
file.
- This
- STYLE: Think about the naming of the variables in this file. I would avoid thetao, so and zos / ssh. The restart convention of NEMO / DINO is heavily embedded in here. (
restart["tn"]
,restart["sshn"]
etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Would be nice to have the
config
file that you're hinting at inmain_forecast.py
. I.e. abstract out the DINO and ssh/soce/toce mapping. You are already using this in therestart.py
- could it also be used here as well?
I can open a PR to start addressing some of these, if that helps? |
The aim is to refactor the forecast.py code to make it easier to swap between different dimensionality reduction and forecasting techniques.
Refactored
jumper.ipynb
,main_forecast.py
and main to reflect changes to the refactored libraries. The jumper notebook is specific to using normal PCA (not Kernal PCA)Separated code into forecasting technique classes and dimensionality reduction classes
forecast.py
calls the methods defined in the above two classesThe DR and forecasting techniques created are imported and stored in a dictionary
The DR and forecasting techniques the user wants to use can be specified by setting the name within
techniques_config.yaml
ocean_terms.yaml
defines the names of the terms stored in the NEMO grid files, e.g. different configurations of nemo use different names for temperature, ssh and salinity. You can specify which names are used in the yaml file. andrestart.py
will then be able to read the correct keys.Tested that the refactored code runs end-to-end
Test written against the non-refactored code need to be written.
Closes #37 #52 #9 #44 #48