
Add GEnsembleBuilder #222

Merged
ph-kev merged 12 commits into main from kp/g_ens_col on Sep 19, 2025

Member

ph-kev commented Aug 22, 2025

closes #221, closes #211, closes #220

This PR adds GEnsembleBuilder to facilitate easy construction of the G ensemble matrix using the metadata in the observation.

ph-kev force-pushed the kp/g_ens_col branch 4 times, most recently from 7446005 to 770f21d, on August 26, 2025 20:01
Comment on lines 56 to 59
pkgversion(ClimaAnalysis) > v"0.5.19" || error(
    "Using GEnsembleBuilder requires a version of ClimaAnalysis above 0.5.19",
Member Author

I don't know if we want to continue supporting ClimaAnalysis v0.5.19 and below, since a change will probably be needed to support passing flatten keyword arguments when constructing the covariance matrix.

Member

I think we should only support >0.5.19. I don't think we have users who want to use old versions of ClimaAnalysis with this package anyway.
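
The version gate discussed in this thread can be sketched as a small standalone helper. This is only an illustration, not the code in the PR: `require_version_above` is a hypothetical name, and in practice the `found` argument would come from `pkgversion(SomeModule)`, which is available in Julia 1.9 and later.

```julia
# Hypothetical helper (not part of ClimaCalibrate): error unless `found` is
# strictly newer than `lower`. In real code, `found` would typically be the
# result of `pkgversion(ClimaAnalysis)`.
function require_version_above(found::VersionNumber, lower::VersionNumber, name::AbstractString)
    found > lower ||
        error("Using GEnsembleBuilder requires a version of $name above $lower")
    return nothing
end

# Passes silently, because 0.5.20 is strictly above 0.5.19.
require_version_above(v"0.5.20", v"0.5.19", "ClimaAnalysis")
```

Because the comparison is strict, a package pinned at exactly 0.5.19 would be rejected, which matches the ">0.5.19" support policy agreed on above.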

Comment on lines +191 to +222
function _is_compatible_with_metadata(var::OutputVar, metadata::Metadata)
    return _same_short_names(var, metadata) &&
           _same_dim_names(var, metadata) &&
           _same_dim_units(var, metadata) &&
           _same_units(var, metadata) &&
           # For the temporal dimension, only check if the times of metadata is
           # a subset of the times of var, because _match_dates will get the
           # correct dates for us
           _compatible_dims_values(var, metadata)
end
Member Author

ph-kev Aug 26, 2025

This needs to be changed, since it is not very helpful to the user right now. For example, if an OutputVar doesn't match any of the metadata, it would help to have logging that tells you why it failed to match each metadata that was tried. Instead, it only tells you that none of the metadata the function tried matches the OutputVar. This could be resolved with a verbose keyword argument or another function that can be used to diagnose the issues. Furthermore, I would like this system to be extensible, so that the user can pass in their own custom checks.

I was thinking of doing this in another PR to keep this one from becoming more bloated than it already is.

Member

I agree, this will be an important addition in a future PR.
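
One possible shape for the diagnostics proposed in this thread is sketched below. It is an assumption-laden illustration rather than the planned design: `ToyVar`, `ToyMetadata`, `DEFAULT_CHECKS`, and `failed_checks` are hypothetical names, and the toy structs carry only two fields instead of the real OutputVar and Metadata contents. The idea is that each check has a name, so a failed match can report which check rejected the metadata, and users can append their own checks.

```julia
# Hypothetical stand-ins for OutputVar and Metadata with only the fields used here.
struct ToyVar
    short_name::String
    units::String
end

struct ToyMetadata
    short_name::String
    units::String
end

# Each check is a name paired with a predicate, so users can append their own.
const DEFAULT_CHECKS = [
    "short names match" => (var, md) -> var.short_name == md.short_name,
    "units match" => (var, md) -> var.units == md.units,
]

# Return the names of the checks that fail instead of a bare `false`.
# With `verbose = true`, each failure is also logged.
function failed_checks(var, md; checks = DEFAULT_CHECKS, verbose = false)
    failures = [name for (name, check) in checks if !check(var, md)]
    if verbose
        for name in failures
            @info "Metadata rejected" short_name = var.short_name failed_check = name
        end
    end
    return failures
end

var = ToyVar("pr", "kg m^-2 s^-1")
md = ToyMetadata("pr", "mm/day")
failures = failed_checks(var, md)  # only the units check fails here
```

An empty result means the variable is compatible, so a boolean check like the one in the snippet above could be recovered as `isempty(failed_checks(var, md))`.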

Member Author

ph-kev commented Aug 26, 2025

I think this is good enough for a review, but it is not ready to merge: a ClimaAnalysis version still needs to be released, and some helpful features for building the G ensemble matrix are still missing.

ph-kev requested a review from nefrathenrici on August 26, 2025 20:12
Comment on lines +71 to +74
!!! info "Spinup and windowing times"
    Internally, the correct dates are matched between the observational and
    simulation data. As a result, you do not need to window the times (e.g. when
    removing spinup) to match the times of the observations.
Member Author

I am not too sure about this, since it could introduce correctness issues that might be overlooked. That said, it is convenient to match times automatically, so you don't need to worry about windowing or slicing the time dimension.
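
The trade-off raised here can be made concrete with a small sketch of automatic date matching that fails loudly rather than silently. This is not the package's `_match_dates`; `match_date_indices` is a hypothetical helper, and it assumes that both sets of dates are plain vectors of `DateTime` values.

```julia
using Dates

# Hypothetical helper: find the position of each observation date in the
# simulation dates, and error instead of guessing when a date is missing.
function match_date_indices(sim_dates, obs_dates)
    allunique(sim_dates) || error("Simulation dates are not unique")
    idxs = indexin(obs_dates, sim_dates)
    any(isnothing, idxs) &&
        error("Some observation dates are missing from the simulation")
    return Vector{Int}(idxs)
end

# Twelve monthly simulation outputs; observations cover only July to September.
sim_dates = collect(DateTime(2010, 1):Month(1):DateTime(2010, 12))
obs_dates = collect(DateTime(2010, 7):Month(1):DateTime(2010, 9))
sim_data = collect(1.0:12.0)

idxs = match_date_indices(sim_dates, obs_dates)
matched = sim_data[idxs]  # earlier months (e.g. spinup) are dropped without manual windowing
```

Raising an error for missing or duplicated dates is one way to keep the convenience of automatic matching while reducing the chance that a mismatch goes unnoticed.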

::Type{FT},
ekp::EKP.EnsembleKalmanProcess,
) where {FT <: AbstractFloat}
pkgversion(ClimaAnalysis) > v"0.5.19" || error(
Member Author

Make sure to add a check for EKP as well

Member Author

The plan is to use the latest version of EnsembleKalmanProcesses and ClimaAnalysis, so this is not necessary.

Member

nefrathenrici left a comment

This looks great so far! I think we can push this a bit further and make more information easily available to the user.

Member Author

ph-kev commented Sep 2, 2025

The _match_dates function can be made more general and moved to ClimaAnalysis as select, a function that allows arbitrary indexing of dimensions by index or value. See issue CliMA/ClimaAnalysis.jl#323.

ph-kev force-pushed the kp/g_ens_col branch 2 times, most recently from 48d73fb to 3a44f20, on September 3, 2025 20:40
Member Author

ph-kev commented Sep 3, 2025

I think this is ready for another review. The only changes I want to make after this are possibly revisiting how the functions are named and adding another function in ClimaAnalysis to replace most of _match_dates.

Member

nefrathenrici left a comment

I think this is ready to merge. I just left some small comments that we can discuss later today.

Comment on lines +280 to +285
    allunique(ClimaAnalysis.dates(var)) || @warn(
        "Dates in OutputVar with short name $(short_name(var)) are not unique. You will not be able to use GEnsembleBuilder",
    )
end

any(==(""), ClimaAnalysis.short_name.(vars)) && @warn(
Member

Should these just be warnings or actual errors?

Member Author

We discussed this offline, but these should be warnings because a user might not want to use GEnsembleBuilder for whatever reason.

Comment on lines +96 to +98
ClimaCalibrate.EnsembleBuilder.is_complete
ClimaCalibrate.EnsembleBuilder.get_g_ensemble
ClimaCalibrate.EnsembleBuilder.ranges_by_short_name
Member Author

For is_complete, there should be a variant that checks a single column. This is helpful because if the first column is not complete, the other columns probably won't be either.
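
A per-column completeness check could look roughly like the sketch below. How GEnsembleBuilder actually tracks filled entries is not shown in this conversation, so the Boolean mask, the `ToyBuilder` type, and the `fill_rows!` and `toy_is_complete` functions are all hypothetical.

```julia
# Hypothetical builder: `filled[i, j]` records whether row i of column j
# of the G ensemble matrix has been set.
struct ToyBuilder
    g_ens::Matrix{Float64}
    filled::BitMatrix
end

ToyBuilder(n_rows::Integer, n_cols::Integer) =
    ToyBuilder(fill(NaN, n_rows, n_cols), falses(n_rows, n_cols))

# Write `values` into the given rows of one column and mark them as filled.
function fill_rows!(b::ToyBuilder, rows, col, values)
    b.g_ens[rows, col] .= values
    b.filled[rows, col] .= true
    return nothing
end

# Whole-matrix check and the single-column variant.
toy_is_complete(b::ToyBuilder) = all(b.filled)
toy_is_complete(b::ToyBuilder, col::Integer) = all(view(b.filled, :, col))

b = ToyBuilder(3, 2)
fill_rows!(b, 1:3, 1, [0.1, 0.2, 0.3])
first_done = toy_is_complete(b, 1)   # column 1 is fully filled
second_done = toy_is_complete(b, 2)  # column 2 is still empty
all_done = toy_is_complete(b)        # so the matrix as a whole is not complete
```

Checking only the first column after processing one ensemble member gives early feedback, since a gap there usually points to a missing variable that would affect every column.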

ph-kev force-pushed the kp/g_ens_col branch 4 times, most recently from cfc0329 to 6ecc0b9, on September 10, 2025 23:13
ph-kev force-pushed the kp/g_ens_col branch 4 times, most recently from 3de3183 to a3f6824, on September 11, 2025 21:12
ph-kev force-pushed the kp/g_ens_col branch 4 times, most recently from e797c55 to 741d576, on September 11, 2025 23:28
ph-kev force-pushed the kp/g_ens_col branch 2 times, most recently from dd617d6 to d980e59, on September 18, 2025 23:12
ph-kev force-pushed the kp/g_ens_col branch 3 times, most recently from e880a55 to 8446d54, on September 19, 2025 17:09
ph-kev merged commit 47215aa into main on Sep 19, 2025
7 of 9 checks passed
Development

Successfully merging this pull request may close these issues:

_data_size is replaced by _data_length
Typo in docs
Add helper function to construct g ensemble

2 participants