issue
Recent discussions made it clear that, although we have a couple of dispatching covariance definitions in ObservationRecipes, e.g., ScalarCovariance and SVDplusD, they do not act in a truly modular way. In particular, the following needs to be true:
"From a top level script, and for a given set of observation samples y_1,y_2,...,y_n, (and some user-inputs e.g. scalings), we should be able to define a variety of covariances (scalar*I, diagonal, SVDplusD) easily."
Since the fundamental EKP objects (scalar*I, diagonal, SVDplusD) can all operate similarly on inputs of the form y_1,y_2,...,y_n, there appears to be some inconsistency in how the ObservationRecipe applies these constructions. After discussions with @ph-kev, we think this may trace back to an inconsistent mapping from ClimaAnalysis OutputVars to the definition of samples.
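To make the "all operate similarly on y_1,y_2,...,y_n" point concrete, here is a minimal sketch using plain arrays rather than the EKP or ObservationRecipe constructors. The function names (`scalar_cov`, `diagonal_cov`, `svd_plus_d_cov`), the sample data, and the truncated-SVD-plus-diagonal formula are assumptions chosen for illustration; they are not the package's API.

```julia
using LinearAlgebra
using Statistics

# Columns of Y are the samples y_1, ..., y_n (illustrative data only).
Y = [1.0 2.0 3.0 4.0;
     2.0 1.0 4.0 3.0;
     0.0 1.0 0.0 1.0]

# scalar * I: one user-supplied variance shared by every component
scalar_cov(Y, s) = s * I(size(Y, 1))

# diagonal: per-component sample variance across the samples
diagonal_cov(Y) = Diagonal(vec(var(Y; dims = 2)))

# low rank plus diagonal: truncated SVD of the centered samples plus d * I
function svd_plus_d_cov(Y, rank, d)
    n = size(Y, 2)
    A = (Y .- mean(Y; dims = 2)) ./ sqrt(n - 1)  # centered, scaled samples
    F = svd(A)
    U = F.U[:, 1:rank]
    S = F.S[1:rank]
    return U * Diagonal(S .^ 2) * U' + d * I
end
```

All three take the same sample matrix and differ only in the estimation rule, which is the property the issue asks the recipe layer to preserve.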
Possible direction:
The CovarianceEstimator types, as they are, do too many different things, including handling various complexities in the stacking of data. Instead, we could separate this task into an ObservationSamplesBuilder method that generates an object considered to be i.i.d. samples of data. Building the covariance should then be a simple (modular) task: apply an estimation approach to this set of samples. Any complexity in the organization, stacking of dimensions, etc. that arises from OutputVars should be delegated to the ObservationSamplesBuilder.
In effect this is providing a struct that observation dispatches on, rather than the current kwarg-based approach?
I think one important consequence is that the different covariance estimator structures will no longer depend on the state vars. Instead, they will use the state only when they are applied via a dispatched method (e.g. one called estimate_covariance).
As a sketch... the user/outer loop defines
cov_estimator = ScalarCovariance(scalar, ...)
obs_sample_builder = ObsSampleBuilder(
    sample_dims = ["time"],
    sample_dim_operation = ("aggregate", "30mins"),
)
then later in ObservationRecipe.covariance(vars, cov_estimator::CE, obs_sample_builder::OSB) where {CE <: ..., OSB <: ...}
    samples = build_observation_samples(obs_sample_builder, vars, ...)
    estimate_covariance(
        cov_estimator,
        samples,
        ...
    )
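A self-contained toy version of the sketch above might look like the following. Everything here is a hypothetical stand-in: the abstract type, the `DiagonalCovariance` struct, the `window` field, and the treatment of `vars` as a plain vector of (variable x time) matrices are assumptions made so the example runs, not the existing ClimaCalibrate or ClimaAnalysis types.

```julia
using LinearAlgebra
using Statistics

# Hypothetical estimator hierarchy: estimators hold only their own settings, not `vars`.
abstract type AbstractCovarianceEstimator end

struct ScalarCovariance <: AbstractCovarianceEstimator
    scalar::Float64
end

struct DiagonalCovariance <: AbstractCovarianceEstimator end

# Hypothetical builder: averages `window` consecutive time steps into one sample.
struct ObsSampleBuilder
    window::Int
end

# All stacking and aggregation logic lives here; the result has one sample per column.
function build_observation_samples(b::ObsSampleBuilder, vars)
    stacked = reduce(vcat, vars)         # stack variables along rows
    n = size(stacked, 2) ÷ b.window      # use whole windows only
    cols = [vec(mean(stacked[:, ((k - 1) * b.window + 1):(k * b.window)]; dims = 2)) for k in 1:n]
    return reduce(hcat, cols)
end

# Estimators see only the samples, never the original variables.
estimate_covariance(e::ScalarCovariance, samples) = e.scalar * I(size(samples, 1))
estimate_covariance(::DiagonalCovariance, samples) = Diagonal(vec(var(samples; dims = 2)))

function covariance(vars, est::AbstractCovarianceEstimator, builder::ObsSampleBuilder)
    samples = build_observation_samples(builder, vars)
    return estimate_covariance(est, samples)
end

# Toy data: two "variables" sharing four time steps.
vars = [[1.0 2.0 3.0 4.0; 5.0 6.0 7.0 8.0], [0.0 2.0 4.0 6.0]]
builder = ObsSampleBuilder(2)
scalar_result = covariance(vars, ScalarCovariance(0.5), builder)
diagonal_result = covariance(vars, DiagonalCovariance(), builder)
```

With this split, swapping the estimator at the top level changes only the first positional argument after `vars`, and a new stacking rule only requires a new builder.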
Happy to discuss
Link to observation/covariance usage
https://github.com/CliMA/ClimaCalibrate.jl/blob/ca6550f7930b4a2f9f535534c851d61f00630786/ext/observation_recipe.jl