Commit 4f5ca4f

Merge pull request #191 from CliMA/kp/observation
Add ObservationRecipe
2 parents cd9cf3a + 8b6012f commit 4f5ca4f

10 files changed

+1657
-3
lines changed

Project.toml

Lines changed: 13 additions & 2 deletions
@@ -1,7 +1,7 @@
 name = "ClimaCalibrate"
 uuid = "4347a170-ebd6-470c-89d3-5c705c0cacc2"
 authors = ["Climate Modeling Alliance"]
-version = "0.1.1"
+version = "0.1.2"
 
 [deps]
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
@@ -16,13 +16,19 @@ YAML = "ddb6d928-2868-570f-bddf-ab3f9cf99eb6"
 
 [weakdeps]
 CalibrateEmulateSample = "95e48a1f-0bec-4818-9538-3db4340308e3"
+ClimaAnalysis = "29b5916a-a76c-4e73-9657-3c8fd22e65e6"
+LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+NaNStatistics = "b946abbf-3ea7-4610-9019-9858bfdeaf2d"
+Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 
 [extensions]
 CESExt = "CalibrateEmulateSample"
+ClimaAnalysisExt = ["ClimaAnalysis", "NaNStatistics", "Statistics", "LinearAlgebra"]
 
 [compat]
 Aqua = "0.8"
 CalibrateEmulateSample = "0.5, 0.6, 0.7"
+ClimaAnalysis = "0.5.18"
 ClimaParams = "0.10"
 Conda = "1.7, 1.8, 1.9, 1.10"
 Dates = "1"
@@ -32,6 +38,8 @@ EnsembleKalmanProcesses = "1, 2"
 JLD2 = "0.4, 0.5"
 LinearAlgebra = "1"
 Logging = "1.10, 1.11"
+NaNStatistics = "0.6.8 - 0.6.50, 0.6.53"
+OrderedCollections = "1.3"
 Random = "1"
 SafeTestsets = "0.1"
 Statistics = "1"
@@ -43,12 +51,15 @@ julia = "1.9"
 [extras]
 Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
 CalibrateEmulateSample = "95e48a1f-0bec-4818-9538-3db4340308e3"
+ClimaAnalysis = "29b5916a-a76c-4e73-9657-3c8fd22e65e6"
 ClimaParams = "5c42b081-d73a-476f-9059-fd94b934656c"
 Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+NaNStatistics = "b946abbf-3ea7-4610-9019-9858bfdeaf2d"
+OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
 SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["Aqua", "CalibrateEmulateSample", "ClimaParams", "Conda", "LinearAlgebra", "SafeTestsets", "Statistics", "Test"]
+test = ["Aqua", "CalibrateEmulateSample", "ClimaAnalysis", "ClimaParams", "Conda", "LinearAlgebra", "NaNStatistics", "OrderedCollections", "SafeTestsets", "Statistics", "Test"]

docs/make.jl

Lines changed: 6 additions & 1 deletion
@@ -1,6 +1,7 @@
 using Documenter
 using Documenter: doctest
 using ClimaCalibrate
+import ClimaAnalysis # needed to load ClimaAnalysis extension
 using Base.CoreLogging
 using DocumenterCitations
 import Literate
@@ -18,7 +19,10 @@ Literate.markdown(
 
 makedocs(
     plugins = [bib],
-    modules = [ClimaCalibrate],
+    modules = [
+        ClimaCalibrate,
+        Base.get_extension(ClimaCalibrate, :ClimaAnalysisExt),
+    ],
     sitename = "ClimaCalibrate.jl",
     authors = "Clima",
     checkdocs = :exports,
@@ -33,6 +37,7 @@ makedocs(
         "Distributed Calibration Tutorial" => "literate_example.md",
         "Backends" => "backends.md",
         "Observations" => "observations.md",
+        "Observation Recipes" => "observation_recipe.md",
         "Emulate and Sample" => "emulate_sample.md",
         "API" => "api.md",
     ],

docs/src/api.md

Lines changed: 14 additions & 0 deletions
@@ -67,3 +67,17 @@ ClimaCalibrate.minibatcher_over_samples
 ClimaCalibrate.observation_series_from_samples
 ClimaCalibrate.load_latest_ekp
 ```
+
+## Observation Recipe Interface
+
+```@docs
+ClimaCalibrate.ObservationRecipe.AbstractCovarianceEstimator
+ClimaCalibrate.ObservationRecipe.SeasonalDiagonalCovariance
+ClimaCalibrate.ObservationRecipe.SeasonalDiagonalCovariance()
+ClimaCalibrate.ObservationRecipe.SVDplusDCovariance
+ClimaCalibrate.ObservationRecipe.SVDplusDCovariance(sample_dates)
+ClimaCalibrate.ObservationRecipe.covariance
+ClimaCalibrate.ObservationRecipe.observation
+ClimaCalibrate.ObservationRecipe.seasonally_aligned_yearly_sample_date_ranges
+ClimaCalibrate.ObservationRecipe.change_data_type
+```

docs/src/observation_recipe.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# ObservationRecipe

!!! warning
    To enable this extension, use `using ClimaAnalysis` or `import
    ClimaAnalysis`.

Setting up observations from weather and climate data for calibration with
`EnsembleKalmanProcesses` (or `EKP` for short) can be tedious and error-prone.
ClimaCalibrate therefore provides recipes for setting up observations
consisting of samples, noise covariances, names, and metadata.

## How do I use this to set up observations for calibration with EKP?

All functions assume that any data preprocessing is done with `ClimaAnalysis`.

### Covariance Estimators

There are currently two covariance estimators, `SeasonalDiagonalCovariance` and
`SVDplusDCovariance`, which are subtypes of `AbstractCovarianceEstimator`.
`SeasonalDiagonalCovariance` approximates the observation noise covariance as a
diagonal of variances across all the seasons for each observation, neglecting
correlations between points. `SVDplusDCovariance` additionally approximates the
correlations between points from the (often limited) time series of
observations.
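
The difference between the two approximations can be illustrated with plain
arrays. The following is a minimal sketch that uses only the `LinearAlgebra`
and `Statistics` standard libraries; it is not the implementation used by
`SeasonalDiagonalCovariance` or `SVDplusDCovariance`. The sample matrix and the
size of the added diagonal term are hypothetical values chosen for
illustration.

```julia
using LinearAlgebra
using Statistics

# Hypothetical data: 4 observation points (rows), 3 yearly samples (columns).
samples = [
    1.0 1.2 0.8
    2.0 2.1 1.9
    0.5 0.7 0.3
    3.0 2.6 3.4
]

# Diagonal estimate: per-point sample variance, no cross-point correlations.
diag_cov = Diagonal(vec(var(samples, dims = 2)))

# Low-rank estimate: SVD of the centered samples, scaled so that
# U * Diagonal(S .^ 2) * U' equals the sample covariance across points.
centered = samples .- mean(samples, dims = 2)
F = svd(centered ./ sqrt(size(samples, 2) - 1))
low_rank_cov = F.U * Diagonal(F.S .^ 2) * F.U'

# With few samples the low-rank part is singular; adding a small diagonal
# term (an assumed value here) makes the matrix positive definite.
svd_plus_d = low_rank_cov + 1e-6 * I
```

With only three samples, the low-rank part has rank at most two, which is why
a diagonal term is needed before the matrix can be used as a noise covariance.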

### Necessary data preprocessing

The `OutputVar`s should represent **time series data of summary statistics**.
For example, to compute seasonal averages of an `OutputVar`, one can use
`ClimaAnalysis.average_season_across_time`, which will produce an `OutputVar`
that can be used with either `SeasonalDiagonalCovariance` or
`SVDplusDCovariance`.

```julia
import ClimaAnalysis
import Dates

# `start_date` is assumed to be defined by the user beforehand (for example,
# the start date of the simulation).
obs_var = ClimaAnalysis.OutputVar(
    "precip.mon.mean.nc",
    "precip",
    new_start_date = start_date,
    shift_by = Dates.firstdayofmonth,
)

# -- preprocessing for units, times, grid, etc. --

seasonal_averages = ClimaAnalysis.average_season_across_time(obs_var)
```

### Observation

After preprocessing the `OutputVar`s so that they represent time series data of
summary statistics, one can set up an `EKP.Observation` as shown below.

```julia
import ClimaAnalysis
import EnsembleKalmanProcesses as EKP
import ClimaCalibrate
import ClimaCalibrate.ObservationRecipe

# `vars` is a collection of OutputVars preprocessed to have the same units,
# times, and grid as the diagnostics produced from the model.
# In this example, we want to calibrate with seasonal averages, so we use
# ClimaAnalysis.average_season_across_time
vars = ClimaAnalysis.average_season_across_time.(vars)

# We want the covariance matrix to be Float32, so we change it here.
vars = ObservationRecipe.change_data_type.(vars, Float32)

# We choose SVDplusDCovariance. We need to supply the start and end dates of
# the samples with `sample_date_ranges`. To do this, we can use the function
# below. In this example, the dates in `vars` are all the same. For debugging,
# it is helpful to use `ClimaAnalysis.dates(var)`.
sample_date_ranges =
    ObservationRecipe.seasonally_aligned_yearly_sample_date_ranges(first(vars))
covar_estimator = ObservationRecipe.SVDplusDCovariance(
    sample_date_ranges,
    model_error_scale = Float32(0.05),
    regularization = Float32(1e-6),
)

# Finally, we form the observation
start_date = sample_date_ranges[1][1]
end_date = sample_date_ranges[1][2]
obs = ObservationRecipe.observation(
    covar_estimator,
    vars,
    start_date = start_date,
    end_date = end_date,
)
```

## Frequently asked questions

**Q: I need to compute `g_ensemble` and I do not know how the data of the `OutputVar`s is flattened.**

**A:** When forming the sample, the data in an `OutputVar` is flattened using
`ClimaAnalysis.flatten`. See
[`ClimaAnalysis.flatten`](https://clima.github.io/ClimaAnalysis.jl/dev/flat/#Flatten)
in the ClimaAnalysis documentation for more information. The order of the
variables in the observation is the same as the order of the `OutputVar`s when
creating the `EKP.Observation` using `ObservationRecipe.observation`.

**Q: How do I handle `NaN`s in the `OutputVar`s so that there are no `NaN`s in the sample and covariance matrix?**

**A:** `NaN`s should be handled when preprocessing the data. In some cases,
there will be `NaN`s in the data (e.g. calibrating with data that is valid only
over land). In these cases, the functions for making observations will
automatically remove `NaN`s from the data. It is important to ensure that across
the time slices, the `NaN`s appear in the same coordinates of the non-temporal
dimensions. For example, if the quantity is defined over the dimensions
longitude, latitude, and time, then any slice of the data at a particular
longitude and latitude should either only contain `NaN`s or no `NaN`s at all.
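
The consistency requirement above can be checked on a raw array before
building the observation. This is a minimal sketch in plain Julia; the helper
`has_consistent_nans` is a hypothetical name and is not part of ClimaCalibrate
or ClimaAnalysis. It assumes that the time dimension is the last dimension of
the array unless stated otherwise.

```julia
# Return true if every non-temporal point is NaN at all times or at no time.
# `time_dim` is the index of the time dimension (assumed to be the last one).
function has_consistent_nans(
    data::AbstractArray;
    time_dim::Integer = ndims(data),
)
    n_times = size(data, time_dim)
    # Count the NaNs along the time dimension for each non-temporal point.
    nan_counts = sum(isnan, data, dims = time_dim)
    return all(c -> c == 0 || c == n_times, nan_counts)
end

# Hypothetical data with dimensions (longitude, latitude, time).
data = ones(3, 2, 4)
data[1, 2, :] .= NaN          # masked at every time: consistent
land_only_ok = has_consistent_nans(data)

data[2, 1, 1] = NaN           # masked at a single time: inconsistent
mixed_ok = has_consistent_nans(data)
```

A check like this is meant for debugging the preprocessing step; it does not
replace handling the `NaN`s in the data itself.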

**Q: How is the name of the observation determined?**

**A:** The name of the observation is determined by the short name in the
attributes of the `OutputVar`. If there are multiple `OutputVar`s, then the name
is all the short names separated by semicolons. If no short name is found, then
the name will be `nothing`.

**Q: What are `regularization` and `model_error_scale` when making a covariance matrix?**

**A:** The model error scale and regularization terms are used to inflate the
diagonal of the observation covariance matrix to reflect estimates of
measurement error. You can add a fixed percentage inflation of the noise due to
the model error to the covariance matrix with the `model_error_scale` keyword
argument. Additionally, to prevent very small variance along the diagonal of the
covariance matrix, you can add a regularization with the `regularization`
keyword argument.
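
To make the effect of the two terms concrete, the sketch below inflates a
diagonal of sample variances by hand. The formula is an assumption made for
illustration (a term proportional to the squared sample mean plus a constant)
and may differ from what the covariance estimators actually compute; the data
values are hypothetical.

```julia
using LinearAlgebra
using Statistics

# Hypothetical samples: 2 observation points (rows), 3 samples (columns).
# The second point has zero sample variance.
samples = Float32[1.0 1.2 0.8; 2.0 2.0 2.0]
model_error_scale = 0.05f0
regularization = 1.0f-6

variances = vec(var(samples, dims = 2))
means = vec(mean(samples, dims = 2))

# Assumed form of the inflation: a noise term of 5% of the mean (squared to
# get a variance) plus a constant floor that keeps the diagonal positive.
inflated = variances .+ (model_error_scale .* means) .^ 2 .+ regularization
noise_cov = Diagonal(inflated)
```

Under this assumed form, the second point ends up with a strictly positive
variance even though its samples are identical, so the resulting matrix stays
invertible.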