
Commit ae8f064

Reorganize example 2 and 3 data scripts and add untracked to .gitignore
1 parent 7ce2129 commit ae8f064

3 files changed: 27 additions & 183 deletions


.gitignore

Lines changed: 21 additions & 38 deletions
@@ -170,45 +170,28 @@ cython_debug/
 #.idea/
 *cache/
 
-/examples/1_Boulder/data/gis/mt_sid_boulder_gfid.cpg
-/examples/1_Boulder/data/gis/mt_sid_boulder_gfid.dbf
-/examples/1_Boulder/data/gis/mt_sid_boulder_gfid.json
-/examples/1_Boulder/data/gis/mt_sid_boulder_gfid.prj
-/examples/1_Boulder/data/gis/mt_sid_boulder_gfid.shp
-/examples/1_Boulder/data/gis/mt_sid_boulder_gfid.shx
-/examples/1_Boulder/data/plot_timeseries/
-/examples/1_Boulder/data/landsat/
-/examples/1_Boulder/data/met_timeseries/
-/examples/1_Boulder/data/properties/
-/examples/1_Boulder/data/snodas/
-/examples/1_Boulder/data/tutorial_properties.json
-/examples/1_Boulder/data/prepped_input.json
-/examples/2_Fort_Peck/data/gis/flux_fields_gfid.json
-/examples/2_Fort_Peck/data/landsat/
-/examples/2_Fort_Peck/data/met_timeseries/
-/examples/2_Fort_Peck/data/properties/
-/examples/2_Fort_Peck/data/snodas/
-/examples/2_Fort_Peck/data/plot_timeseries/
-/examples/2_Fort_Peck/data/US-FPe_daily_data.csv
-/examples/2_Fort_Peck/data/pestrun/
-/examples/3_Crane/data/gis/flux_fields_gfid.json
-/examples/3_Crane/data/landsat/
-/examples/3_Crane/data/met_timeseries/
-/examples/3_Crane/data/properties/
-/examples/3_Crane/data/snodas/
-/examples/3_Crane/data/plot_timeseries/
-/examples/3_Crane/data/US-FPe_daily_data.csv
-/examples/3_Crane/data/pestrun/
-/examples/4_Flux_Network/data/snodas/
-/examples/4_Flux_Network/data/properties/
-/examples/4_Flux_Network/data/met_timeseries/
-/examples/4_Flux_Network/data/landsat/
-/examples/4_Flux_Network/data/plot_timeseries/
-/examples/4_Flux_Network/data/bias_correction_tif/
-/examples/4_Flux_Network/data/pestrun/
+# Example data directories: ignore everything except tracked inputs
+/examples/1_Boulder/data/*
+!/examples/1_Boulder/data/gis/
+!/examples/1_Boulder/data/bias_correction_tif/
+
+/examples/2_Fort_Peck/data/*
+!/examples/2_Fort_Peck/data/gis/
+!/examples/2_Fort_Peck/data/prepped_input.zip
+!/examples/2_Fort_Peck/data/US-FPe_daily_data.zip
+
+/examples/3_Crane/data/*
+!/examples/3_Crane/data/gis/
+!/examples/3_Crane/data/prepped_input.zip
+
+/examples/4_Flux_Network/data/*
+!/examples/4_Flux_Network/data/gis/
+
+# Generated shapefiles and provenance in gis dirs
+/examples/*/data/gis/*_gfid*
+/examples/*/data/gis/shapefile_provenance.txt
+
 /examples/logs/
 
-examples/6_Flux_International/data/landsat/
-examples/6_Flux_International/data/ecostress/
 # Diagnostic scratch work
 examples/diagnostics/
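The new rules use git's whitelist idiom: ignore everything directly under each `data/` directory with `/*`, re-include specific entries with `!` negation, then re-ignore generated `*_gfid*` artifacts inside the re-included `gis/` directories. A minimal sketch of how the pattern behaves, probed with `git check-ignore` in a throwaway repository (requires git on PATH; the file paths queried are illustrative, not real repo contents):

```python
import os
import subprocess
import tempfile

# A subset of the rules added in this commit
rules = """\
/examples/3_Crane/data/*
!/examples/3_Crane/data/gis/
!/examples/3_Crane/data/prepped_input.zip
/examples/*/data/gis/*_gfid*
"""

def ignored(repo, path):
    """True if git would ignore `path` under the repo's .gitignore.

    `git check-ignore -q` exits 0 when the path is ignored, 1 when not.
    """
    return subprocess.run(["git", "-C", repo, "check-ignore", "-q", path]).returncode == 0

repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "-q", repo], check=True)
with open(os.path.join(repo, ".gitignore"), "w") as f:
    f.write(rules)

# The wildcard ignores everything under data/ ...
assert ignored(repo, "examples/3_Crane/data/snodas/foo.csv")
# ... the negations re-include gis/ and the zipped inputs ...
assert not ignored(repo, "examples/3_Crane/data/gis/flux_fields.shp")
assert not ignored(repo, "examples/3_Crane/data/prepped_input.zip")
# ... and generated *_gfid* files inside gis/ are ignored again.
assert ignored(repo, "examples/3_Crane/data/gis/flux_fields_gfid.json")
```

Rule order matters here: negations must follow the broad `/*` pattern, and git cannot re-include a file whose parent directory is excluded, which is why the rules negate `data/gis/` itself before the `*_gfid*` patterns narrow it back down.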

examples/2_Fort_Peck/01_uncalibrated_model.ipynb

Lines changed: 3 additions & 80 deletions
@@ -4,51 +4,7 @@
 "cell_type": "markdown",
 "id": "cell-intro",
 "metadata": {},
-"source": [
-"# Calibration Tutorial - Fort Peck, MT - Unirrigated Flux Plot\n",
-"\n",
-"## Step 1: Uncalibrated Model Run\n",
-"\n",
-"This tutorial focuses on calibrating SWIM-RS for a single unirrigated plot: a 3-pixel buffer around FluxNet's US-FPe eddy covariance station from John Volk's Flux ET benchmark dataset. The flux station provides independent observations of both meteorology and ET flux, allowing us to validate our model.\n",
-"\n",
-"This notebook demonstrates:\n",
-"1. Loading pre-built model input data from a SwimContainer\n",
-"2. Running the uncalibrated SWIM model\n",
-"3. Comparing model output with flux tower observations\n",
-"\n",
-"**Reference:** This example is based on John Volk's flux footprint study:\n",
-"- Paper: https://www.sciencedirect.com/science/article/pii/S0168192323000011\n",
-"- Data: https://www.sciencedirect.com/science/article/pii/S2352340923003931\n",
-"\n",
-"---\n",
-"\n",
-"### Data Pipeline\n",
-"\n",
-"**Input Data:** The `data/2_Fort_Peck.swim/` container stores pre-computed input data, so you can get started right away. If you want to build or rebuild the data for this example, we have provided scripts for reproduction:\n",
-"\n",
-"1. **Extract data** from Earth Engine and GridMET:\n",
-" ```bash\n",
-" cd data/\n",
-" python extract_data.py # Extract US-FPe only (default)\n",
-" python extract_data.py --help # See all options\n",
-" ```\n",
-"\n",
-"2. **Sync from bucket** after EE tasks complete:\n",
-" ```bash\n",
-" gsutil -m rsync -r gs://wudr/2_Fort_Peck/ ./data/\n",
-" ```\n",
-"\n",
-"3. **Build model inputs** using the container:\n",
-" ```bash\n",
-" cd data/\n",
-" python build_inputs.py # Build container\n",
-" python build_inputs.py --rebuild # Force rebuild from scratch\n",
-" ```\n",
-"\n",
-"The container (`data/2_Fort_Peck.swim/`) stores all ingested data with provenance tracking.\n",
-"\n",
-"---"
-]
+"source": "# Calibration Tutorial - Fort Peck, MT - Unirrigated Flux Plot\n\n## Step 1: Uncalibrated Model Run\n\nThis tutorial focuses on calibrating SWIM-RS for a single unirrigated plot: a 3-pixel buffer around FluxNet's US-FPe eddy covariance station from John Volk's Flux ET benchmark dataset. The flux station provides independent observations of both meteorology and ET flux, allowing us to validate our model.\n\nThis notebook demonstrates:\n1. Loading pre-built model input data from a SwimContainer\n2. Running the uncalibrated SWIM model\n3. Comparing model output with flux tower observations\n\n**Reference:** This example is based on John Volk's flux footprint study:\n- Paper: https://www.sciencedirect.com/science/article/pii/S0168192323000011\n- Data: https://www.sciencedirect.com/science/article/pii/S2352340923003931\n\n---\n\n### Data Pipeline\n\n**Input Data:** The `data/2_Fort_Peck.swim/` container stores pre-computed input data, so you can get started right away. If you want to build or rebuild the data for this example, we have provided scripts for reproduction:\n\n1. **Extract data** from Earth Engine and GridMET:\n ```bash\n python extract_data.py # Extract US-FPe only (default)\n python extract_data.py --help # See all options\n ```\n\n2. **Sync from bucket** after EE tasks complete:\n ```bash\n gsutil -m rsync -r gs://wudr/2_Fort_Peck/ ./data/\n ```\n\n3. **Build model inputs** using the container:\n ```bash\n python build_inputs.py # Build container\n python build_inputs.py --rebuild # Force rebuild from scratch\n ```\n\nThe container (`data/2_Fort_Peck.swim/`) stores all ingested data with provenance tracking.\n\n---"
 },
 {
 "cell_type": "code",
@@ -166,40 +122,7 @@
 }
 },
 "outputs": [],
-"source": [
-"# Example: Query data directly from the SwimContainer\n",
-"\n",
-"container_path = os.path.join(data, \"2_Fort_Peck.swim\")\n",
-"\n",
-"if os.path.exists(container_path):\n",
-" container = SwimContainer.open(container_path, mode=\"r\")\n",
-"\n",
-" # List available fields\n",
-" print(f\"Fields in container: {container.field_uids}\")\n",
-"\n",
-" # Get all time series for a single field using field_timeseries\n",
-" ts_df = container.query.field_timeseries(\"US-FPe\")\n",
-" print(f\"\\nTime series shape: {ts_df.shape}\")\n",
-" print(f\"Variables: {list(ts_df.columns)[:10]}...\")\n",
-"\n",
-" # Query specific data using dataframe with zarr paths\n",
-" # Path structure: remote_sensing/{type}/{instrument}/{model}/{mask}\n",
-" ndvi_df = container.query.dataframe(\"remote_sensing/ndvi/landsat/inv_irr\", fields=[\"US-FPe\"])\n",
-" print(f\"\\nNDVI observations: {ndvi_df.notna().sum().values[0]}\")\n",
-"\n",
-" etf_df = container.query.dataframe(\n",
-" \"remote_sensing/etf/landsat/ssebop/inv_irr\", fields=[\"US-FPe\"]\n",
-" )\n",
-" print(f\"ETf observations: {etf_df.notna().sum().values[0]}\")\n",
-"\n",
-" # Show container status\n",
-" print(\"\\n\" + container.query.status())\n",
-"\n",
-" container.close()\n",
-"else:\n",
-" print(f\"Container not found at {container_path}\")\n",
-" print(\"Run: cd data && python build_inputs.py --rebuild\")"
-]
+"source": "# Example: Query data directly from the SwimContainer\n\ncontainer_path = os.path.join(data, \"2_Fort_Peck.swim\")\n\nif os.path.exists(container_path):\n    container = SwimContainer.open(container_path, mode=\"r\")\n\n    # List available fields\n    print(f\"Fields in container: {container.field_uids}\")\n\n    # Get all time series for a single field using field_timeseries\n    ts_df = container.query.field_timeseries(\"US-FPe\")\n    print(f\"\\nTime series shape: {ts_df.shape}\")\n    print(f\"Variables: {list(ts_df.columns)[:10]}...\")\n\n    # Query specific data using dataframe with zarr paths\n    # Path structure: remote_sensing/{type}/{instrument}/{model}/{mask}\n    ndvi_df = container.query.dataframe(\"remote_sensing/ndvi/landsat/inv_irr\", fields=[\"US-FPe\"])\n    print(f\"\\nNDVI observations: {ndvi_df.notna().sum().values[0]}\")\n\n    etf_df = container.query.dataframe(\n        \"remote_sensing/etf/landsat/ssebop/inv_irr\", fields=[\"US-FPe\"]\n    )\n    print(f\"ETf observations: {etf_df.notna().sum().values[0]}\")\n\n    # Show container status\n    print(\"\\n\" + container.query.status())\n\n    container.close()\nelse:\n    print(f\"Container not found at {container_path}\")\n    print(\"Run: python build_inputs.py --rebuild\")"
 },
 {
 "cell_type": "markdown",
@@ -936,4 +859,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
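Both notebook diffs in this commit collapse each cell's `source` from a list of line strings to a single JSON string. That is a representation change only: nbformat 4 allows `source` to be either form, and readers recover the cell text from the list form by plain concatenation (each element carries its own trailing `\n`). A stdlib-only sketch of the equivalence, using stand-in cell text rather than the real notebook content:

```python
import json

# Stand-in for a markdown cell's text (not the real notebook content)
source_as_list = ["# Title\n", "\n", "Body text"]
source_as_str = "# Title\n\nBody text"

# nbformat readers recover the cell text by concatenating the list form,
# so the two encodings carry identical content...
assert "".join(source_as_list) == source_as_str

# ...and the single-string form is the smaller JSON payload here.
assert len(json.dumps(source_as_str)) < len(json.dumps(source_as_list))
```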

examples/3_Crane/01_uncalibrated_model.ipynb

Lines changed: 3 additions & 65 deletions
@@ -4,38 +4,7 @@
 "cell_type": "markdown",
 "id": "cell-intro",
 "metadata": {},
-"source": [
-"# Calibration Tutorial - Crane, OR - Irrigated Flux Plot\n",
-"\n",
-"## Step 1: Uncalibrated Model Run\n",
-"\n",
-"This tutorial focuses on calibrating SWIM-RS for a single irrigated alfalfa plot at the S2 flux station in Crane, Oregon. Unlike the unirrigated Fort Peck example, this site is actively irrigated.\n",
-"\n",
-"This notebook demonstrates:\n",
-"1. Loading pre-built model input data from a SwimContainer\n",
-"2. Running the uncalibrated SWIM model\n",
-"3. Comparing model output with the open source members of the OpenET ensemble (PT-JPL, SIMS, SSEBop, geeSEBAL)\n",
-"4. Validation against flux tower observations using multiple metrics (R², r, RMSE, bias)\n",
-"\n",
-"### Data Pipeline\n",
-"\n",
-"**Input Data:** The `data/3_Crane.swim/` container stores pre-computed input data.\n",
-"\n",
-"The full data workflow uses two scripts and can be re-run if needed:\n",
-"\n",
-"1. **`extract_data.py`** - Extracts raw data from Earth Engine and GridMET to CSV/parquet files\n",
-"2. **`build_inputs.py`** - Processes extracted data through SwimContainer\n",
-"\n",
-"To reproduce the input data from scratch:\n",
-"\n",
-"```bash\n",
-"cd data\n",
-"python extract_data.py # Extract from EE/GridMET (requires authentication)\n",
-"python build_inputs.py # Build container\n",
-"```\n",
-"\n",
-"See `data/extract_data.py` for extraction options and `data/build_inputs.py` for container workflow details."
-]
+"source": "# Calibration Tutorial - Crane, OR - Irrigated Flux Plot\n\n## Step 1: Uncalibrated Model Run\n\nThis tutorial focuses on calibrating SWIM-RS for a single irrigated alfalfa plot at the S2 flux station in Crane, Oregon. Unlike the unirrigated Fort Peck example, this site is actively irrigated.\n\nThis notebook demonstrates:\n1. Loading pre-built model input data from a SwimContainer\n2. Running the uncalibrated SWIM model\n3. Comparing model output with the open source members of the OpenET ensemble (PT-JPL, SIMS, SSEBop, geeSEBAL)\n4. Validation against flux tower observations using multiple metrics (R², r, RMSE, bias)\n\n### Data Pipeline\n\n**Input Data:** The `data/3_Crane.swim/` container stores pre-computed input data.\n\nThe full data workflow uses two scripts and can be re-run if needed:\n\n1. **`extract_data.py`** - Extracts raw data from Earth Engine and GridMET to CSV/parquet files\n2. **`build_inputs.py`** - Processes extracted data through SwimContainer\n\nTo reproduce the input data from scratch:\n\n```bash\npython extract_data.py # Extract from EE/GridMET (requires authentication)\npython build_inputs.py # Build container\n```\n\nSee `extract_data.py` for extraction options and `build_inputs.py` for container workflow details."
 },
 {
 "cell_type": "code",
@@ -255,38 +224,7 @@
 }
 },
 "outputs": [],
-"source": [
-"# Query container data (optional - requires build_inputs.py to have been run)\n",
-"\n",
-"container_path = os.path.join(data, \"3_Crane.swim\")\n",
-"\n",
-"if os.path.exists(container_path):\n",
-" container = SwimContainer.open(container_path, mode=\"r\")\n",
-"\n",
-" # List available fields\n",
-" print(f\"Fields in container: {container.field_uids}\")\n",
-"\n",
-" # Get all time series for a single field using field_timeseries\n",
-" ts_df = container.query.field_timeseries(\"S2\")\n",
-" print(f\"\\nTime series shape: {ts_df.shape}\")\n",
-" print(f\"Variables: {list(ts_df.columns)[:10]}...\")\n",
-"\n",
-" # Query specific data using dataframe with zarr paths\n",
-" # Path structure: remote_sensing/{type}/{instrument}/{model}/{mask}\n",
-" ndvi_df = container.query.dataframe(\"remote_sensing/ndvi/landsat/irr\", fields=[\"S2\"])\n",
-" print(f\"\\nNDVI observations: {ndvi_df.notna().sum().values[0]}\")\n",
-"\n",
-" etf_df = container.query.dataframe(\"remote_sensing/etf/landsat/ssebop/irr\", fields=[\"S2\"])\n",
-" print(f\"ETf observations: {etf_df.notna().sum().values[0]}\")\n",
-"\n",
-" # Show container status\n",
-" print(\"\\n\" + container.query.status())\n",
-"\n",
-" container.close()\n",
-"else:\n",
-" print(f\"Container not found at {container_path}\")\n",
-" print(\"Run: cd data && python build_inputs.py\")"
-]
+"source": "# Query container data (optional - requires build_inputs.py to have been run)\n\ncontainer_path = os.path.join(data, \"3_Crane.swim\")\n\nif os.path.exists(container_path):\n    container = SwimContainer.open(container_path, mode=\"r\")\n\n    # List available fields\n    print(f\"Fields in container: {container.field_uids}\")\n\n    # Get all time series for a single field using field_timeseries\n    ts_df = container.query.field_timeseries(\"S2\")\n    print(f\"\\nTime series shape: {ts_df.shape}\")\n    print(f\"Variables: {list(ts_df.columns)[:10]}...\")\n\n    # Query specific data using dataframe with zarr paths\n    # Path structure: remote_sensing/{type}/{instrument}/{model}/{mask}\n    ndvi_df = container.query.dataframe(\"remote_sensing/ndvi/landsat/irr\", fields=[\"S2\"])\n    print(f\"\\nNDVI observations: {ndvi_df.notna().sum().values[0]}\")\n\n    etf_df = container.query.dataframe(\"remote_sensing/etf/landsat/ssebop/irr\", fields=[\"S2\"])\n    print(f\"ETf observations: {etf_df.notna().sum().values[0]}\")\n\n    # Show container status\n    print(\"\\n\" + container.query.status())\n\n    container.close()\nelse:\n    print(f\"Container not found at {container_path}\")\n    print(\"Run: python build_inputs.py\")"
 },
 {
 "cell_type": "markdown",
@@ -964,4 +902,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
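The query cells in both notebooks address container data with slash-delimited zarr paths following `remote_sensing/{type}/{instrument}/{model}/{mask}`, where the `{model}` segment is absent for instrument-level variables like NDVI. A hypothetical helper (not part of the SwimContainer API) that splits such paths, just to make the convention concrete:

```python
def parse_rs_path(path: str) -> dict:
    """Split a remote-sensing zarr path into its named segments.

    Convention (from the notebook comments):
        remote_sensing/{type}/{instrument}/{model}/{mask}
    The {model} segment is omitted for variables like ndvi.
    """
    parts = path.split("/")
    if parts[0] != "remote_sensing":
        raise ValueError(f"not a remote_sensing path: {path}")
    if len(parts) == 5:
        keys = ("group", "type", "instrument", "model", "mask")
    elif len(parts) == 4:
        keys = ("group", "type", "instrument", "mask")
    else:
        raise ValueError(f"unexpected path depth: {path}")
    return dict(zip(keys, parts))

# Paths taken from the notebook cells above
print(parse_rs_path("remote_sensing/etf/landsat/ssebop/irr")["model"])  # ssebop
print(parse_rs_path("remote_sensing/ndvi/landsat/inv_irr")["mask"])     # inv_irr
```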
