USGS WR18 Streamflow Daily Averages Data Package

This dataset contains daily streamflow data and associated hydrologic features for USGS Water Resource Region 18 (WR18). It was created by processing hourly streamflow data from the National Water Model v3.0 (1979–2023 retrospective) into daily means, specifically for StreamRiver features in HUC-2 region 18.

Region Name: Water Resource Region 18
HUC-2: 18
States Covered: California, Nevada
Included HUC-4s:
1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810
Excluded HUC-4s:
1606, 1712
- Note: Vector Processing Unit 18 included 1606 and 1712, but these are not part of HUC-2 region 18 and were excluded.
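The exclusion rule above follows from how HUC codes nest: the first two digits of a HUC-4 code are its HUC-2 region. A minimal sketch of that check, using only the codes listed above (the helper name `in_huc2` is hypothetical and not part of this package):

```python
# HUC-4 codes found in Vector Processing Unit 18, taken from the lists above.
vpu18_huc4 = ["1801", "1802", "1803", "1804", "1805", "1806", "1807",
              "1808", "1809", "1810", "1606", "1712"]


def in_huc2(huc4: str, huc2: str = "18") -> bool:
    """Return True when a HUC-4 code falls inside the given HUC-2 region."""
    # A HUC-4 code starts with the two digits of its parent HUC-2 region.
    return huc4.startswith(huc2)


included = [code for code in vpu18_huc4 if in_huc2(code)]
excluded = [code for code in vpu18_huc4 if not in_huc2(code)]

print("Included:", included)
print("Excluded:", excluded)
```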

Package documentation

This repo contains the files and instructions for creating the Google Cloud data in CB/wr18: mean daily streamflow for 1979-2023 across HUC-2 region 18 (primarily California). See the notebooks directory for how the files were created. For a simplified version of the steps, run notebooks/tutorial.ipynb.

Author

Science, goals and data: Morgan Bond, NOAA Fisheries
Coding: Eli Holmes, NOAA Fisheries

Google Cloud Bucket structure

The flowline and streamflow files are not in this GitHub repo (they are big). You will find them in the Google Cloud Bucket.

bucket/
└── wr18/                           
    ├── flowline/                   # All vector data for the region
    │   ├── WR_18_Flowline.parquet
    │   ├── WR_18_metadata.parquet
    │   └── README.md               
    │
    ├── streamflow/                 # NetCDF and Zarr files
    │   ├── netcdf/
    │   └── zarr/
    │
    ├── notebooks/                  # Reproducibility and how-to examples
    │   ├── package_structure.ipynb
    │   ├── shapefile_processing.ipynb
    │   ├── streamflow_processing.ipynb
    │   ├── save_to_gcp.ipynb
    │   └── README.md               
    │
    └── metadata/                   # Additional documentation
        ├── wr18_hydro_metadata.json
        └── README.md

Storage Format: NetCDF and Zarr
Cloud Storage: Google Cloud Bucket on NOAA NODD
Access: Public, anonymous
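Because the bucket is public, file locations can be built from the directory layout above. The sketch below assembles both the HTTPS and `gcs://` forms. The bucket name and prefix are copied from the access URLs used elsewhere in this README. The helper names are hypothetical, and the one-file-per-year naming pattern `daily_mean_<year>.nc` is an assumption generalized from the 2018 file, which is the only one this README shows.

```python
BUCKET = "nmfs_odp_nwfsc"
PREFIX = "CB/nwm_daily_means/wr18"


def https_url(relative_path: str) -> str:
    """Public HTTPS URL for a file under the wr18 prefix (for downloads)."""
    return f"https://storage.googleapis.com/{BUCKET}/{PREFIX}/{relative_path}"


def gcs_url(relative_path: str) -> str:
    """gcs:// URL for the same object (for fsspec-style streaming)."""
    return f"gcs://{BUCKET}/{PREFIX}/{relative_path}"


def netcdf_path(year: int) -> str:
    """Relative path of one year's NetCDF file.

    ASSUMPTION: files follow the pattern daily_mean_<year>.nc for every
    year in the 1979-2023 record; only the 2018 file is shown in this README.
    """
    if not 1979 <= year <= 2023:
        raise ValueError("year must be within the 1979-2023 record")
    return f"streamflow/netcdf/daily_mean_{year}.nc"


print(https_url(netcdf_path(2018)))
print(gcs_url("streamflow/zarr"))
print(https_url("flowline/WR_18_Flowline.parquet"))
```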

Streamflow Data Access

Python (stream Zarr)

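import xarray as xr
zarr_url = "gcs://nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/streamflow/zarr"
# The bucket is public, so request anonymous access explicitly (requires gcsfs)
ds = xr.open_zarr(zarr_url, consolidated=True, storage_options={"token": "anon"})
# Plot 2018 for the feature at position 1 (isel positions are zero-based)
ds["streamflow"].sel(time="2018").isel(feature_id=1).plot()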

Python (stream netcdf)

import xarray as xr
import fsspec
url = "gcs://nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/streamflow/netcdf/daily_mean_2018.nc"
fs = fsspec.filesystem("gcs", token="anon")  # anonymous access to the public bucket
# Keep the file open while reading and plotting; it is closed on exit
with fs.open(url, mode="rb") as f:
    ds = xr.open_dataset(f)
    ds["streamflow"].isel(feature_id=1).plot()

Python (download netcdf)

import xarray as xr
import urllib.request
url = "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/streamflow/netcdf/daily_mean_2018.nc"
urllib.request.urlretrieve(url, "daily_mean_2018.nc")
ds = xr.open_dataset("daily_mean_2018.nc")
ds["streamflow"].isel(feature_id=1).plot()

R (streaming from the cloud is not supported, so download the file first). See "Example of a plot in R" below for a nicer plotting example.

library(ncdf4)
url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/streamflow/netcdf/daily_mean_2018.nc"
download.file(url, "daily_mean_2018.nc", mode = "wb")
nc <- nc_open("daily_mean_2018.nc")
time <- ncvar_get(nc, "time")
streamflow <- ncvar_get(nc, "streamflow")[1,]  # all time steps for the first feature
nc_close(nc)
plot(time, streamflow)

Source of hourly streamflow data

Source Dataset: NOAA National Water Model v3.0 Retrospective (1979–2023)
Original Data Access: AWS Open Data Registry
Zarr Path: s3://noaa-nwm-retrospective-3-0-pds/CONUS/zarr/chrtout.zarr
Filter: Only features with FTYPE == 'StreamRiver'
Subset Method: Match COMIDs from WR18 flowline metadata with feature_id in NWM output
Processing step to compute daily means from hourly data:
daily_mean = streamflow_ts.resample(time="1D").mean().compute()
Output Formats: NetCDF and Zarr
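The filter, subset, and daily-mean steps listed above can be illustrated without any cloud access. The following is a plain-Python sketch of what those steps compute, not the package's actual code (which uses xarray on the NWM Zarr store). All IDs and flow values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean

# Hypothetical flowline metadata rows: (COMID, FTYPE).
flowlines = [(101, "StreamRiver"), (102, "CanalDitch"), (103, "StreamRiver")]

# Hypothetical hourly model output: feature_id -> [(timestamp, flow in m^3/s)].
start = datetime(2018, 1, 1)
hourly = {
    101: [(start + timedelta(hours=h), 2.0 if h < 24 else 4.0) for h in range(48)],
    102: [(start + timedelta(hours=h), 9.0) for h in range(48)],
    999: [(start + timedelta(hours=h), 1.0) for h in range(48)],  # outside WR18
}

# Step 1 (Filter): keep only COMIDs whose FTYPE is StreamRiver.
stream_comids = {comid for comid, ftype in flowlines if ftype == "StreamRiver"}

# Step 2 (Subset): keep COMIDs that also appear as feature_id in the output.
matched_ids = sorted(stream_comids & set(hourly))


# Step 3 (Daily mean): average the hourly values within each calendar day.
def daily_means(series):
    by_day = defaultdict(list)
    for timestamp, flow in series:
        by_day[timestamp.date()].append(flow)
    return {day: fmean(flows) for day, flows in sorted(by_day.items())}


daily = {fid: daily_means(hourly[fid]) for fid in matched_ids}

for fid, means in daily.items():
    for day, value in means.items():
        print(fid, day, value)
```

Here feature 102 is dropped by the FTYPE filter, 103 is dropped because it has no model output, and 999 is dropped because it is not in the flowline metadata.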

Flowline Source: NHDPlusV2.1

This is the map of where the streams and rivers are.

Source: EPA NHDPlusV2.1, Vector Processing Unit 18
Download URL: EPA NHDPlus VPU 18
Original file: NHDPlusV21_CA_18_NHDSnapshot_05.7z
Processed Output: flowline/WR_18_Flowline.parquet

Flowline Attributes Summary

Attribute	Description
Total Features	178,868
FTYPE Values	StreamRiver, ArtificialPath, CanalDitch, Connector, Pipeline
FCODE Info	See `WR_18_metadata.parquet` for full list
Geometry Types	LineString or MultiLineString
CRS (Original)	`ESRI:102039` (USA_Contiguous_Albers_Equal_Area_Conic_USGS)
CRS (Parquet)	`EPSG:3857` (Web Mercator)

Geometry cleaning steps:

Dropped empty or invalid geometries
Converted to 2D LineString (stripped Z/M if present)
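The two cleaning steps above can be sketched on plain coordinate lists. This is an illustration only: the actual processing is in the notebooks, and the rule used here for "invalid" (fewer than two vertices) is an assumption, as are the sample features.

```python
# Hypothetical features: vertices may carry Z (and M) values after X and Y.
features = [
    {"COMID": 1, "coords": [(0.0, 0.0, 5.0), (1.0, 1.0, 6.0)]},            # XYZ
    {"COMID": 2, "coords": []},                                            # empty
    {"COMID": 3, "coords": [(2.0, 2.0)]},                                  # one vertex
    {"COMID": 4, "coords": [(3.0, 3.0, 1.0, 0.5), (4.0, 5.0, 1.0, 0.7)]},  # XYZM
]


def is_usable_line(coords) -> bool:
    # ASSUMPTION: a line needs at least two vertices to be considered valid.
    return len(coords) >= 2


def to_2d(coords):
    # Keep X and Y and discard any trailing Z/M values.
    return [(x, y) for x, y, *_ in coords]


cleaned = [
    {"COMID": feature["COMID"], "coords": to_2d(feature["coords"])}
    for feature in features
    if is_usable_line(feature["coords"])
]

for feature in cleaned:
    print(feature)
```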

Accessing the flowline data

Python (need to download)

import urllib.request
import geopandas as gpd
url = "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/flowline/WR_18_Flowline.parquet"
local_path = "WR_18_Flowline.parquet"
urllib.request.urlretrieve(url, local_path)
gdf = gpd.read_parquet(local_path)
gdf = gdf[gdf["HUC4"] == "1810"]  # optional filter
gdf.plot()

R (can read from url but need to monkey with geometry)

library(arrow)
library(sf)
library(ggplot2)

url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/flowline/WR_18_Flowline.parquet"
df <- as.data.frame(read_parquet(url))
df_small <- subset(df, HUC4 == "1810")

# Build sf object
geom <- st_as_sfc(df_small$geometry, EWKB = TRUE)
df_small$geometry <- NULL
gdf <- st_sf(df_small, geometry = geom)

# Plot the flowlines (geom_sf draws the geometry column of the sf object)
ggplot(gdf) +
  geom_sf(color = "steelblue", size = 0.3) +
  theme_minimal()

Example of a plot in R

library(ncdf4)
library(ggplot2)

# Get file via download
url <- "https://storage.googleapis.com/nmfs_odp_nwfsc/CB/nwm_daily_means/wr18/streamflow/netcdf/daily_mean_2018.nc"
download.file(url, "daily_mean_2018.nc", mode = "wb")

# Open the file
nc <- ncdf4::nc_open("daily_mean_2018.nc")

# Read the time variable
time_raw <- ncvar_get(nc, "time")
time_units <- ncatt_get(nc, "time", "units")$value
time_origin <- sub("days since ", "", time_units)
time <- as.Date(time_raw, origin = time_origin)

# Read streamflow for one feature_id (here 344103)
feature_id <- ncvar_get(nc, "feature_id")
fid <- 344103
fid_col <- which(feature_id == fid)
streamflow <- ncvar_get(nc, "streamflow")[fid_col,]  # all time steps for the selected feature_id

# Close NetCDF file
nc_close(nc)

# Create data frame
df <- data.frame(time = time, streamflow = streamflow)

# Plot
ggplot(df, aes(x = time, y = streamflow)) +
  geom_line(color = "steelblue") +
  labs(title = paste0("Daily Mean Streamflow (Feature ID ", fid, ")"),
       x = "Date", y = "Streamflow (m³/s)") +
  theme_minimal()
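The R example above decodes the time axis by reading the "days since ..." units attribute and adding the raw values to that origin. For readers working outside xarray (which normally decodes time automatically), the same step can be sketched in Python with the standard library. The function name and the sample units string are hypothetical; check the file's actual `units` attribute.

```python
from datetime import date, timedelta


def decode_days_since(values, units: str):
    """Convert numeric offsets to dates using a 'days since <origin>' string."""
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported time units: {units!r}")
    # Keep only the date part of the origin (drop any time of day).
    origin_text = units[len(prefix):].split()[0]
    origin = date.fromisoformat(origin_text)
    # Adding a timedelta to a date uses whole days only.
    return [origin + timedelta(days=float(v)) for v in values]


# Sample units string, assumed for illustration.
dates = decode_days_since([0, 1, 31], "days since 2018-01-01 00:00:00")
for d in dates:
    print(d.isoformat())
```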

Repository: eeholmes-uw/Streamflow-Means-NWM
