Document Suggested Workflow #40

Open
@ChrisBarker-NOAA

This code is focused on a specific part of the workflow folks may need to do -- but we are also providing tools and utilities for other bits. So I think it's helpful to document the suggested workflow, and that will also help us determine where to put code.

My first draft:

Goal:

Starting Point:

The user has a set of data that can be loaded into xarray: it could be files on disk, files on AWS, a Kerchunked Zarr dataset, or ....

User needs a subset of that data:

  • Restricted to:
    • a polygon in space
    • particular time frame
    • either a single vertical layer or all vertical layers (proper vertical subsetting can wait ...)
    • only the variables they need.
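The four restrictions above could be gathered into one small request object. This is only a sketch: `SubsetRequest` and its field names are hypothetical and are not part of this package.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SubsetRequest:
    """Hypothetical container for the restrictions listed above."""

    polygon: Sequence[Tuple[float, float]]  # (lon, lat) vertices, ring not closed
    time_range: Tuple[datetime, datetime]   # inclusive start and end
    variables: Sequence[str]                # names of the variables to keep
    layer: Optional[int] = None             # one vertical layer index, None = all layers

    def __post_init__(self):
        # Reject requests that cannot describe a meaningful subset.
        if len(self.polygon) < 3:
            raise ValueError("a polygon needs at least three vertices")
        start, end = self.time_range
        if start > end:
            raise ValueError("time_range start must not be after its end")
        if not self.variables:
            raise ValueError("at least one variable must be requested")


# Example request; the variable names are made up for illustration.
request = SubsetRequest(
    polygon=[(-74.5, 39.0), (-73.0, 39.0), (-73.0, 40.5), (-74.5, 40.5)],
    time_range=(datetime(2024, 1, 1), datetime(2024, 1, 2)),
    variables=["zeta", "u", "v"],
)
```

Leaving `layer` as `None` matches the "all vertical layers" case, so proper vertical subsetting can be added later without changing the other fields.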

Outcome:

An xarray Dataset all ready to save to netcdf, or .....

That Dataset contains only what the user wants -- and is as similar to the original as possible, e.g. same names for all variables, maybe some additional metadata.

Workflow:

Step 1:

User does any pre-processing required to get their data into a single, conforming dataset.

In many cases, there's nothing to be done, but in some cases, there may be work to be done:

  1. The grid and data variables are in multiple files and need to be combined into one dataset.
  2. There are "troublesome" variables -- e.g. time coordinates that aren't correct, etc.
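For the first case, a minimal sketch of combining a grid-only dataset and a results-only dataset with `xr.merge`. Both datasets are built in memory here as stand-ins for files, and the variable and dimension names (`lon`, `lat`, `nv`, `zeta`, `node`, `face`) are assumptions for illustration only.

```python
import numpy as np
import xarray as xr

# Stand-in for a grid-only file: node coordinates and triangle connectivity.
grid = xr.Dataset(
    {
        "lon": ("node", np.array([0.0, 1.0, 1.0, 0.0])),
        "lat": ("node", np.array([0.0, 0.0, 1.0, 1.0])),
        "nv": (("face", "vertex"), np.array([[0, 1, 2], [0, 2, 3]])),
    }
)

# Stand-in for a results-only file: one data variable on the same nodes.
data = xr.Dataset(
    {"zeta": (("time", "node"), np.zeros((2, 4)))},
    coords={"time": np.array(["2024-01-01", "2024-01-02"], dtype="datetime64[ns]")},
)

# Combine both into the single dataset that the later steps expect.
ds = xr.merge([grid, data])
```

In practice the two inputs would come from `xr.open_dataset` or similar; the merge only works cleanly when the shared dimension (here `node`) has the same length in both.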

As a rule, this will be model specific, maybe even implementation-of-model specific.

This package can't provide all of that, but it can (and should) provide a few examples for common cases.

e.g. SCHISM (STOFS), or maybe FVCOM, fixing the time variable (some use single-precision float days :-()
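As an illustration of the time problem, here is a sketch of repairing a time coordinate stored as single-precision float days. The data is synthetic, and the fix assumes the true output interval is a whole number of hours; neither the reference date nor the variable names come from a real model file.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: hourly output stored as float32 days since a reference date.
# Near 60000 days, float32 can only resolve steps of a few minutes.
hours = np.arange(6)
raw_days = (60000.0 + hours / 24.0).astype("float32")
ds = xr.Dataset(
    {"zeta": ("time", np.zeros(6))},
    coords={"time": ("time", raw_days, {"units": "days since 1858-11-17 00:00:00"})},
)

reference = np.datetime64("1858-11-17T00:00:00", "ns")

# Convert to hours in double precision, then snap to the nearest whole hour.
# Assumption: the model really wrote output on whole hours.
hours_since_ref = ds["time"].values.astype("float64") * 24.0
rounded_hours = np.round(hours_since_ref).astype("int64")
fixed_time = reference + rounded_hours * np.timedelta64(1, "h")

# Replace the imprecise float coordinate with proper datetimes.
ds_fixed = ds.assign_coords(time=("time", fixed_time))
```

Rounding is only safe when the expected interval is known; for irregular output a different repair would be needed.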

Step 2:

The user processes the Dataset to make it CF compliant (or at least compliant enough that the subsetting code can work).

This package will contain utilities to do that, e.g.

ugrid.assign_ugrid_topology()
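To show what "compliant enough" could mean for an unstructured grid, here is a hand-written sketch that adds UGRID-style mesh topology metadata using plain xarray. It does not call `ugrid.assign_ugrid_topology()` and makes no claim about that function's signature; the variable names (`lon`, `lat`, `nv`, `mesh`) are assumptions.

```python
import numpy as np
import xarray as xr

# Small two-triangle mesh with made-up variable names.
ds = xr.Dataset(
    {
        "lon": ("node", np.array([0.0, 1.0, 1.0, 0.0])),
        "lat": ("node", np.array([0.0, 0.0, 1.0, 1.0])),
        "nv": (("face", "vertex"), np.array([[0, 1, 2], [0, 2, 3]])),
    }
)

# UGRID convention: a dummy variable whose attributes describe the mesh.
ds["mesh"] = xr.DataArray(
    0,
    attrs={
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": "lon lat",
        "face_node_connectivity": "nv",
    },
)

# Mark the connectivity array and say that its indices are zero-based.
ds["nv"].attrs.update({"cf_role": "face_node_connectivity", "start_index": 0})

# CF-style attributes on the node coordinates.
ds["lon"].attrs.update({"standard_name": "longitude", "units": "degrees_east"})
ds["lat"].attrs.update({"standard_name": "latitude", "units": "degrees_north"})
```

With this metadata in place, generic code can discover the node coordinates and connectivity from the `mesh` variable instead of relying on model-specific names.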

Step 3:

The Dataset can be queried by the user to find out what they need to know in order to specify a subset:

  • what variables are in the dataset
  • what timespan is covered
  • what region is covered (maybe?)
  • whether it's 2D or 3D ?
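A rough sketch of answering those questions with plain xarray. The dataset is synthetic, and the name of the vertical dimension (`siglay`) is an assumption that a real implementation would read from the metadata.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with one 2D and one 3D variable on four nodes.
ds = xr.Dataset(
    {
        "zeta": (("time", "node"), np.zeros((3, 4))),
        "temp": (("time", "siglay", "node"), np.zeros((3, 2, 4))),
    },
    coords={
        "time": np.array(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[ns]"),
        "lon": ("node", np.array([-74.0, -73.0, -73.0, -74.0])),
        "lat": ("node", np.array([39.0, 39.0, 40.0, 40.0])),
    },
)

# What variables are in the dataset?
variables = sorted(ds.data_vars)

# What timespan is covered?
timespan = (ds["time"].values.min(), ds["time"].values.max())

# What region is covered? Here: a bounding box (min lon, min lat, max lon, max lat).
bbox = (
    float(ds["lon"].min()),
    float(ds["lat"].min()),
    float(ds["lon"].max()),
    float(ds["lat"].max()),
)

# Which variables have a vertical dimension (assumed to be called "siglay")?
has_vertical = {name: "siglay" in ds[name].dims for name in variables}
```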

Step 4:

The user makes a request for a subset.

Result -- a subset Dataset.
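One way such a request might be carried out for node-based variables is sketched below. `points_in_polygon` and `subset_nodes` are hypothetical helpers, not this package's API, and the sketch deliberately ignores faces: a real subset of an unstructured grid would also have to select and renumber the connectivity.

```python
import numpy as np
import xarray as xr


def points_in_polygon(x, y, polygon):
    """Even-odd ray casting test for arrays of points against one polygon."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y1 > y) != (y2 > y)
        x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_at_y)
    return inside


def subset_nodes(ds, variables, time_range, polygon):
    """Keep the requested variables, the time window, and the nodes inside polygon."""
    mask = points_in_polygon(ds["lon"].values, ds["lat"].values, polygon)
    out = ds[list(variables)].sel(time=slice(*time_range))
    return out.isel(node=np.flatnonzero(mask))


# Synthetic dataset: four nodes, the first two inside the polygon below.
ds = xr.Dataset(
    {
        "zeta": (("time", "node"), np.arange(12.0).reshape(3, 4)),
        "temp": (("time", "node"), np.zeros((3, 4))),
    },
    coords={
        "time": np.array(["2024-01-01", "2024-01-02", "2024-01-03"], dtype="datetime64[ns]"),
        "lon": ("node", np.array([-73.8, -73.2, -72.5, -73.5])),
        "lat": ("node", np.array([39.2, 39.8, 39.5, 41.0])),
    },
)

polygon = [(-74.0, 39.0), (-73.0, 39.0), (-73.0, 40.0), (-74.0, 40.0)]
subset = subset_nodes(ds, ["zeta"], ("2024-01-02", "2024-01-03"), polygon)
```

The result keeps the original variable and coordinate names, which is in line with the goal of a subset that is as similar to the original as possible.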
