Adding some of Python's Xarray features to Oceananigans #4364

tomchor · 2025-04-08T02:30:43Z

tomchor
Apr 8, 2025
Collaborator

Talking to @glwagner we came to the conclusion that it might be nice to implement some of Xarray's features in Oceananigans. Xarray has an impressive number of features, not all of them generally useful to physical oceanography, so I'm gonna list the ones that I tend to use the most while working with data (generated by Oceananigans or otherwise):

Probably the most useful of all is the ability to index and slice data/Fields based not on array indices, but on physical locations. Xarray was probably the first to implement this, but I actually think that DimensionalData.jl has a much more elegant way to implement the same concept with Selectors. So I'll use it as an example. Assuming u is a Field with this feature implemented (or a DimArray):

julia> u[X=At(1.2), Y=At(0.4)] # Selects data specifically at x=1.2 and y=0.4
julia> u[X=Near(1.21), Y=At(0.5)] # Selects the data point closest to x=1.21, at exactly y=0.5
julia> u[X=1.2..1.6, Y=OpenInterval(0.1..0.6)] # Selects all data between x=1.2 and 1.6, bounds included, and data between y=0.1 and 0.6, bounds excluded.
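To make the selector idea concrete, here is a minimal pure-Python sketch of how coordinate-based lookups could map onto plain array indices. It assumes a sorted 1D coordinate, and the helper names (`at`, `near`, `between`) are hypothetical: they are not part of Xarray, DimensionalData.jl, or Oceananigans.

```python
from bisect import bisect_left, bisect_right

# Hypothetical helpers for illustration only; `coords` must be sorted ascending.

def at(coords, value, atol=1e-12):
    """Index of the coordinate equal to `value` (within `atol`); error if absent."""
    i = bisect_left(coords, value - atol)
    if i < len(coords) and abs(coords[i] - value) <= atol:
        return i
    raise KeyError(f"no coordinate at {value}")

def near(coords, value):
    """Index of the coordinate closest to `value`."""
    i = bisect_left(coords, value)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(coords)]
    return min(candidates, key=lambda j: abs(coords[j] - value))

def between(coords, lo, hi, closed=True):
    """Slice covering coordinates in [lo, hi] (closed) or (lo, hi) (open)."""
    if closed:
        return slice(bisect_left(coords, lo), bisect_right(coords, hi))
    return slice(bisect_right(coords, lo), bisect_left(coords, hi))

# Made-up 1D example: x holds the physical locations, u the data at those locations.
x = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
u = [10, 11, 12, 13, 14, 15]

u_at = u[at(x, 1.2)]                            # exact match at x=1.2
u_near = u[near(x, 1.21)]                       # closest point to x=1.21
u_closed = u[between(x, 1.2, 1.6)]              # bounds included
u_open = u[between(x, 0.4, 1.6, closed=False)]  # bounds excluded
```

The point of the sketch is that user code only ever mentions physical locations. The translation to indices happens once, inside the selector.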

There are other things one can do and I encourage people to check the pages I linked. It's extremely handy when you want to abstract away the discretization and think in terms of physical locations.

The second most useful capability, in my opinion, is the plotting in Xarray. As an example, with one command I can tell Xarray to take a 4D field (time + 3 spatial dimensions) and plot panels where the columns span the time dimension, the rows span the z-locations, and the horizontal and vertical axes of each panel are the x and y dimensions. I'm also telling it to explicitly add a colorbar and to use the RdBu_r colormap:

u.plot(col="time", row="z", x="x", y="y", add_colorbar=True, cmap="RdBu_r")

Which plots (using some random data as an example):

As another example, something I find extremely useful is plotting variables against each other. Taking an example from the Xarray docs, if I have variables A, B, I can plot them against each other in a scatter plot where the hue can represent a third variable (in the case below, y) and the marker size can represent a fourth variable (in the case below, z).

ds.plot.scatter(x="A", y="B", hue="y", markersize="z")

which produces this (artificial) plot:

Note that Xarray can be connected with hvPlot to do similar things, but with interactive plots. Note also that some of that functionality has already been implemented by DimensionalData, but it's still in early stages.

Something that is minor, but that I find really useful when post-processing, is the fact that DataArrays (the approximate equivalent of Fields in Xarray) carry metadata. That's really helpful for keeping track of units, or making the definition of a given variable clear in case you're sharing the data with someone else (including your future self!).

Fields in Xarray (called DataArrays) can be N-dimensional. In Oceananigans this could take the form of an NDField. So, for example, in one of my scripts I need to calculate a Reynolds tensor, which I do by using a 3D velocity vector indexed by i (basically $u_i$), manipulating the labels of that vector to create $u_j$ (without creating or modifying the underlying data), and simply multiplying both, as one would do when writing math. Xarray then correctly broadcasts and creates a 3D Reynolds tensor that also depends on $i$ and $j$. The code can be written like this:

uⱼ = uᵢ.rename(i="j")
uᵢuⱼ = uᵢ * uⱼ

The resulting tensor depends on x, y, z, i, j (and whatever other dimensions the original vector depended on). So (using now DimensionalData notation) I could calculate the horizontal average of $w^2$ (i.e. the i=j=3 component of the tensor) as simply as:

w2 = Average(uᵢuⱼ[i=At(3), j=At(3)], dims=(X, Y))

among many other calculations that can be done easily. TKE can also be calculated easily as half the sum of the diagonal (i.e. half the trace). This can be coupled with a mean velocity gradient tensor to calculate shear production rates, etc.
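As a rough illustration of the arithmetic involved (not of the Xarray or DimensionalData API), the sketch below builds a horizontally averaged Reynolds tensor $\langle u_i u_j \rangle$ from a handful of velocity fluctuations and takes half its trace to get TKE. The sample values and the function names are made up for the example, and it uses plain nested lists rather than labeled arrays.

```python
# Hypothetical sample data: velocity fluctuations (u', v', w') at four horizontal
# points on a single z-level. The numbers are made up for illustration.
fluctuations = [
    (1.0, 0.0, 2.0),
    (-1.0, 2.0, 0.0),
    (1.0, -2.0, -2.0),
    (-1.0, 0.0, 0.0),
]

def reynolds_tensor(samples):
    """Average of u_i * u_j over all samples, returned as a 3x3 nested list."""
    n = len(samples)
    return [
        [sum(s[i] * s[j] for s in samples) / n for j in range(3)]
        for i in range(3)
    ]

def tke(tensor):
    """Turbulent kinetic energy: half the trace of the Reynolds tensor."""
    return 0.5 * sum(tensor[i][i] for i in range(3))

R = reynolds_tensor(fluctuations)
w2 = R[2][2]  # the i=j=3 component in 1-based notation (index 2 in Python)
k = tke(R)
```

With labeled dimensions, the explicit loops over `i` and `j` are what the broadcasting in `uᵢ * uⱼ` takes care of automatically.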

I think it would also be useful to have something like Xarray's Datasets, which are collections of many N-dimensional variables. So, for example, if I'm calculating a TKE budget, a Dataset can hold one "Field" for each term in the TKE equation. Then a horizontal integration would integrate everything with one command. In Oceananigans this could take the form of a FieldDataset, and integrating all terms could be done with a single call to Integrate with the given Dataset as an argument. This seems simple, but it really cleans up post-processing code for me.

Lastly, something I don't use much myself, but that other people report using a lot, is the ability to group data by specific values and to apply operations in a rolling window. They are a bit hard to demonstrate so I'll point to the Xarray docs for them here and here.
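For readers unfamiliar with these two operations, here is a small pure-Python sketch of what "group by a label and reduce" and "rolling window mean" compute on a 1D series. The function names and the data are hypothetical. Xarray's own `groupby` and `rolling` also handle labeled, multi-dimensional data and window edges, whereas this sketch simply drops incomplete windows.

```python
from collections import defaultdict

def rolling_mean(values, window):
    """Mean over each full run of `window` consecutive values."""
    return [
        sum(values[k:k + window]) / window
        for k in range(len(values) - window + 1)
    ]

def group_mean(values, labels):
    """Mean of `values` within each distinct label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}

# Made-up series: six temperature samples, each tagged with a month label.
temperature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
month = ["jan", "jan", "feb", "feb", "mar", "mar"]

smoothed = rolling_mean(temperature, 3)    # 3-point running mean
by_month = group_mean(temperature, month)  # one mean per month label
```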

Some of these features (like the metadata) can be pretty simple to implement. Others can require a pretty big refactor. Also, this is just my subjective opinion of what is useful. I'm curious to hear the opinions of others on this.
