A means of viewing all differences between two datatrees #9929

Open
@danielfromearth

Description

Is your feature request related to a problem?

It can be frustrating to figure out why xarray.DataTree.identical() or xarray.DataTree.equals() returns False for two DataTrees.

Currently, if xarray's diff functions detect any difference in the tree structure, they raise at that point and so do not show the remaining differences. The current functions therefore work well when the user wants to check that two DataTrees are equal, but not when the user wants to discover subtle differences, and there are cases in which finding such differences is the goal.

For example, when developing or testing new DataTree transformations, I would like to be able to quickly check that the tree has been modified as expected. Or, when two datasets are expected to be the same but are not, it would be helpful to quickly traverse the entire tree structure and see every difference.

Describe the solution you'd like

I think it would be useful to have a means of visually representing all the differences between two xarray DataTree objects, either showing the whole trees and highlighting all the differences, or showing only the differences.

I'm imagining a solution that shows a comparison report similar to ncompare, which provides aligned and colorized difference reports for quick assessments of groups, variable names, types, shapes, and attributes (see ncompare's readme gif or the example notebook). In contrast to ncompare, the proposed solution would work on the xarray data model.

The solution could be a new function, perhaps in the testing module, such as xarray.testing.all_differences(dt1: DataTree, dt2: DataTree). This could be based on the diff_datatree_repr function that is used in assert_isomorphic:

assert a.isomorphic(b), diff_datatree_repr(a, b, "isomorphic")

def diff_datatree_repr(a: DataTree, b: DataTree, compat):
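To make the "collect everything, then report" idea concrete, here is a minimal sketch, not a proposed implementation. It deliberately avoids xarray so that it runs on its own: each tree is stood in for by a plain dictionary mapping a group path to its variables, with each variable described by a (dtype, shape) pair. The name collect_differences and the summary layout are hypothetical and exist only for illustration. A real version would presumably build such a summary by walking the nodes of a DataTree, and would also compare attributes, dimensions, and values.

```python
from typing import List, Mapping, Tuple

# Hypothetical stand-in for a tree: group path -> {variable name -> (dtype, shape)}.
VarInfo = Tuple[str, Tuple[int, ...]]
TreeSummary = Mapping[str, Mapping[str, VarInfo]]


def collect_differences(left: TreeSummary, right: TreeSummary) -> List[str]:
    """Return every difference found, instead of stopping at the first one."""
    report: List[str] = []
    # Visit the union of group paths so groups missing on either side are reported.
    for path in sorted(set(left) | set(right)):
        if path not in right:
            report.append(f"group {path!r}: only in left")
            continue
        if path not in left:
            report.append(f"group {path!r}: only in right")
            continue
        lvars, rvars = left[path], right[path]
        # Same idea one level down: union of variable names within the group.
        for name in sorted(set(lvars) | set(rvars)):
            full_name = f"{path.rstrip('/')}/{name}"
            if name not in rvars:
                report.append(f"variable {full_name}: only in left")
            elif name not in lvars:
                report.append(f"variable {full_name}: only in right")
            elif lvars[name] != rvars[name]:
                report.append(f"variable {full_name}: {lvars[name]} != {rvars[name]}")
    return report


# Made-up example data: one extra group, one missing variable, one dtype change.
left = {
    "/": {"time": ("int64", (3,))},
    "/obs": {"temp": ("float64", (3, 4)), "flag": ("int8", (3,))},
}
right = {
    "/": {"time": ("int64", (3,))},
    "/obs": {"temp": ("float32", (3, 4))},
    "/model": {},
}

for line in collect_differences(left, right):
    print(line)
```

Because the function appends to a report rather than raising, a structural mismatch (the extra /model group) does not hide the variable-level differences in /obs, which is the behaviour this request is asking for.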

Describe alternatives you've considered

Showing differences between DataTrees would achieve goals similar to those of https://github.com/nasa/ncompare. However, a solution in xarray would differ from ncompare, because ncompare looks directly at the netCDF/HDF files and assumes that the file format's data model is the one you care about. xarray instead opens netCDF (or a range of other formats) into an in-memory object whose data model is almost but not quite the same as netCDF's, and xarray's assertions then compare those objects. For example, netCDF can have dimensions with no corresponding coordinate values, which aren't a part of xarray's data model. In addition, a solution in xarray would be applicable to data coming from additional formats like Zarr.

Additional context

No response