A means of viewing all differences between two datatrees #9929
Description
Is your feature request related to a problem?
It can be frustrating to figure out why xarray.DataTree.identical() or xarray.DataTree.equals() is not returning True for two DataTrees.
Currently, if xarray's diff functions detect any difference in the tree structure, they raise at that point, and so do not show all of the differences. Thus, the current functions excel when the user wants to check that two datatrees are equal, but not when the user wants to discover subtle differences, and there are cases in which finding such subtle differences is exactly the goal.
For example, when developing or testing new datatree transformations, I would like to be able to quickly check that the datatree has been modified as expected. Or, when two datasets that are expected to be the same turn out not to be, it would be helpful to be able to quickly traverse the entire tree structure and see the differences.
Describe the solution you'd like
I think it would be useful to have a means of visually representing all the differences between two xarray DataTree objects, either showing the whole trees and highlighting all the differences, or showing only the differences.
I'm imagining a solution that shows a comparison report similar to ncompare, which provides aligned and colorized difference reports for quick assessments of groups, variable names, types, shapes, and attributes (see ncompare's readme gif or the example notebook). In contrast to ncompare, the proposed solution would work on the xarray data model.
The solution could be a new function, perhaps in the testing suite, such as xarray.testing.all_differences(dt1: DataTree, dt2: DataTree). This could be based on the diff_datatree_repr function that is used in assert_isomorphic:
xarray/xarray/testing/assertions.py, line 81 in 1486bea
xarray/xarray/core/formatting.py, line 1053 in 1486bea
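As a rough illustration of the idea, here is a minimal Python sketch that collects every difference instead of stopping at the first one. The helper names summarize and diff_summaries and the report format are hypothetical and not part of xarray; all_differences is only the name suggested above. The sketch assumes that DataTree nodes expose subtree, path, and to_dataset(inherit=False), and it compares structure only (nodes, variable names, dtypes, shapes, attribute names), not data values or attribute values.

```python
from typing import Any

# Hypothetical summary types (not xarray API):
# node path -> variable name -> (dtype, shape, sorted attribute names)
VarSummary = tuple[str, tuple[int, ...], tuple[str, ...]]
TreeSummary = dict[str, dict[str, VarSummary]]


def summarize(dt: Any) -> TreeSummary:
    """Flatten a DataTree-like object into plain Python data.

    Assumes each node in `dt.subtree` has a `path` and a
    `to_dataset(inherit=False)` method, as xarray.DataTree does.
    """
    summary: TreeSummary = {}
    for node in dt.subtree:
        ds = node.to_dataset(inherit=False)
        summary[node.path] = {
            str(name): (
                str(var.dtype),
                tuple(var.shape),
                tuple(sorted(str(key) for key in var.attrs)),
            )
            for name, var in ds.variables.items()
        }
    return summary


def diff_summaries(left: TreeSummary, right: TreeSummary) -> list[str]:
    """Return one line per difference; an empty list means no differences found."""
    report: list[str] = []
    for path in sorted(left.keys() | right.keys()):
        if path not in right:
            report.append(f"{path}: node only in left")
            continue
        if path not in left:
            report.append(f"{path}: node only in right")
            continue
        lvars, rvars = left[path], right[path]
        for name in sorted(lvars.keys() | rvars.keys()):
            if name not in rvars:
                report.append(f"{path}: variable {name!r} only in left")
            elif name not in lvars:
                report.append(f"{path}: variable {name!r} only in right")
            else:
                labels = ("dtype", "shape", "attribute names")
                for label, lval, rval in zip(labels, lvars[name], rvars[name]):
                    if lval != rval:
                        report.append(
                            f"{path}: variable {name!r} {label} {lval} != {rval}"
                        )
    return report


def all_differences(dt1: Any, dt2: Any) -> list[str]:
    """Hypothetical entry point: keep walking both trees after a mismatch."""
    return diff_summaries(summarize(dt1), summarize(dt2))
```

Because the comparison runs on plain dictionaries, the report logic can be tested without any files on disk, and a colorized or side-by-side rendering in the style of ncompare could be layered on top of the returned lines.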
Describe alternatives you've considered
Showing differences between DataTrees would achieve goals similar to those of https://github.com/nasa/ncompare. However, a solution in xarray would differ from ncompare, because ncompare looks directly at the netCDF/HDF files and assumes that theirs is the data model you care about. xarray instead opens netCDF (or a range of other formats) into an in-memory object whose data model is almost, but not quite, the same as netCDF's, and xarray's assertions then compare those objects. For example, netCDF can have dimensions with no corresponding coordinate values, which aren't a part of xarray's data model. In addition, a solution in xarray would be applicable to data coming from additional formats like Zarr.
Additional context
No response