Commit 2fe7875

Fix deepcopy of Variables and DataArrays (#7089)
* fix deepcopy IndexVariable and dataset encodings
* add fix to whats-new
* fix also attrs silently changing
* fix typpo
* fix breaking attrs change
* remove xfail from copyattrs test
* [skip-ci] fix merge conflict in whats-new
1 parent 6b2fdab commit 2fe7875

6 files changed: +59 -58 lines changed

doc/whats-new.rst

+23 -25
@@ -42,70 +42,68 @@ Deprecations
 Bug fixes
 ~~~~~~~~~

-- Allow reading netcdf files where the 'units' attribute is a number(:pull:`7085`)
+- Allow reading netcdf files where the 'units' attribute is a number. (:pull:`7085`)
   By `Ghislain Picard <https://github.com/ghislainp>`_.
-- Allow decoding of 0 sized datetimes(:issue:`1329`, :pull:`6882`)
+- Allow decoding of 0 sized datetimes. (:issue:`1329`, :pull:`6882`)
   By `Deepak Cherian <https://github.com/dcherian>`_.
-- Make sure DataArray.name is always a string when used as label for plotting.
-  (:issue:`6826`, :pull:`6832`)
+- Make sure DataArray.name is always a string when used as label for plotting. (:issue:`6826`, :pull:`6832`)
   By `Jimmy Westling <https://github.com/illviljan>`_.
-- :py:attr:`DataArray.nbytes` now uses the ``nbytes`` property of the underlying array if available.
-  (:pull:`6797`)
+- :py:attr:`DataArray.nbytes` now uses the ``nbytes`` property of the underlying array if available. (:pull:`6797`)
   By `Max Jones <https://github.com/maxrjones>`_.
 - Rely on the array backend for string formatting. (:pull:`6823`).
   By `Jimmy Westling <https://github.com/illviljan>`_.
-- Fix incompatibility with numpy 1.20 (:issue:`6818`, :pull:`6821`)
+- Fix incompatibility with numpy 1.20. (:issue:`6818`, :pull:`6821`)
   By `Michael Niklas <https://github.com/headtr1ck>`_.
 - Fix side effects on index coordinate metadata after aligning objects. (:issue:`6852`, :pull:`6857`)
   By `Benoît Bovy <https://github.com/benbovy>`_.
-- Make FacetGrid.set_titles send kwargs correctly using `handle.udpate(kwargs)`.
-  (:issue:`6839`, :pull:`6843`)
+- Make FacetGrid.set_titles send kwargs correctly using `handle.udpate(kwargs)`. (:issue:`6839`, :pull:`6843`)
   By `Oliver Lopez <https://github.com/lopezvoliver>`_.
-- Fix bug where index variables would be changed inplace (:issue:`6931`, :pull:`6938`)
+- Fix bug where index variables would be changed inplace. (:issue:`6931`, :pull:`6938`)
   By `Michael Niklas <https://github.com/headtr1ck>`_.
 - Allow taking the mean over non-time dimensions of datasets containing
-  dask-backed cftime arrays (:issue:`5897`, :pull:`6950`). By `Spencer Clark
-  <https://github.com/spencerkclark>`_.
-- Harmonize returned multi-indexed indexes when applying ``concat`` along new dimension (:issue:`6881`, :pull:`6889`)
+  dask-backed cftime arrays. (:issue:`5897`, :pull:`6950`)
+  By `Spencer Clark <https://github.com/spencerkclark>`_.
+- Harmonize returned multi-indexed indexes when applying ``concat`` along new dimension. (:issue:`6881`, :pull:`6889`)
   By `Fabian Hofmann <https://github.com/FabianHofmann>`_.
 - Fix step plots with ``hue`` arg. (:pull:`6944`)
   By `András Gunyhó <https://github.com/mgunyho>`_.
-- Avoid use of random numbers in `test_weighted.test_weighted_operations_nonequal_coords` (:issue:`6504`, :pull:`6961`).
+- Avoid use of random numbers in `test_weighted.test_weighted_operations_nonequal_coords`. (:issue:`6504`, :pull:`6961`)
   By `Luke Conibear <https://github.com/lukeconibear>`_.
 - Fix multiple regression issues with :py:meth:`Dataset.set_index` and
-  :py:meth:`Dataset.reset_index` (:pull:`6992`)
+  :py:meth:`Dataset.reset_index`. (:pull:`6992`)
   By `Benoît Bovy <https://github.com/benbovy>`_.
 - Raise a ``UserWarning`` when renaming a coordinate or a dimension creates a
   non-indexed dimension coordinate, and suggest the user creating an index
-  either with ``swap_dims`` or ``set_index`` (:issue:`6607`, :pull:`6999`). By
-  `Benoît Bovy <https://github.com/benbovy>`_.
-- Use ``keep_attrs=True`` in grouping and resampling operations by default (:issue:`7012`).
+  either with ``swap_dims`` or ``set_index``. (:issue:`6607`, :pull:`6999`)
+  By `Benoît Bovy <https://github.com/benbovy>`_.
+- Use ``keep_attrs=True`` in grouping and resampling operations by default. (:issue:`7012`)
   This means :py:attr:`Dataset.attrs` and :py:attr:`DataArray.attrs` are now preserved by default.
   By `Deepak Cherian <https://github.com/dcherian>`_.
-- ``Dataset.encoding['source']`` now exists when reading from a Path object (:issue:`5888`, :pull:`6974`)
+- ``Dataset.encoding['source']`` now exists when reading from a Path object. (:issue:`5888`, :pull:`6974`)
   By `Thomas Coleman <https://github.com/ColemanTom>`_.
 - Better dtype consistency for ``rolling.mean()``. (:issue:`7062`, :pull:`7063`)
   By `Sam Levang <https://github.com/slevang>`_.
-- Allow writing NetCDF files including only dimensionless variables using the distributed or multiprocessing scheduler
-  (:issue:`7013`, :pull:`7040`).
+- Allow writing NetCDF files including only dimensionless variables using the distributed or multiprocessing scheduler. (:issue:`7013`, :pull:`7040`)
   By `Francesco Nattino <https://github.com/fnattino>`_.
-- Fix bug where subplot_kwargs were not working when plotting with figsize, size or aspect (:issue:`7078`, :pull:`7080`)
+- Fix deepcopy of attrs and encoding of DataArrays and Variables. (:issue:`2835`, :pull:`7089`)
+  By `Michael Niklas <https://github.com/headtr1ck>`_.
+- Fix bug where subplot_kwargs were not working when plotting with figsize, size or aspect. (:issue:`7078`, :pull:`7080`)
   By `Michael Niklas <https://github.com/headtr1ck>`_.

 Documentation
 ~~~~~~~~~~~~~
-- Update merge docstrings (:issue:`6935`, :pull:`7033`).
+- Update merge docstrings. (:issue:`6935`, :pull:`7033`)
   By `Zach Moon <https://github.com/zmoon>`_.
 - Raise a more informative error when trying to open a non-existent zarr store. (:issue:`6484`, :pull:`7060`)
   By `Sam Levang <https://github.com/slevang>`_.
-- Added examples to docstrings for :py:meth:`DataArray.expand_dims`, :py:meth:`DataArray.drop_duplicates`, :py:meth:`DataArray.reset_coords`, :py:meth:`DataArray.equals`, :py:meth:`DataArray.identical`, :py:meth:`DataArray.broadcast_equals`, :py:meth:`DataArray.bfill`, :py:meth:`DataArray.ffill`, :py:meth:`DataArray.fillna`, :py:meth:`DataArray.dropna`, :py:meth:`DataArray.drop_isel`, :py:meth:`DataArray.drop_sel`, :py:meth:`DataArray.head`, :py:meth:`DataArray.tail`. (:issue:`5816`, :pull:`7088`).
+- Added examples to docstrings for :py:meth:`DataArray.expand_dims`, :py:meth:`DataArray.drop_duplicates`, :py:meth:`DataArray.reset_coords`, :py:meth:`DataArray.equals`, :py:meth:`DataArray.identical`, :py:meth:`DataArray.broadcast_equals`, :py:meth:`DataArray.bfill`, :py:meth:`DataArray.ffill`, :py:meth:`DataArray.fillna`, :py:meth:`DataArray.dropna`, :py:meth:`DataArray.drop_isel`, :py:meth:`DataArray.drop_sel`, :py:meth:`DataArray.head`, :py:meth:`DataArray.tail`. (:issue:`5816`, :pull:`7088`)
   By `Patrick Naylor <https://github.com/patrick-naylor>`_.
 - Add missing docstrings to various array properties. (:pull:`7090`)
   By `Tom Nicholas <https://github.com/TomNicholas>`_.

 Internal Changes
 ~~~~~~~~~~~~~~~~
-- Added test for DataArray attrs deepcopy recursion/nested attrs (:issue:`2835`).
+- Added test for DataArray attrs deepcopy recursion/nested attrs. (:issue:`2835`, :pull:`7086`)
   By `Paul hockett <https://github.com/phockett>`_.

 .. _whats-new.2022.06.0:

xarray/core/dataarray.py

+3 -5
@@ -853,25 +853,23 @@ def loc(self) -> _LocIndexer:
         return _LocIndexer(self)

     @property
-    # Key type needs to be `Any` because of mypy#4167
     def attrs(self) -> dict[Any, Any]:
         """Dictionary storing arbitrary metadata with this array."""
         return self.variable.attrs

     @attrs.setter
     def attrs(self, value: Mapping[Any, Any]) -> None:
-        # Disable type checking to work around mypy bug - see mypy#4167
-        self.variable.attrs = value  # type: ignore[assignment]
+        self.variable.attrs = dict(value)

     @property
-    def encoding(self) -> dict[Hashable, Any]:
+    def encoding(self) -> dict[Any, Any]:
         """Dictionary of format-specific settings for how this array should be
         serialized."""
         return self.variable.encoding

     @encoding.setter
     def encoding(self, value: Mapping[Any, Any]) -> None:
-        self.variable.encoding = value
+        self.variable.encoding = dict(value)

     @property
     def indexes(self) -> Indexes:
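
The setters in this hunk now wrap the incoming value in `dict(...)` before handing it on. As a rough illustration of that pattern only (the `Holder` class below is a hypothetical stand-in written for this note, not xarray code), a setter that accepts any `Mapping` and stores a plain `dict` could look like this:

```python
from __future__ import annotations

from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


class Holder:
    """Hypothetical stand-in for an object exposing an ``attrs`` property."""

    def __init__(self) -> None:
        self._attrs: dict[Any, Any] | None = None

    @property
    def attrs(self) -> dict[Any, Any]:
        # Lazily create the dictionary on first access.
        if self._attrs is None:
            self._attrs = {}
        return self._attrs

    @attrs.setter
    def attrs(self, value: Mapping[Any, Any]) -> None:
        # Any Mapping is accepted; a new plain dict is what gets stored.
        self._attrs = dict(value)


source = {"units": "m"}
holder = Holder()
holder.attrs = MappingProxyType(source)  # a read-only Mapping, not a dict

# Later changes to the caller's mapping do not reach the stored dict.
source["units"] = "km"
```

Because the stored object is always a fresh `dict`, the getter's `dict[Any, Any]` return type holds without a `type: ignore`, and the holder does not alias the caller's mapping.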

xarray/core/dataset.py

+5 -4
@@ -633,7 +633,7 @@ def variables(self) -> Frozen[Hashable, Variable]:
         return Frozen(self._variables)

     @property
-    def attrs(self) -> dict[Hashable, Any]:
+    def attrs(self) -> dict[Any, Any]:
         """Dictionary of global attributes on this dataset"""
         if self._attrs is None:
             self._attrs = {}
@@ -644,7 +644,7 @@ def attrs(self, value: Mapping[Any, Any]) -> None:
         self._attrs = dict(value)

     @property
-    def encoding(self) -> dict[Hashable, Any]:
+    def encoding(self) -> dict[Any, Any]:
         """Dictionary of global encoding attributes on this dataset"""
         if self._encoding is None:
             self._encoding = {}
@@ -1123,7 +1123,7 @@ def _overwrite_indexes(
             return replaced

     def copy(
-        self: T_Dataset, deep: bool = False, data: Mapping | None = None
+        self: T_Dataset, deep: bool = False, data: Mapping[Any, ArrayLike] | None = None
     ) -> T_Dataset:
         """Returns a copy of this dataset.

@@ -1252,8 +1252,9 @@ def copy(
                 variables[k] = v.copy(deep=deep, data=data.get(k))

         attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
+        encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)

-        return self._replace(variables, indexes=indexes, attrs=attrs)
+        return self._replace(variables, indexes=indexes, attrs=attrs, encoding=encoding)

     def as_numpy(self: T_Dataset) -> T_Dataset:
         """

xarray/core/indexes.py

+1 -1
@@ -1107,7 +1107,7 @@ def dims(self) -> Mapping[Hashable, int]:

         return Frozen(self._dims)

-    def copy(self):
+    def copy(self) -> Indexes:
         return type(self)(dict(self._indexes), dict(self._variables))

     def get_unique(self) -> list[T_PandasOrXarrayIndex]:

xarray/core/variable.py

+27 -22
@@ -894,7 +894,7 @@ def __setitem__(self, key, value):
         indexable[index_tuple] = value

     @property
-    def attrs(self) -> dict[Hashable, Any]:
+    def attrs(self) -> dict[Any, Any]:
         """Dictionary of local attributes on this variable."""
         if self._attrs is None:
             self._attrs = {}
@@ -905,7 +905,7 @@ def attrs(self, value: Mapping[Any, Any]) -> None:
         self._attrs = dict(value)

     @property
-    def encoding(self):
+    def encoding(self) -> dict[Any, Any]:
         """Dictionary of encodings on this variable."""
         if self._encoding is None:
             self._encoding = {}
@@ -918,7 +918,7 @@ def encoding(self, value):
         except ValueError:
             raise ValueError("encoding must be castable to a dictionary")

-    def copy(self, deep=True, data=None):
+    def copy(self, deep: bool = True, data: ArrayLike | None = None):
         """Returns a copy of this object.

         If `deep=True`, the data array is loaded into memory and copied onto
@@ -929,7 +929,7 @@ def copy(self, deep=True, data=None):

         Parameters
         ----------
-        deep : bool, optional
+        deep : bool, default: True
             Whether the data array is loaded into memory and copied onto
             the new object. Default is True.
         data : array_like, optional
@@ -975,28 +975,29 @@ def copy(self, deep=True, data=None):
         pandas.DataFrame.copy
         """
         if data is None:
-            data = self._data
+            ndata = self._data

-            if isinstance(data, indexing.MemoryCachedArray):
+            if isinstance(ndata, indexing.MemoryCachedArray):
                 # don't share caching between copies
-                data = indexing.MemoryCachedArray(data.array)
+                ndata = indexing.MemoryCachedArray(ndata.array)

             if deep:
-                data = copy.deepcopy(data)
+                ndata = copy.deepcopy(ndata)

         else:
-            data = as_compatible_data(data)
-            if self.shape != data.shape:
+            ndata = as_compatible_data(data)
+            if self.shape != ndata.shape:
                 raise ValueError(
                     "Data shape {} must match shape of object {}".format(
-                        data.shape, self.shape
+                        ndata.shape, self.shape
                     )
                 )

-        # note:
-        # dims is already an immutable tuple
-        # attributes and encoding will be copied when the new Array is created
-        return self._replace(data=data)
+        attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
+        encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
+
+        # note: dims is already an immutable tuple
+        return self._replace(data=ndata, attrs=attrs, encoding=encoding)

     def _replace(
         self: T_Variable,
@@ -2877,7 +2878,7 @@ def concat(

         return cls(first_var.dims, data, attrs)

-    def copy(self, deep=True, data=None):
+    def copy(self, deep: bool = True, data: ArrayLike | None = None):
         """Returns a copy of this object.

         `deep` is ignored since data is stored in the form of
@@ -2889,7 +2890,7 @@ def copy(self, deep=True, data=None):

         Parameters
         ----------
-        deep : bool, optional
+        deep : bool, default: True
             Deep is ignored when data is given. Whether the data array is
             loaded into memory and copied onto the new object. Default is True.
         data : array_like, optional
@@ -2902,16 +2903,20 @@ def copy(self, deep=True, data=None):
             data copied from original.
         """
         if data is None:
-            data = self._data.copy(deep=deep)
+            ndata = self._data.copy(deep=deep)
         else:
-            data = as_compatible_data(data)
-            if self.shape != data.shape:
+            ndata = as_compatible_data(data)
+            if self.shape != ndata.shape:
                 raise ValueError(
                     "Data shape {} must match shape of object {}".format(
-                        data.shape, self.shape
+                        ndata.shape, self.shape
                     )
                 )
-        return self._replace(data=data)
+
+        attrs = copy.deepcopy(self._attrs) if deep else copy.copy(self._attrs)
+        encoding = copy.deepcopy(self._encoding) if deep else copy.copy(self._encoding)
+
+        return self._replace(data=ndata, attrs=attrs, encoding=encoding)

     def equals(self, other, equiv=None):
         # if equiv is specified, super up
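
Both `copy` methods in this file follow the same outline: choose the new data (copied from the object, or supplied by the caller and checked against the existing shape), then copy `attrs` and `encoding` explicitly according to `deep`. A simplified sketch of that outline follows. It uses nested lists instead of arrays, and the names `MiniVariable` and `shape_of` are hypothetical helpers invented for this illustration:

```python
from __future__ import annotations

import copy
from typing import Any


def shape_of(data: Any) -> tuple[int, ...]:
    """Shape of a regular nested list (hypothetical helper)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


class MiniVariable:
    """Hypothetical, list-backed stand-in for a variable with metadata."""

    def __init__(self, data, attrs=None, encoding=None) -> None:
        self.data = data
        self.attrs = attrs if attrs is not None else {}
        self.encoding = encoding if encoding is not None else {}

    @property
    def shape(self) -> tuple[int, ...]:
        return shape_of(self.data)

    def copy(self, deep: bool = True, data=None) -> MiniVariable:
        if data is None:
            # No replacement given: reuse or deep-copy the existing data.
            ndata = copy.deepcopy(self.data) if deep else self.data
        else:
            # Replacement data must have the same shape as the original.
            ndata = data
            if self.shape != shape_of(ndata):
                raise ValueError(
                    f"Data shape {shape_of(ndata)} must match "
                    f"shape of object {self.shape}"
                )
        # Metadata is copied explicitly rather than left to the constructor.
        attrs = copy.deepcopy(self.attrs) if deep else copy.copy(self.attrs)
        encoding = copy.deepcopy(self.encoding) if deep else copy.copy(self.encoding)
        return MiniVariable(ndata, attrs, encoding)


var = MiniVariable([[1, 2], [3, 4]], attrs={"meta": {"source": "sensor"}})
replaced = var.copy(data=[[0, 0], [0, 0]])
```

Binding the result to a separate name (`ndata` in the diff) keeps the `data` parameter's declared type intact, which is presumably why the rename accompanies the new type annotations.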

xarray/tests/test_dataarray.py

-1
@@ -6458,7 +6458,6 @@ def test_delete_coords() -> None:
     assert set(a1.coords.keys()) == {"x"}


-@pytest.mark.xfail
 def test_deepcopy_nested_attrs() -> None:
     """Check attrs deep copy, see :issue:`2835`"""
     da1 = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"), coords={"x": [10, 20]})
