Releases: google/xarray-beam
v0.11.5
Improve `xarray_beam.Dataset` `__repr__`. The `__repr__` now replaces dask array representations within the template with `...` for brevity and clarity, as the dask chunks don't necessarily match the `xarray_beam` chunks. The test is updated to check for a more complete and accurate representation. PiperOrigin-RevId: 828189621
v0.11.4
Preserve existing dimension order in `replace_template_dims`. This fixes a bug where `replace_template_dims()` could inadvertently change the order of dimensions. PiperOrigin-RevId: 824563534
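The fixed behavior can be sketched in plain Python: replacing a dimension's size should update the value in place rather than re-appending the key, so the original ordering survives. (`replace_sizes` below is a hypothetical stand-in for illustration, not the library function.)

```python
def replace_sizes(sizes: dict, replacements: dict) -> dict:
    """Return a copy of `sizes` with updated values, preserving key order."""
    # Iterating over the original mapping keeps dimension order intact;
    # only the sizes of replaced dimensions change.
    return {dim: replacements.get(dim, size) for dim, size in sizes.items()}

sizes = {"time": 365, "lat": 180, "lon": 360}
updated = replace_sizes(sizes, {"lat": 90})
print(list(updated))   # dimension order unchanged: ['time', 'lat', 'lon']
print(updated["lat"])  # 90
```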
v0.11.3
Allow for passing pipelines to xbeam.Dataset constructors. Associating a beam.Pipeline with an xbeam.Dataset means that a pipeline doesn't need to be applied later (e.g., to the result of `to_zarr`). This is both a little cleaner, and also potentially a significant optimization, because it means that Beam understands that it can reuse a ptransform rather than recomputing it. This includes a new `_LazyPCollection` class to ensure that our optimizations for Transforms applied directly after xbeam.DatasetToChunks still work. PiperOrigin-RevId: 824281688
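The reuse optimization can be sketched abstractly: cache the result of applying a deferred computation so that multiple consumers share one run instead of recomputing it. (This is a generic pure-Python sketch of the caching idea, not Beam's or `_LazyPCollection`'s actual internals.)

```python
class LazyResult:
    """Defer a computation and cache its result so repeated consumers
    reuse one run instead of triggering a recomputation each time."""

    def __init__(self, compute):
        self._compute = compute
        self._result = None
        self._done = False

    def get(self):
        # Compute at most once; later calls return the cached result.
        if not self._done:
            self._result = self._compute()
            self._done = True
        return self._result

calls = []
lazy = LazyResult(lambda: calls.append("run") or [1, 2, 3])
print(lazy.get())  # [1, 2, 3]
print(lazy.get())  # [1, 2, 3] -- served from the cache
print(calls)       # ['run'] -- the computation ran exactly once
```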
v0.11.2
Add more validation to `xbeam.Dataset.map_blocks`. PiperOrigin-RevId: 823674644
v0.11.1
Allow specifying default chunks per shard in `to_zarr`. The `zarr_chunks_per_shard` argument in `xbeam.Dataset.to_zarr` now supports using `...` as a key to set a default number of chunks per shard for all dimensions not explicitly listed. Dimensions not included in the mapping default to 1 chunk per shard. This simplifies specifying Zarr chunking strategies. PiperOrigin-RevId: 819301545
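The `...` default can be illustrated with a small helper that expands such a mapping over a dataset's dimensions. (The helper itself is hypothetical; only the `zarr_chunks_per_shard` semantics described above come from the release note.)

```python
def expand_chunks_per_shard(spec: dict, dims: list) -> dict:
    """Expand a chunks-per-shard spec: `...` supplies the default for
    dimensions not listed explicitly; otherwise the default is 1."""
    default = spec.get(..., 1)  # Ellipsis is hashable, so it works as a key
    return {dim: spec.get(dim, default) for dim in dims}

# {'time': 4, ...: 2} -> 'time' gets 4 chunks per shard, all others get 2.
print(expand_chunks_per_shard({"time": 4, ...: 2}, ["time", "lat", "lon"]))
# -> {'time': 4, 'lat': 2, 'lon': 2}
```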
v0.11.0
Add local staging to Zarr setup in xarray_beam. Fixes https://github.com/google/xarray-beam/issues/122. This change introduces a `stage_locally` parameter to `setup_zarr`, `ChunksToZarr` and `Dataset.to_zarr`. When enabled, Zarr metadata is first written to a local temporary directory and then copied to the final destination in parallel using `fsspec`. This can significantly speed up the setup process on high-latency filesystems: in one example it sped up Zarr setup by a factor of 25, from 100 seconds to 4 seconds. This adds a hard dependency on fsspec in Xarray-Beam. Hopefully in the future Xarray will have concurrent writing to stores built in (see https://github.com/pydata/xarray/issues/10622), which will eliminate the primary need for this. Alternatively, we might be able to eventually leverage Zarr's built-in stores to do this copying rather than fsspec. Zarr has all the necessary functionality (including atomic writes, which would be nice) but does not expose the required public APIs for copying store objects from a synchronous function. PiperOrigin-RevId: 817684876
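The staging strategy described above can be sketched generically: write the many small metadata files to a local temporary directory first, then copy them to the destination concurrently, paying one round-trip per file in parallel rather than serially. (This sketch uses only the standard library, with a thread pool standing in for fsspec's parallel copy; the function and callback names are illustrative.)

```python
import concurrent.futures
import pathlib
import shutil
import tempfile

def stage_and_copy(write_metadata, destination: pathlib.Path) -> None:
    """Write metadata files to a local staging dir, then copy them to
    `destination` in parallel."""
    with tempfile.TemporaryDirectory() as staging:
        staging_dir = pathlib.Path(staging)
        write_metadata(staging_dir)  # fast: local filesystem, no latency
        files = [p for p in staging_dir.rglob("*") if p.is_file()]
        destination.mkdir(parents=True, exist_ok=True)

        def copy_one(src: pathlib.Path) -> None:
            target = destination / src.relative_to(staging_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)

        # Issue all copies concurrently instead of one at a time, which is
        # where the speedup on high-latency filesystems comes from.
        with concurrent.futures.ThreadPoolExecutor() as pool:
            list(pool.map(copy_one, files))
```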
v0.10.5
Allow `Dataset.rechunk` to change `split_vars`. This is convenient because the optimal ordering of splitting and rechunking is not obvious. Also make consolidate_variables() and split_variables() no-ops when appropriate. PiperOrigin-RevId: 816805573
v0.10.4
Add `Dataset.from_ptransform`. This is a variant of the `Dataset` constructor with extensive validation. Also add documentation explaining how it works. PiperOrigin-RevId: 814972825
v0.10.3
Allow using `...` as a key in chunk specifications.
This change enables specifying a default chunk size for all dimensions not explicitly listed in the `chunks` mapping by using `...` as a key. For example, `{'x': 10, ...: 20}` will chunk dimension 'x' into sizes of 10 and all other dimensions into sizes of 20.
PiperOrigin-RevId: 814430585
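The `...` default can be expressed as a simple normalization step over a chunks mapping (the helper below is a sketch of the semantics, not the library's `normalize_chunks` implementation):

```python
def apply_default_chunks(chunks: dict, dims: list) -> dict:
    """Resolve a chunks mapping that may use `...` as a default key."""
    default = chunks.get(..., None)
    resolved = {}
    for dim in dims:
        if dim in chunks:
            resolved[dim] = chunks[dim]  # explicit size wins
        elif default is not None:
            resolved[dim] = default      # fall back to the `...` default
        else:
            raise ValueError(f"no chunk size given for dimension {dim!r}")
    return resolved

# 'x' is chunked into sizes of 10; every other dimension into sizes of 20.
print(apply_default_chunks({"x": 10, ...: 20}, ["x", "y", "z"]))
# -> {'x': 10, 'y': 20, 'z': 20}
```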
v0.10.2
Add `xbeam.normalize_chunks()` and update `xbeam.Dataset` docstrings. PiperOrigin-RevId: 813800457