Once you find a splitting configuration you like, remember to persist it on disk using `repo.save_config`.
This example splits manifests so that each manifest file contains `365 * 24` chunks along the time dimension and all chunks along every other dimension.
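To make the geometry concrete, here is a small, library-free sketch that lists the bounding box (in chunk indices) of each manifest that a given split produces. The helper `manifest_boxes`, the chunk-grid shape, and the `(time, lat, lon)` axis order are hypothetical and chosen only for illustration. This is not Icechunk's API or its internal algorithm.

```python
import itertools


def manifest_boxes(chunk_grid_shape, split_sizes):
    """Hypothetical helper: bounding boxes, in chunk indices, of the manifests
    produced by splitting. `split_sizes` maps an axis to the number of chunks
    per manifest along that axis; any axis not listed keeps all of its chunks
    in a single manifest."""
    per_axis = []
    for axis, n_chunks in enumerate(chunk_grid_shape):
        # Unsplit axis: one range covering every chunk (max() avoids a zero step).
        size = split_sizes.get(axis, max(n_chunks, 1))
        per_axis.append(
            [slice(start, min(start + size, n_chunks)) for start in range(0, n_chunks, size)]
        )
    # Every combination of per-axis ranges is one manifest's bounding box.
    return list(itertools.product(*per_axis))


# Assumed chunk grid: 3 years of hourly time chunks, 4 x 4 spatial chunks.
boxes = manifest_boxes((3 * 365 * 24, 4, 4), {0: 365 * 24})
print(len(boxes))  # 3
print(boxes[1])    # (slice(8760, 17520, None), slice(0, 4, None), slice(0, 4, None))
```

Under these assumptions, splitting only the time axis yields one manifest per year of chunks, each spanning the full spatial extent.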
Options for specifying the arrays whose manifests you want to split are:
!!! note
    Python dictionaries preserve insertion order, so the first matching condition takes priority.
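The effect of insertion order can be modelled in plain Python. In this sketch, conditions are hypothetical predicate functions mapped to split sizes. They stand in for, and are not, Icechunk's own condition types. Resolution walks the dictionary in insertion order and stops at the first match.

```python
def resolve_split_size(conditions, array_name, dim_name):
    """Return the split size of the first condition (in insertion order) that
    matches, or None if no condition matches. Hypothetical model only."""
    for matches, split_size in conditions.items():
        if matches(array_name, dim_name):
            return split_size
    return None


def is_time_dim(array_name, dim_name):
    # Hypothetical condition: matches only the "time" dimension.
    return dim_name == "time"


def any_dim(array_name, dim_name):
    # Hypothetical catch-all condition: matches every dimension.
    return True


specific_first = {is_time_dim: 365 * 24, any_dim: 10}
catch_all_first = {any_dim: 10, is_time_dim: 365 * 24}

print(resolve_split_size(specific_first, "temperature", "time"))   # 8760
print(resolve_split_size(catch_all_first, "temperature", "time"))  # 10: the catch-all shadows the time rule
```

In this model, listing the most specific conditions first keeps a broad catch-all from overriding them.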
### Splitting behaviour
By default, Icechunk minimizes the number of chunk refs that are written in a single commit.
Consider this simple example: a 1D array with split size 1 along axis 0.
Look carefully: only one new manifest, containing the 3 new chunk refs, has been written.
Why?
Icechunk minimizes how many chunk references are rewritten at each commit (to save time and memory). The previous splitting configuration (split size of 1) produced manifests that are _compatible_ with the current configuration (split size of 5), because the bounding box of every existing manifest, `[slice(0, 1), slice(1, 2), ...]`, is fully contained in one of the bounding boxes implied by the new configuration, `[slice(0, 5), slice(5, 10)]`.
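The compatibility rule amounts to a containment check. The sketch below handles the 1D case from this example, with boxes as half-open `slice(start, stop)` ranges of chunk indices. The function names are hypothetical, and the code illustrates the rule as described here rather than Icechunk's implementation.

```python
def contains(outer, inner):
    """True if the half-open range `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.stop <= outer.stop


def is_compatible(existing_boxes, new_boxes):
    """True if every existing manifest box fits inside some box implied by
    the new splitting configuration."""
    return all(any(contains(new, old) for new in new_boxes) for old in existing_boxes)


old_boxes = [slice(i, i + 1) for i in range(10)]  # split size 1
new_boxes = [slice(0, 5), slice(5, 10)]           # split size 5

print(is_compatible(old_boxes, new_boxes))      # True
print(is_compatible([slice(3, 7)], new_boxes))  # False: straddles the boundary at 5
```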
To force Icechunk to rewrite all chunk refs according to the current splitting configuration, use `repo.rewrite_manifests`. For the current example, this consolidates the refs into two manifests.
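Roughly speaking, a full rewrite regroups every chunk ref under the box it falls in for the current split size. The hypothetical `group_refs` helper below sketches that regrouping for the 1D example, assuming 10 chunks along axis 0 (as implied by the two boxes `[slice(0, 5), slice(5, 10)]`). It only shows the expected manifest count, not how `repo.rewrite_manifests` works internally.

```python
def group_refs(chunk_indices, split_size):
    """Group chunk indices along axis 0 by the half-open (start, stop) box
    they fall in for the given split size. Hypothetical illustration."""
    manifests = {}
    for i in sorted(chunk_indices):
        start = (i // split_size) * split_size
        manifests.setdefault((start, start + split_size), []).append(i)
    return manifests


after = group_refs(range(10), 5)

print(len(after))  # 2
print(after)       # {(0, 5): [0, 1, 2, 3, 4], (5, 10): [5, 6, 7, 8, 9]}
```

Tuples are used as keys because `slice` objects are only hashable from Python 3.12 onward.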