icechunk-python/docs/docs/moving-chunks.md
+47 -99 (47 additions & 99 deletions)
@@ -15,41 +15,27 @@ This enables **rolling time windows**—continuously updating datasets like fore
15
15
16
16
| Method | Best For | Flexibility |
17
17
|--------|----------|-------------|
18
-
|[`shift_array`][icechunk.Session.shift_array]| Uniform shifts with edge handling | Simple—just specify offset and mode |
18
+
|[`shift_array`][icechunk.Session.shift_array]| Shift with discard — rolling time windows | Simple—just specify offset |
19
+
|[`roll_array`][icechunk.Session.roll_array]| Circular shift — no data loss | Simple—just specify offset |
19
20
|[`reindex_array`][icechunk.Session.reindex_array]| Custom transformations | Maximum—you control every chunk |
20
21
21
22
## Offsets Are in Chunks, Not Elements
22
23
23
-
Both methods work with **chunk indices**, not array indices. If your array has `chunk_size=2`, then an offset of `(-1,)` shifts by 1 chunk, which is 2 elements:
24
+
All three methods work with **chunk indices**, not array indices. If your array has `chunk_size=2`, then an offset of `(-1,)` shifts by 1 chunk, which is 2 elements:
24
25
25
26
```python
26
27
# With chunk_size=2:
27
-
shift_array("/arr", (-1,), "wrap") # → shifts by 1 chunk = 2 elements
28
-
shift_array("/arr", (-2,), "wrap")  # → shifts by 2 chunks = 4 elements
28
+
shift_array("/arr", (-1,)) # → shifts by 1 chunk = 2 elements
29
+
shift_array("/arr", (-2,)) # → shifts by 2 chunks = 4 elements
29
30
```
30
31
31
32
Why chunks instead of elements? Because these are **metadata-only operations**. Shifting by partial chunks would require splitting and rewriting chunk data.
32
33
33
-
For convenience, `shift_array` returns the shift converted to element space, so you don't need to manually track chunk sizes when determining where to write new data.
34
+
For convenience, `shift_array` and `roll_array` return the shift converted to element space, so you don't need to manually track chunk sizes when determining where to write new data.
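As a rough illustration of that conversion, the sketch below multiplies each chunk offset by the chunk size of its dimension. The helper name `chunk_to_element_shift` is hypothetical and is not part of the icechunk API; it only mirrors the `chunk_offset × chunk_size` arithmetic described in this document.

```python
def chunk_to_element_shift(chunk_offset, chunk_shape):
    """Convert a per-dimension chunk offset into an element-space shift.

    Hypothetical helper for illustration only (not an icechunk function).
    """
    if len(chunk_offset) != len(chunk_shape):
        raise ValueError("offset and chunk shape must have the same number of dimensions")
    # One product per dimension: chunk_offset * chunk_size
    return tuple(o * c for o, c in zip(chunk_offset, chunk_shape))


# chunk_size=2, shift left by 1 chunk
print(chunk_to_element_shift((-1,), (2,)))
# one chunk per day of hourly readings, shift left by 1 day
print(chunk_to_element_shift((-1,), (24,)))
```

With `chunk_size=2`, an offset of `(-1,)` maps to an element shift of `(-2,)`, matching the example earlier in this section.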
34
35
35
36
## shift_array { #shift_array }
36
37
37
-
The [`shift_array`][icechunk.Session.shift_array] method moves all chunks by a fixed offset per dimension (negative to shift toward index 0, positive toward higher indices), with built-in handling for what happens at the boundaries. For convenience, it returns the **index shift** (`chunk_offset × chunk_size` for each dimension).
38
-
39
-
### Shift Modes
40
-
41
-
The `mode` parameter controls what happens to chunks that shift out of bounds:
42
-
43
-
| Mode | Behavior | Data Loss |
44
-
|------|----------|-----------|
45
-
|`"wrap"`| Chunks wrap to the other side | None |
46
-
|`"discard"`| Out-of-bounds chunks are dropped | Yes |
47
-
48
-
You can use strings (`"wrap"`, `"discard"`) or the enum ([`ic.ShiftMode.WRAP`][icechunk.ShiftMode], [`ic.ShiftMode.DISCARD`][icechunk.ShiftMode]).
49
-
50
-
#### WRAP Mode
51
-
52
-
Chunks that shift out of one end reappear at the other—no data is lost.
38
+
The [`shift_array`][icechunk.Session.shift_array] method moves all chunks by a fixed offset per dimension (negative to shift toward index 0, positive toward higher indices). Chunks that shift out of bounds are discarded, and vacated positions retain stale data — the caller typically writes new data there. It returns the **index shift** (`chunk_offset × chunk_size` for each dimension).
The chunks containing `[0, 1, 2, 3]` were discarded, and the vacated end filled with `-1`.
89
+
### Example: Rolling Time Window
103
90
104
-
### Preserving Data with Resize
91
+
Imagine a sensor array storing the last 7 days of hourly readings, with shape `(168,)` and one chunk per day `(24,)`. Each day, you want to discard the oldest day and make room for new data:
105
92
106
-
With `"discard"` mode, chunks that shift out of bounds are lost. To preserve everything when shifting, resize first:
93
+
```python
94
+
# Each day: shift left by 1 chunk, discarding the oldest
session.commit(f"Updated sensor data for {today}")
125
102
```
126
103
104
+
The return value tells you exactly where to write new data—no need to manually track chunk sizes.
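To make the "where to write" step concrete, here is a minimal sketch that turns a 1-D element shift into the slice vacated by a discarding shift. It uses a plain Python list in place of a real array, and `vacated_slice` is a hypothetical helper, not an icechunk function. It assumes a negative shift vacates the tail and a positive shift vacates the head.

```python
def vacated_slice(element_shift, length):
    """Return the slice of a 1-D array left vacant by a discarding shift.

    Hypothetical helper for illustration only (not an icechunk function).
    """
    if element_shift < 0:
        # Shifted toward index 0: the tail is vacated.
        return slice(max(length + element_shift, 0), length)
    # Shifted toward higher indices: the head is vacated.
    return slice(0, min(element_shift, length))


week = list(range(168))            # stand-in for 7 days of hourly readings
region = vacated_slice(-24, len(week))
week[region] = [-1] * 24           # write the new day's data into the vacated tail
print(region)
```

For a negative shift this selects the same region as `arr[element_shift[0]:]`; the helper only adds the positive-shift case.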
105
+
106
+
This pattern works identically whether your array is 1 KB or 1 PB, and whether it's on local disk or cloud object storage—the shift is always instant with zero data transfer.
107
+
127
108
### Multi-dimensional Arrays
128
109
129
110
For N-dimensional arrays, provide an offset for each dimension:
session.shift_array("/arr2d", (1, 0), "discard") # Shift down 1 chunk
127
+
session.shift_array("/arr2d", (1, 0)) # Shift down 1 chunk
147
128
print("\nAfter shift (1, 0):")
148
129
print(arr[:])
149
130
```
150
131
151
-
### Example: Rolling Time Window
132
+
## roll_array { #roll_array }
152
133
153
-
Imagine a sensor array storing the last 7 days of hourly readings—shape `(168,)` with one chunk per day `(24,)`. Each day, you want to discard the oldest day and make room for new data:
154
-
155
-
```python
156
-
# Each day: shift left by 1 chunk, discarding the oldest
# element_shift = (-24,) — the shift in element space
159
-
160
-
# Write new day's data to the vacated region
161
-
arr[element_shift[0]:] = todays_readings
162
-
163
-
session.commit(f"Updated sensor data for {today}")
164
-
```
165
-
166
-
The return value tells you exactly where to write new data—no need to manually track chunk sizes.
167
-
168
-
This pattern works identically whether your array is 1 KB or 1 PB, and whether it's on local disk or cloud object storage—the shift is always instant with zero data transfer.
169
-
170
-
## reindex_array { #reindex_array }
171
-
172
-
For transformations that [`shift_array`][icechunk.Session.shift_array] can't express, [`reindex_array`][icechunk.Session.reindex_array] gives you complete control. You provide a function that maps each chunk's old position to its new position.
173
-
174
-
Your function receives a chunk index (as a list) and returns:
175
-
176
-
- A new index (as a list) to move the chunk there
177
-
- `None` to discard the chunk
134
+
The [`roll_array`][icechunk.Session.roll_array] method performs a circular shift — chunks that go out of one end wrap around to the other side. No data is lost.
session.roll_array("/arr", (-2,)) # Roll left by 2 chunks
199
151
print("After: ", arr[:])
200
152
```
201
153
202
-
### The delete_vacated Parameter
154
+
Notice how `[0, 1, 2, 3]` wrapped around to the end.
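The wrap-around can be modeled in plain Python by moving whole chunks to `(old_index + offset) % n_chunks`. This is only a sketch of the semantics on an in-memory list, assuming an 8-element array with `chunk_size=2`; `roll_chunks` is a hypothetical name, and the real operation changes chunk references rather than copying data.

```python
def roll_chunks(values, chunk_size, chunk_offset):
    """Model a circular shift of a 1-D sequence at chunk granularity.

    Hypothetical helper for illustration only (not an icechunk function).
    """
    if len(values) % chunk_size != 0:
        raise ValueError("length must be a whole number of chunks")
    # Split into chunks, the unit that actually moves.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    n = len(chunks)
    rolled = [None] * n
    for old_index, chunk in enumerate(chunks):
        # Python's % always returns a non-negative index, so negative offsets wrap.
        rolled[(old_index + chunk_offset) % n] = chunk
    return [v for chunk in rolled for v in chunk]


print(roll_chunks(list(range(8)), 2, -2))  # → [4, 5, 6, 7, 0, 1, 2, 3]
```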
155
+
156
+
## reindex_array { #reindex_array }
203
157
204
-
The `delete_vacated` parameter controls what happens to source positions after chunks move away:
158
+
For transformations that [`shift_array`][icechunk.Session.shift_array] and [`roll_array`][icechunk.Session.roll_array] can't express, [`reindex_array`][icechunk.Session.reindex_array] gives you complete control. You provide a function that maps each chunk's old position to its new position.
159
+
160
+
Your function receives a chunk index (as a list) and returns:
205
161
206
-
| Value | Behavior |
207
-
|-------|----------|
208
-
|`True`| Vacated positions are deleted (return fill value) |
209
-
|`False`| Vacated positions keep stale references |