Commit 3f7f62b

ravwojdyla and claude authored
zephyr: refactor chunking, improve shuffle scalability (#3839)
* closes #3643
* closes #3798
* closes #3994

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 18f02a4 commit 3f7f62b

9 files changed

Lines changed: 1357 additions & 539 deletions

lib/zephyr/README.md

Lines changed: 1 addition & 1 deletion
@@ -217,7 +217,7 @@ When you call `ctx.execute(dataset)`:
 
 ```python
 # 1. Create plan from dataset operations
-plan = compute_plan(dataset, hints)
+plan = compute_plan(dataset)
 
 # 2. Get or create coordinator + workers
 coordinator = self._get_or_create_coordinator()

lib/zephyr/src/zephyr/__init__.py

Lines changed: 1 addition & 2 deletions
@@ -8,7 +8,7 @@
 from zephyr.dataset import Dataset, ShardInfo
 from zephyr.execution import WorkerContext, ZephyrContext, zephyr_worker_ctx
 from zephyr.expr import Expr, col, lit
-from zephyr.plan import ExecutionHint, compute_plan
+from zephyr.plan import compute_plan
 from zephyr.readers import InputFileSpec, load_file, load_jsonl, load_parquet, load_vortex, load_zip_members
 from zephyr.writers import atomic_rename, write_jsonl_file, write_levanter_cache, write_parquet_file, write_vortex_file

@@ -17,7 +17,6 @@
 
 __all__ = [
     "Dataset",
-    "ExecutionHint",
     "Expr",
     "InputFileSpec",
     "ShardInfo",
