Conversation
Adds a zephyr shuffle integration test that exercises scatter/reduce at
10 GB across four scenarios: uniform vs. 90% skew crossed with small
(250 B) vs. large (1 MB) items. Each scenario runs as its own GitHub
Actions matrix leg, submitting an iris job to marin-dev and polling to
terminal state.
- `lib/zephyr/tests/benchmark_shuffle.py` — synthetic shuffle driver
with `--hot-shard-frac`/`--hot-key-pool` options to bias one reducer.
Prints a single `RESULT: {json}` line for log scraping.
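For context, the hot-shard biasing could be sketched roughly like this — a hypothetical sketch only; `biased_keys` and its exact sampling scheme are illustrative stand-ins, not the actual driver code:

```python
import random

def biased_keys(n_items: int, hot_shard_frac: float, hot_key_pool: int,
                seed: int = 0) -> list[int]:
    """Sketch of skewed key generation: a hot_shard_frac fraction of items
    draw from a small pool of hot_key_pool keys (all of which land on the
    same reducer), while the rest draw from a wide cold range."""
    rng = random.Random(seed)
    keys = []
    for _ in range(n_items):
        if rng.random() < hot_shard_frac:
            keys.append(rng.randrange(hot_key_pool))          # hot key
        else:
            keys.append(hot_key_pool + rng.randrange(10**9))  # cold key
    return keys
```

With `--hot-shard-frac 0.9 --hot-key-pool 128`, roughly 90% of items would hash to the single reducer owning the 128 hot keys.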
- `.github/workflows/zephyr-shuffle-itest.yaml` — matrix workflow. Manual
(`workflow_dispatch`) only for now; add a cron once the shuffle
implementation can pass the skew scenarios (baseline Parquet OOMs).
- Adds `--repeat N` to benchmark_shuffle.py; each iteration emits its own
  RESULT line so downstream tooling can compute min/median/max walltime.
- Workflow submits each scenario with `--priority production` so the
  integration tests get scheduling priority on marin-dev.
- Each scenario now runs 3 shuffles sequentially in the same iris job;
  workflow timeout bumped to 180 min and prints all RESULT lines.
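The repeat loop plus `RESULT:` emission might look something like this — illustrative only, with `run_shuffle` as a stand-in for the actual shuffle driver:

```python
import json
import time

def run_repeats(run_shuffle, repeat: int) -> list[str]:
    """Run the shuffle `repeat` times, printing one `RESULT: {json}` line
    per iteration so log-scraping tooling can compute min/median/max
    walltime across iterations."""
    lines = []
    for i in range(repeat):
        t0 = time.monotonic()
        run_shuffle()
        walltime_s = time.monotonic() - t0
        line = "RESULT: " + json.dumps(
            {"iteration": i, "walltime_s": round(walltime_s, 3)}
        )
        print(line)
        lines.append(line)
    return lines
```

Keeping one JSON object per line keeps the scraper trivial: grep for the `RESULT: ` prefix, then `json.loads` the remainder.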
|
Should we delete the probably-now-irrelevant lib/zephyr/tests/benchmark_dedup_pipeline.py ? I'm a little hesitant about a plethora of workflows, I find they're hard to keep track of. I admit I don't have great ideas in this space though. I feel we might want to invest in having a "data" workflow that manages a bunch of runs? No great ideas here. |
Superseded by benchmark_shuffle.py added in this PR; per reviewer in #4784.
|
🤖 removed |
@rjpower Agreed. I'm less worried about the proliferation of workflows as long as every workflow has a person (or people) who cares about it. And specifically, there shouldn't be a single person caring about all of them at once.
|
- `lib/zephyr/tests/benchmark_shuffle.py` — synthetic shuffle driver with
  `--hot-shard-frac`/`--hot-key-pool` to bias one reducer and `--repeat N`¹
  so one iris job can emit multiple `RESULT:` lines for variance tracking.
- `.github/workflows/zephyr-shuffle-itest.yaml` — matrix over 4 scenarios
  (uniform/skew × small/large items); each leg submits one iris job and polls
  to terminal state:
  - `uniform-small` / `uniform-large` / `skew90-small` / `skew90-large`
  - skew legs pass `--hot-shard-frac 0.9 --hot-key-pool 128`
  - `--repeat 3` per scenario, `--priority production` on submission,
    `fail-fast: false` so all four legs always run
  - `workflow_dispatch` only for now — a scheduled run would fail every night
    on `skew90-*` until [zephyr] Replace Parquet shuffle with zstd-chunk
    format #4782 lands (baseline Parquet OOMs the hot-shard reducer)

🤖 Generated with Claude Code
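Downstream scraping of those `RESULT:` lines might look like this — a sketch; `scrape_results` and any log details beyond the `RESULT: ` prefix are assumptions:

```python
import json
import statistics

def scrape_results(log_text: str) -> dict:
    """Pull every `RESULT: {json}` line out of a job log and summarize
    the per-iteration walltimes across repeats."""
    walls = [
        json.loads(line[len("RESULT: "):])["walltime_s"]
        for line in log_text.splitlines()
        if line.startswith("RESULT: ")
    ]
    return {
        "min": min(walls),
        "median": statistics.median(walls),
        "max": max(walls),
    }
```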
Footnotes
1. Each repeat reuses the same `ZephyrContext`, so worker actor startup happens once and the per-iteration walltime isolates shuffle time from bootstrap variance. ↩