
zephyr: shuffle integration tests #4784

Merged
ravwojdyla merged 4 commits into main from rav/shuffle-itest
Apr 15, 2026


@ravwojdyla-agent
Contributor

@ravwojdyla-agent ravwojdyla-agent commented Apr 15, 2026

  • adds a shuffle integration-test lane that exercises scatter/reduce at 10 GB on marin-dev
  • lib/zephyr/tests/benchmark_shuffle.py — synthetic shuffle driver with --hot-shard-frac / --hot-key-pool to bias one reducer, and --repeat N [1] so one iris job can emit multiple RESULT: lines for variance tracking
  • .github/workflows/zephyr-shuffle-itest.yaml — matrix over 4 scenarios (uniform/skew × small/large items), each leg submits one iris job and polls to terminal state
    • scenarios: uniform-small / uniform-large / skew90-small / skew90-large
      • small = 600k items × 250 B, large = 160 items × 1 MB — both ≈ 10 GB per run
      • skew = 90 % of items routed to a single hot reducer via --hot-shard-frac 0.9 --hot-key-pool 128
    • --repeat 3 per scenario, --priority production on submission, fail-fast: false so all four legs always run
  • workflow_dispatch only for now; a scheduled run would fail every night on skew90-* until "[zephyr] Replace Parquet shuffle with zstd-chunk format" (#4782) lands (the baseline Parquet shuffle OOMs the hot-shard reducer)
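
One way to picture the skew knobs is the minimal sketch below. It is not the code in benchmark_shuffle.py: the function name, the key naming scheme, and the seeded generator are assumptions made for illustration. With probability `hot_shard_frac` an item gets a key from a pool of `hot_key_pool` keys, which are assumed here to all route to the same reducer; otherwise it gets an effectively unique cold key.

```python
import random


def make_keys(n_items, hot_shard_frac, hot_key_pool, seed=0):
    """Return n_items keys; roughly hot_shard_frac of them come from a small hot pool.

    Hypothetical helper, not the real driver: hot keys are named "hot-<i>" and
    are assumed to all land on one reducer; cold keys are "cold-<random int>".
    """
    rng = random.Random(seed)
    keys = []
    for _ in range(n_items):
        if rng.random() < hot_shard_frac:
            keys.append(f"hot-{rng.randrange(hot_key_pool)}")
        else:
            keys.append(f"cold-{rng.getrandbits(32)}")
    return keys


keys = make_keys(10_000, hot_shard_frac=0.9, hot_key_pool=128)
hot_fraction = sum(k.startswith("hot-") for k in keys) / len(keys)
distinct_hot = len({k for k in keys if k.startswith("hot-")})
print(f"hot fraction = {hot_fraction:.3f}, distinct hot keys = {distinct_hot}")
```

Using a pool of hot keys rather than one repeated key keeps the hot reducer's input varied while still concentrating the volume on a single shard.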

🤖 Generated with Claude Code

Footnotes

  1. each repeat reuses the same ZephyrContext, so worker actor startup happens once and the per-iteration walltime isolates shuffle time from bootstrap variance.
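
The footnote's idea, building the expensive context once and timing only the repeated work, can be sketched as follows. This is an illustration under assumptions, not the driver's code: `run_repeats`, `make_context`, and the JSON field names `iteration` and `walltime_s` are hypothetical, and a plain dict stands in for ZephyrContext.

```python
import json
import time


def run_repeats(run_once, repeat):
    """Call run_once() `repeat` times and print one RESULT line per iteration."""
    records = []
    for iteration in range(repeat):
        start = time.perf_counter()
        run_once()
        walltime_s = time.perf_counter() - start
        # Field names are illustrative, not the driver's actual schema.
        record = {"iteration": iteration, "walltime_s": round(walltime_s, 3)}
        print("RESULT: " + json.dumps(record))
        records.append(record)
    return records


setup_calls = 0


def make_context():
    """Stand-in for expensive one-time setup (e.g. worker startup)."""
    global setup_calls
    setup_calls += 1
    return {"items": list(range(1000))}


context = make_context()  # built once, outside the timed loop
records = run_repeats(lambda: sorted(context["items"], reverse=True), repeat=3)
```

Because setup sits outside the timed region, differences between iterations reflect the repeated work rather than startup cost.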

Adds a zephyr shuffle integration test that exercises scatter/reduce at
10 GB across four scenarios: uniform vs. 90% skew crossed with small
(250 B) vs. large (1 MB) items. Each scenario runs as its own GitHub
Actions matrix leg, submitting an iris job to marin-dev and polling to
terminal state.
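
Polling to a terminal state might look roughly like the sketch below. Nothing here is the iris API: the state names, the `get_state` callable, and the default interval and timeout are all assumptions, and the caller would supply whatever actually queries the job.

```python
import time

# Hypothetical terminal state names; the real scheduler may use different ones.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED"}


def poll_until_terminal(get_state, interval_s=30.0, timeout_s=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_state() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still {state!r} after {timeout_s} s")
        sleep(interval_s)


# Usage with a canned sequence of states instead of a real job.
states = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
final_state = poll_until_terminal(lambda: next(states), interval_s=0)
print(final_state)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting in real time.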

- `lib/zephyr/tests/benchmark_shuffle.py` — synthetic shuffle driver
  with `--hot-shard-frac`/`--hot-key-pool` options to bias one reducer.
  Prints a single `RESULT: {json}` line for log scraping.
- `.github/workflows/zephyr-shuffle-itest.yaml` — matrix workflow. Manual
  (`workflow_dispatch`) only for now; add a cron once the shuffle
  implementation can pass the skew scenarios (baseline Parquet OOMs).
@ravwojdyla-agent ravwojdyla-agent added the agent-generated label (Created by automation/agent) Apr 15, 2026
- Adds `--repeat N` to benchmark_shuffle.py; each iteration emits its own
  RESULT line so downstream tooling can compute min/median/max walltime.
- Workflow submits each scenario with `--priority production` so the
  integration tests get scheduling priority on marin-dev.
- Each scenario now runs 3 shuffles sequentially in the same iris job;
  workflow timeout bumped to 180 min and prints all RESULT lines.
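
Downstream tooling that turns the per-iteration RESULT lines into min/median/max could be as small as the sketch below. The `walltime_s` field name and the sample log are invented for illustration; the actual JSON schema emitted by benchmark_shuffle.py is not shown in this PR.

```python
import json
import statistics

RESULT_PREFIX = "RESULT: "


def summarize(log_text, field="walltime_s"):
    """Collect `field` from every RESULT line and return min/median/max."""
    values = []
    for line in log_text.splitlines():
        marker = line.find(RESULT_PREFIX)
        if marker == -1:
            continue  # not a RESULT line
        payload = json.loads(line[marker + len(RESULT_PREFIX):])
        values.append(payload[field])
    if not values:
        raise ValueError("no RESULT lines found")
    return {
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# Made-up log excerpt; the numbers are placeholders, not measurements.
sample_log = """\
worker started
RESULT: {"iteration": 0, "walltime_s": 412.5}
RESULT: {"iteration": 1, "walltime_s": 398.0}
unrelated log line
RESULT: {"iteration": 2, "walltime_s": 405.25}
"""
summary = summarize(sample_log)
print(summary)
```

Searching for the prefix anywhere in the line, rather than only at the start, tolerates log lines that carry a timestamp or worker tag in front.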
@ravwojdyla-agent ravwojdyla-agent changed the title from "[zephyr] Shuffle integration tests + CI workflow" to "zephyr: shuffle integration tests" Apr 15, 2026
@ravwojdyla ravwojdyla requested a review from rjpower April 15, 2026 18:57
@rjpower
Collaborator

rjpower commented Apr 15, 2026

Should we delete the probably-now-irrelevant lib/zephyr/tests/benchmark_dedup_pipeline.py ?

I'm a little hesitant about a plethora of workflows, I find they're hard to keep track of. I admit I don't have great ideas in this space though. I feel we might want to invest in having a "data" workflow that manages a bunch of runs? No great ideas here.

Superseded by benchmark_shuffle.py added in this PR; per reviewer in #4784.
@ravwojdyla-agent
Contributor Author

🤖 removed lib/zephyr/tests/benchmark_dedup_pipeline.py in 2e335ab.

@ravwojdyla
Contributor

ravwojdyla commented Apr 15, 2026

> I'm a little hesitant about a plethora of workflows, I find they're hard to keep track of. I admit I don't have great ideas in this space though. I feel we might want to invest in having a "data" workflow that manages a bunch of runs? No great ideas here.

@rjpower Agreed. I'm less worried about the proliferation of workflows as long as every workflow has a person (or people) who cares about it. And specifically, there shouldn't be a single person caring about all of them at once.

benchmark_dedup_pipeline.py removed - ptal

@ravwojdyla ravwojdyla merged commit 5eb5b60 into main Apr 15, 2026
40 checks passed
@ravwojdyla ravwojdyla deleted the rav/shuffle-itest branch April 15, 2026 20:36
