Conversation
Adds a zephyr shuffle integration test that exercises scatter/reduce at
10 GB across four scenarios: uniform vs. 90% skew crossed with small
(250 B) vs. large (1 MB) items. Each scenario runs as its own GitHub
Actions matrix leg, submitting an iris job to marin-dev and polling to
terminal state.
- `lib/zephyr/tests/benchmark_shuffle.py` — synthetic shuffle driver
with `--hot-shard-frac`/`--hot-key-pool` options to bias one reducer.
Prints a single `RESULT: {json}` line for log scraping.
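For context, the hot-shard biasing could be sketched roughly like this — a hypothetical sketch only; `biased_keys` and its exact sampling scheme are illustrative stand-ins, not the actual driver code:

```python
import random

def biased_keys(n_items: int, hot_shard_frac: float, hot_key_pool: int,
                seed: int = 0) -> list[int]:
    """Sketch of skewed key generation: a hot_shard_frac fraction of items
    draw from a small pool of hot_key_pool keys (all of which land on the
    same reducer), while the rest draw from a wide cold range."""
    rng = random.Random(seed)
    keys = []
    for _ in range(n_items):
        if rng.random() < hot_shard_frac:
            keys.append(rng.randrange(hot_key_pool))          # hot key
        else:
            keys.append(hot_key_pool + rng.randrange(10**9))  # cold key
    return keys
```

With `--hot-shard-frac 0.9 --hot-key-pool 128`, roughly 90% of items would hash to the single reducer owning the 128 hot keys.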
- `.github/workflows/zephyr-shuffle-itest.yaml` — matrix workflow. Manual
(`workflow_dispatch`) only for now; add a cron once the shuffle
implementation can pass the skew scenarios (baseline Parquet OOMs).
- Adds `--repeat N` to benchmark_shuffle.py; each iteration emits its own
  RESULT line so downstream tooling can compute min/median/max walltime.
- Workflow submits each scenario with `--priority production` so the
  integration tests get scheduling priority on marin-dev.
- Each scenario now runs 3 shuffles sequentially in the same iris job;
  workflow timeout bumped to 180 min and prints all RESULT lines.
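The repeat loop plus `RESULT:` emission might look something like this — illustrative only, with `run_shuffle` as a stand-in for the actual shuffle driver:

```python
import json
import time

def run_repeats(run_shuffle, repeat: int) -> list[str]:
    """Run the shuffle `repeat` times, printing one `RESULT: {json}` line
    per iteration so log-scraping tooling can compute min/median/max
    walltime across iterations."""
    lines = []
    for i in range(repeat):
        t0 = time.monotonic()
        run_shuffle()
        walltime_s = time.monotonic() - t0
        line = "RESULT: " + json.dumps(
            {"iteration": i, "walltime_s": round(walltime_s, 3)}
        )
        print(line)
        lines.append(line)
    return lines
```

Keeping one JSON object per line keeps the scraper trivial: grep for the `RESULT: ` prefix, then `json.loads` the remainder.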
|
Should we delete the probably-now-irrelevant lib/zephyr/tests/benchmark_dedup_pipeline.py ? I'm a little hesitant about a plethora of workflows, I find they're hard to keep track of. I admit I don't have great ideas in this space though. I feel we might want to invest in having a "data" workflow that manages a bunch of runs? No great ideas here. |
Superseded by benchmark_shuffle.py added in this PR; per reviewer in #4784.
|
🤖 removed |
@rjpower Agreed. I'm less worried about the proliferation of workflows as long as every workflow has a person (or people) who cares about it. And specifically, there shouldn't be a single person caring about all of them at once.
|
- `lib/zephyr/tests/benchmark_shuffle.py` — synthetic shuffle driver with
  `--hot-shard-frac`/`--hot-key-pool` to bias one reducer and `--repeat N`¹
  so one iris job can emit multiple `RESULT:` lines for variance tracking.
- `.github/workflows/zephyr-shuffle-itest.yaml` — matrix over 4 scenarios
  (uniform/skew × small/large items); each leg submits one iris job and polls
  to terminal state:
  - `uniform-small` / `uniform-large` / `skew90-small` / `skew90-large`
  - skew legs pass `--hot-shard-frac 0.9 --hot-key-pool 128`
  - `--repeat 3` per scenario, `--priority production` on submission,
    `fail-fast: false` so all four legs always run
  - `workflow_dispatch` only for now — a scheduled run would fail every night
    on `skew90-*` until [zephyr] Replace Parquet shuffle with zstd-chunk
    format #4782 lands (baseline Parquet OOMs the hot-shard reducer)

🤖 Generated with Claude Code
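Downstream scraping of those `RESULT:` lines might look like this — a sketch; `scrape_results` and any log details beyond the `RESULT: ` prefix are assumptions:

```python
import json
import statistics

def scrape_results(log_text: str) -> dict:
    """Pull every `RESULT: {json}` line out of a job log and summarize
    the per-iteration walltimes across repeats."""
    walls = [
        json.loads(line[len("RESULT: "):])["walltime_s"]
        for line in log_text.splitlines()
        if line.startswith("RESULT: ")
    ]
    return {
        "min": min(walls),
        "median": statistics.median(walls),
        "max": max(walls),
    }
```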
Footnotes
1. Each repeat reuses the same `ZephyrContext`, so worker actor startup happens once and the per-iteration walltime isolates shuffle time from bootstrap variance. ↩