
Commit 96b76df

yonromai and claude committed
docs: drop coordinator memory below 4GB validator threshold
iris job run rejects --memory >= 4 GB without --enable-extra-resources (lib/iris/src/iris/cli/job.py:432). The mechanical rewrite produced --memory=4G, which trips the validator; reduce to --memory=2G to match the canonical ferries pattern (experiments/ferries/OPS.md:17). Also correct the stale --memory 16g example in lib/iris/OPS.md:42.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent c84acef · commit 96b76df

19 files changed

Lines changed: 27 additions & 27 deletions
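The commit message says the real threshold check lives at lib/iris/src/iris/cli/job.py:432. As a rough illustration of that kind of guard (the function names, the memory-spec parser, and the error message below are all assumptions for this sketch, not the actual Iris code), a CLI validator that rejects `--memory` at or above 4 GB unless an escape hatch is set might look like:

```python
import re

# Illustrative only: the real check is in lib/iris/src/iris/cli/job.py.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
THRESHOLD_BYTES = 4 * UNITS["G"]  # the 4 GB validator threshold

def parse_memory(spec):
    """Parse a --memory spec like '2G' or '512M' into bytes (hypothetical parser)."""
    m = re.fullmatch(r"(\d+)\s*([KMGT])i?B?", spec.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognized memory spec: {spec!r}")
    return int(m.group(1)) * UNITS[m.group(2).upper()]

def validate_memory(spec, enable_extra_resources=False):
    """Reject requests at or above the threshold unless extra resources are enabled."""
    if parse_memory(spec) >= THRESHOLD_BYTES and not enable_extra_resources:
        raise ValueError(
            f"--memory {spec} is >= 4G; pass --enable-extra-resources to allow it"
        )

validate_memory("2G")                               # ok: below the threshold
validate_memory("4G", enable_extra_resources=True)  # ok: explicitly allowed
```

Under this model, the commit's fix is simply to keep the coordinator request at `2G`, below the threshold, rather than opting into the escape hatch.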

.agents/projects/ferry_framework.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -274,7 +274,7 @@ gh issue list \
 Launch shape (illustrative, to pin in recipe):
 
 ```bash
-uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
   -- python -m experiments.ferries.daily --run_name "daily-125m-$(date +%F)"
 ```
 
````

.agents/skills/ferries/SKILL.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -124,7 +124,7 @@ Then push the launch commit (no proposal PR by default).
 Before launch, confirm requester approval in-thread unless they already gave explicit "launch without asking" permission.
 
 ```bash
-uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
   -- python -m experiments.ferries.daily
 ```
 
@@ -136,7 +136,7 @@ After launch, capture and post to the issue:
 
 Optional deterministic daily rerun name:
 ```bash
-uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
   -e FERRY_DATE "$(date +%Y%m%d-%H%M%S)-daily-ferry" \
   -- python -m experiments.ferries.daily
 ```
````

docs/explanations/executor.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -103,7 +103,7 @@ as the entrypoint of a CPU-only Iris job. The script then uses `executor_main`
 to spawn the accelerated sub-jobs via Fray:
 
 ```bash
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -- python -m experiments.tutorials.hello_world
 ```
 
````

docs/recipes/add_scaling_heuristic.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -42,7 +42,7 @@ Nemotron mix. Then sweep the optimizer hyperparameter space with
 `experiments/references/reference_hyperparameter_sweep.py` or an equivalent setup.
 
 ```sh
-uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
   -e WANDB_API_KEY "$WANDB_API_KEY" \
   -- python -m experiments.references.reference_hyperparameter_sweep
 ```
@@ -174,7 +174,7 @@ SCALING_SUITES = {
 Submit:
 
 ```sh
-uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
   -e WANDB_API_KEY "$WANDB_API_KEY" \
   -- python -m experiments.isoflop_sweep
 ```
````

docs/tutorials/train-an-lm.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -151,7 +151,7 @@ Iris entrypoint job. `executor_main` inside the script spawns the TPU/GPU
 sub-tasks via Fray:
 
 ```bash
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -e WANDB_API_KEY "$WANDB_API_KEY" \
   -- python -m experiments.${YOUR_EXPERIMENT_SCRIPT}
 ```
````

docs/tutorials/train-dpo.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -141,7 +141,7 @@ Submit the job to the shared Iris cluster (CPU-only entrypoint; the script's
 `executor_main` spawns the TPU sub-task via Fray):
 
 ```bash
-uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --no-wait --cpu=1 --memory=2G --extra=cpu \
   -e WANDB_API_KEY "$WANDB_API_KEY" \
   -- python -m experiments.my_dpo_experiment
 ```
````

experiments/README_sft.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -11,7 +11,7 @@ The default doc reproduces OLMO SFT
 Run Olmo SFT with:
 
 ```bash
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -e HF_TOKEN "$HF_TOKEN" \
   -- python -m experiments.exp227_sft
 ```
@@ -45,12 +45,12 @@ In `train_step`, essential parameters:
 
 ```bash
 # Basic run
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -e HF_TOKEN "$HF_TOKEN" \
   -- python -m experiments.my_sft
 
 # Force specific steps
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -e HF_TOKEN "$HF_TOKEN" \
   -- python -m experiments.my_sft --force_run '["your_step_name"]'
 ```
````

experiments/grug/README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -40,7 +40,7 @@ uv run python experiments/grug/base/launch.py
 Iris cluster run (from a dev box, on `marin` prod cluster):
 
 ```bash
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -e WANDB_API_KEY "$WANDB_API_KEY" \
   -- python -m experiments.grug.base.launch
 ```
````

experiments/rollout_data/coderforge.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -4,7 +4,7 @@
 """togethercomputer/CoderForge-Preview rollout dataset.
 
 Usage:
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -- python -m experiments.rollout_data.coderforge
 """
 
````

experiments/rollout_data/gpt_oss_rollouts.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -4,7 +4,7 @@
 """andyrdt/gpt-oss-20b-rollouts rollout dataset (non-benchmark subsets).
 
 Usage:
-uv run iris --cluster=marin job run --cpu=1 --memory=4G --extra=cpu \
+uv run iris --cluster=marin job run --cpu=1 --memory=2G --extra=cpu \
   -- python -m experiments.rollout_data.gpt_oss_rollouts
 """
 
````
