Commit a7f5e68

Simplify autoresearch run artifacts
Parent: c9c7812

22 files changed

Lines changed: 307 additions & 359 deletions

examples/autoresearch/README.md

Lines changed: 29 additions & 15 deletions
````diff
@@ -41,6 +41,10 @@ The command can be any runnable benchmark or training loop. It just needs to:
 - exit nonzero on failure
 - log the configured metric to `run_dir/metrics.jsonl`
 
+Useful command placeholders are `{run_dir}` for the attempt artifact directory,
+`{attempt_name}` for numeric attempt ids like `001`, and `{run_root}` for the
+assembled `LOG_ROOT/RUN_NAME` directory.
+
 ```python
 ml_logger.log_metrics({"accuracy": 0.73}, step=1)
 ```
````
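The new README paragraph names three command placeholders. A minimal sketch of how they might be filled, assuming plain `str.format` substitution and an attempt directory at `LOG_ROOT/RUN_NAME/researchers/<researcher>/attempts/<attempt_name>` (inferred from the manifest paths listed further down in this diff). `expand_command` and its `researcher` argument are hypothetical, not part of `harness`:

```python
import shlex
from pathlib import Path


def expand_command(
    template: str, log_root: str, run_name: str, researcher: str, attempt: int
) -> str:
    """Fill {run_dir}, {attempt_name} and {run_root} in a recipe command.

    Hypothetical helper: the directory layout below is an assumption based on
    the manifest paths the README lists, not the harness implementation.
    """
    run_root = Path(log_root) / run_name  # LOG_ROOT/RUN_NAME
    attempt_name = f"{attempt:03d}"  # numeric id such as 001
    run_dir = run_root / "researchers" / researcher / "attempts" / attempt_name
    # Quote each value so paths containing spaces survive `eval` in a shell.
    return template.format(
        run_dir=shlex.quote(str(run_dir)),
        attempt_name=shlex.quote(attempt_name),
        run_root=shlex.quote(str(run_root)),
    )


cmd = expand_command(
    "python train.py --out {run_dir} --tag {attempt_name}",
    "/mnt/shared/open-rl/autoresearch", "text-sql", "r0", 1,
)
print(cmd)
# prints: python train.py --out /mnt/shared/open-rl/autoresearch/text-sql/researchers/r0/attempts/001 --tag 001
```

Quoting inside the helper assumes templates do not wrap placeholders in their own quotes; a template written as `"{run_dir}"` would end up double-quoted.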
````diff
@@ -67,27 +71,27 @@ recipe-specific settings.
 The easiest Kubernetes path is the small CLI. From `examples/autoresearch`:
 
 ```bash
-uv run --project .. python -m harness.cli run recipes/text_sql name=text-sql-v1
+uv run --project .. python -m harness.cli run recipes/text_sql run_name=text-sql-v1
 ```
 
 That command creates an ignored generated overlay under `.runs/text-sql-v1`,
 copies the flat recipe directory into a ConfigMap, mounts it into the stable
-researcher image, sets `RECIPE`, `LOG_ROOT`, and `SPEC_HASH`, and runs
-`kubectl apply -k`.
+researcher image, sets `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and `SPEC_HASH`, and
+runs `kubectl apply -k`.
 
 Preview without applying:
 
 ```bash
 uv run --project .. python -m harness.cli run recipes/text_sql \
-name=text-sql-v1 \
+run_name=text-sql-v1 \
 apply=False
 ```
 
 Pass common recipe env directly:
 
 ```bash
 uv run --project .. python -m harness.cli run recipes/my_recipe \
-name=my-recipe-v1 \
+run_name=my-recipe-v1 \
 tinker_base_url=http://open-rl-gateway-service:8000 \
 base_model=google/gemma-4-e2b
 ```
````
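This hunk renames the CLI override from `name=` to `run_name=`. How `harness.cli` parses these `key=value` tokens is not shown in the diff. As an illustration only, a hypothetical parser that treats the literal strings `True`/`False` as booleans (so `apply=False` means preview without `kubectl apply`) and keeps everything else as a string could look like:

```python
def parse_overrides(tokens: list[str]) -> dict[str, object]:
    """Turn CLI tokens like `run_name=text-sql-v1 apply=False` into a dict.

    Hypothetical: the literal strings True/False become booleans, everything
    else stays a string; harness.cli may convert values differently.
    """
    overrides: dict[str, object] = {}
    for token in tokens:
        key, sep, value = token.partition("=")  # split on the first "=" only
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        overrides[key] = {"True": True, "False": False}.get(value, value)
    return overrides


print(parse_overrides(["run_name=text-sql-v1", "apply=False"]))
# {'run_name': 'text-sql-v1', 'apply': False}
```

Splitting on the first `=` only keeps values such as URLs with query strings intact.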
````diff
@@ -107,6 +111,14 @@ and calls the shared OpenRL/Tinker services.
 
 ## Cluster Run
 
+These manifests require the official Agent Sandbox CRD. The researcher resource
+kind is `agents.x-k8s.io/v1alpha1/Sandbox`; there is no plain Kubernetes `Job`
+fallback in this demo. Verify the CRD before applying a recipe:
+
+```bash
+kubectl api-resources | grep -i sandbox
+```
+
 Create the API secret for agent-backed researcher pods:
 
 ```bash
````
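`grep -i sandbox` also matches any other resource whose name or group contains "sandbox". A stricter check, sketched in Python under the assumption that `kubectl api-resources` prints its default whitespace-separated columns (NAME, SHORTNAMES, APIVERSION, NAMESPACED, KIND), looks for the `Sandbox` kind in the `agents.x-k8s.io` group. Both function names are hypothetical:

```python
import subprocess


def sandbox_kind_listed(api_resources_output: str) -> bool:
    """True if the output lists kind Sandbox in the agents.x-k8s.io API group."""
    for line in api_resources_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        # KIND is the last default column; APIVERSION looks like agents.x-k8s.io/v1alpha1.
        if columns and columns[-1] == "Sandbox" and any(
            column.startswith("agents.x-k8s.io/") for column in columns
        ):
            return True
    return False


def cluster_has_sandbox_crd() -> bool:
    """Run `kubectl api-resources` against the current context and check its output."""
    result = subprocess.run(
        ["kubectl", "api-resources"], capture_output=True, text=True, check=True
    )
    return sandbox_kind_listed(result.stdout)
```

Keeping the parsing in a pure function lets it be checked against captured output without a live cluster.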
````diff
@@ -156,7 +168,7 @@ agent starts only after those endpoints are reachable.
 ```text
 harness/cli.py # creates/applies a generated overlay for a recipe dir
 harness/agent.py # prepares git, records baseline, launches Gemini
-harness/attempt.py # runs one measured attempt and writes attempt.json
+harness/attempt.py # runs one measured attempt and writes metadata.json
 harness/serve.py # read-only UI server over researcher/attempt manifests
 harness/utils.py # shared JSON, git, hashing, process helpers
 k8s/base/ # reusable Sandbox/UI resources
````
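With this commit `harness/attempt.py` writes `metadata.json` instead of `attempt.json`; the fields it records are not shown in the diff. A sketch of writing one attempt manifest, assuming the `researchers/<researcher>/attempts/<NNN>/` layout that the UI paragraph in this commit lists, with a caller-supplied `record` dict. `write_attempt_metadata` is hypothetical:

```python
import json
from pathlib import Path


def write_attempt_metadata(run_root: Path, researcher: str, attempt: int, record: dict) -> Path:
    """Write researchers/<researcher>/attempts/<NNN>/metadata.json under run_root.

    Hypothetical helper: `record` holds whatever fields the caller chooses.
    """
    attempt_dir = run_root / "researchers" / researcher / "attempts" / f"{attempt:03d}"
    attempt_dir.mkdir(parents=True, exist_ok=True)
    final = attempt_dir / "metadata.json"
    tmp = attempt_dir / "metadata.json.tmp"
    tmp.write_text(json.dumps(record, indent=2, sort_keys=True))
    # Rename into place so a polling reader (such as a UI server) does not see a
    # half-written file, on filesystems where rename within a directory is atomic.
    tmp.replace(final)
    return final
```

Writing to a temporary name and renaming is a design choice of this sketch, not something the diff specifies.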
````diff
@@ -171,13 +183,14 @@ workspace at `RECIPE`'s parent and committed as the run baseline. That lets the
 image stay stable while recipe files come from shared storage.
 
 `harness.attempt` runs recipe code and writes artifacts. The UI reads
-`LOG_ROOT/researchers/*/researcher.json`,
-`LOG_ROOT/researchers/*/attempts/*/attempt.json`, and fixed artifact filenames
-next to those manifests. Clearing `LOG_ROOT` resets the visible run.
-
-The launcher records the unmodified default config as `000-baseline`, then
-passes the recipe-adjacent `program.md` to Gemini as the prompt. That program
-tells the agent to edit only the declared target, commit the attempt, run
+`LOG_ROOT/RUN_NAME/researchers/*/metadata.json`,
+`LOG_ROOT/RUN_NAME/researchers/*/attempts/*/metadata.json`, and fixed artifact
+filenames next to those manifests. Clearing `LOG_ROOT/RUN_NAME` resets the
+visible run.
+
+The launcher records the unmodified default config as attempt `000`, then passes
+the recipe-adjacent `program.md` to Gemini as the prompt. That program tells the
+agent to edit only the declared target, commit the attempt, run
 `eval "${RUN_ATTEMPT_COMMAND}"`, record the metric, and reset if the metric did
 not improve.
 
````
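The manifest layout and the keep-or-reset rule in this hunk can be sketched together. `load_run` globs the two manifest patterns under `LOG_ROOT/RUN_NAME`; `kept_attempts` replays the rule that an attempt is kept only if its metric beats the best so far, starting from baseline attempt `000`. Both helpers are hypothetical, and the metric direction and strict comparison are assumptions, since the real rule is whatever the recipe's `program.md` tells the agent:

```python
import json
from pathlib import Path


def load_run(run_root: Path) -> dict[str, list[dict]]:
    """Map each researcher id to its attempt manifests, ordered by attempt name."""
    run: dict[str, list[dict]] = {}
    for manifest in sorted(run_root.glob("researchers/*/metadata.json")):
        researcher_dir = manifest.parent
        run[researcher_dir.name] = [
            json.loads(path.read_text())
            for path in sorted(researcher_dir.glob("attempts/*/metadata.json"))
        ]
    return run


def kept_attempts(metrics: list[float], higher_is_better: bool = True) -> list[bool]:
    """Replay keep-or-reset: metrics[0] is the baseline (attempt 000), always kept."""
    if not metrics:
        return []
    best = metrics[0]
    kept = [True]
    for metric in metrics[1:]:
        improved = metric > best if higher_is_better else metric < best
        if improved:
            best = metric  # this attempt becomes the base for the next one
        kept.append(improved)  # False means the agent resets to the previous best
    return kept


print(kept_attempts([0.70, 0.73, 0.71, 0.75]))  # [True, True, False, True]
```

Because attempt names are zero-padded, sorting paths lexically also sorts attempts numerically, and an empty `LOG_ROOT/RUN_NAME` yields an empty run, matching "clearing resets the visible run".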
````diff
@@ -189,7 +202,7 @@ Copy one existing recipe directory and update:
 - `autoresearch.toml`
 - the command target, if you keep one
 - the editable target
-- `kustomization.yaml` settings: `RECIPE`, `LOG_ROOT`, and
+- `kustomization.yaml` settings: `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and
 `ATTEMPT_TIMEOUT_MINUTES`
 - optionally `RECIPE_DIR`, if Kubernetes should use a recipe uploaded to shared
 storage instead of the recipe already in the image
````
````diff
@@ -222,7 +235,8 @@ To also clear shared run data:
 
 ```bash
 DELETE_ARTIFACTS=1 \
-LOG_ROOT=/mnt/shared/open-rl/autoresearch/text_sql \
+LOG_ROOT=/mnt/shared/open-rl/autoresearch \
+RUN_NAME=text-sql \
 OVERLAY=examples/autoresearch/recipes/text_sql \
 examples/autoresearch/cleanup_research_session.sh
 ```
````

examples/autoresearch/cleanup_research_session.sh

Lines changed: 4 additions & 3 deletions
````diff
@@ -5,13 +5,14 @@ OVERLAY="${OVERLAY:-examples/autoresearch/recipes/text_sql}"
 NAMESPACE="${NAMESPACE:-default}"
 DELETE_ARTIFACTS="${DELETE_ARTIFACTS:-0}"
 LOG_ROOT="${LOG_ROOT:-}"
+RUN_NAME="${RUN_NAME:-}"
 
 kubectl -n "${NAMESPACE}" delete -k "${OVERLAY}" --ignore-not-found=true
 
 if [ "${DELETE_ARTIFACTS}" = "1" ]; then
-if [ -z "${LOG_ROOT}" ]; then
-echo "DELETE_ARTIFACTS=1 requires LOG_ROOT" >&2
+if [ -z "${LOG_ROOT}" ] || [ -z "${RUN_NAME}" ]; then
+echo "DELETE_ARTIFACTS=1 requires LOG_ROOT and RUN_NAME" >&2
 exit 2
 fi
-rm -rf "${LOG_ROOT}"
+rm -rf "${LOG_ROOT%/}/${RUN_NAME}"
 fi
````
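The cleanup script now deletes only `${LOG_ROOT%/}/${RUN_NAME}` rather than all of `LOG_ROOT`, so other runs under the same shared root survive. A Python sketch of the same path logic, with hypothetical function names. Note that `${LOG_ROOT%/}` strips a single trailing slash while `rstrip("/")` strips all of them, and the check that `RUN_NAME` is a single path component is an extra guard the script does not have:

```python
import shutil
from pathlib import Path


def run_artifact_dir(log_root: str, run_name: str) -> Path:
    """Mirror `${LOG_ROOT%/}/${RUN_NAME}` with the script's emptiness check."""
    if not log_root or not run_name:
        raise ValueError("DELETE_ARTIFACTS=1 requires LOG_ROOT and RUN_NAME")
    # Extra guard (not in the shell script): refuse names that escape LOG_ROOT.
    if "/" in run_name or run_name in (".", ".."):
        raise ValueError(f"RUN_NAME must be a single path component: {run_name!r}")
    return Path(log_root.rstrip("/") or "/") / run_name


def delete_run_artifacts(log_root: str, run_name: str) -> Path:
    """Remove one run's artifacts; a missing directory is not an error (like rm -rf)."""
    target = run_artifact_dir(log_root, run_name)
    if target.exists():
        shutil.rmtree(target)
    return target
```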
