@@ -41,6 +41,10 @@ The command can be any runnable benchmark or training loop. It just needs to:
 - exit nonzero on failure
 - log the configured metric to `run_dir/metrics.jsonl`
 
+The command can use these placeholders: `{run_dir}` for the attempt artifact
+directory, `{attempt_name}` for the numeric attempt id (for example `001`), and
+`{run_root}` for the assembled `LOG_ROOT/RUN_NAME` directory.
+
 ```python
 ml_logger.log_metrics({"accuracy": 0.73}, step=1)
 ```
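A minimal sketch of how a runner might expand these placeholders. It assumes (this is not taken from the harness) that each attempt's artifact directory lives at `{run_root}/researchers/<researcher>/attempts/<attempt_name>` and that attempt ids are zero-padded to three digits. `attempt_dir` and `expand_command` are hypothetical helpers.

```python
import shlex
from pathlib import Path


def attempt_dir(run_root: Path, researcher: str, attempt: int) -> Path:
    """Hypothetical layout: one artifact directory per attempt under run_root."""
    # Zero-pad to three digits so 000 (baseline), 001, ... sort in order.
    return run_root / "researchers" / researcher / "attempts" / f"{attempt:03d}"


def expand_command(template: str, run_root: Path, researcher: str, attempt: int) -> str:
    """Substitute {run_dir}, {attempt_name}, and {run_root} in a shell command."""
    values = {
        "{run_dir}": shlex.quote(str(attempt_dir(run_root, researcher, attempt))),
        "{attempt_name}": f"{attempt:03d}",
        "{run_root}": shlex.quote(str(run_root)),
    }
    # Plain replacement (instead of str.format) leaves other braces intact,
    # for example shell parameter expansions such as ${HOME}.
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    cmd = "python train.py --out {run_dir} --tag {attempt_name} --cache ${HOME}/.cache"
    print(expand_command(cmd, Path("/logs/text-sql-v1"), "r0", 1))
    # prints: python train.py --out /logs/text-sql-v1/researchers/r0/attempts/001 --tag 001 --cache ${HOME}/.cache
```

Quoting the substituted paths with `shlex.quote` keeps the command safe to `eval` even if `LOG_ROOT` contains spaces.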
@@ -67,27 +71,27 @@ recipe-specific settings.
 The easiest Kubernetes path is the small CLI. From `examples/autoresearch`:
 
 ```bash
-uv run --project .. python -m harness.cli run recipes/text_sql name=text-sql-v1
+uv run --project .. python -m harness.cli run recipes/text_sql run_name=text-sql-v1
 ```
 
 That command creates an ignored generated overlay under `.runs/text-sql-v1`,
 copies the flat recipe directory into a ConfigMap, mounts it into the stable
-researcher image, sets `RECIPE`, `LOG_ROOT`, and `SPEC_HASH`, and runs
-`kubectl apply -k`.
+researcher image, sets `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and `SPEC_HASH`, and
+runs `kubectl apply -k`.
 
 Preview without applying:
 
 ```bash
 uv run --project .. python -m harness.cli run recipes/text_sql \
-  name=text-sql-v1 \
+  run_name=text-sql-v1 \
   apply=False
 ```
 
 Pass common recipe env directly:
 
 ```bash
 uv run --project .. python -m harness.cli run recipes/my_recipe \
-  name=my-recipe-v1 \
+  run_name=my-recipe-v1 \
   tinker_base_url=http://open-rl-gateway-service:8000 \
   base_model=google/gemma-4-e2b
 ```
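The generated overlay sets `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and `SPEC_HASH`. A minimal sketch of assembling those values, under stated assumptions: the in-pod `RECIPE` path is passed in because its mount location is not shown here, and `SPEC_HASH` is modeled as a truncated SHA-256 over the recipe files' names and contents. That hashing scheme and the `spec_hash` and `overlay_env` helpers are hypothetical, not necessarily what `harness.cli` does.

```python
import hashlib
import os
from pathlib import Path


def spec_hash(recipe_dir: Path, length: int = 12) -> str:
    """Hypothetical content hash of a flat recipe directory.

    Covers every regular file's name and bytes in sorted order, so adding,
    removing, renaming, or editing any recipe file changes the result.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in recipe_dir.iterdir() if p.is_file()):
        data = path.read_bytes()
        # NUL cannot appear in a file name and the length prefix marks where the
        # content ends, so two different directories never feed the same bytes.
        digest.update(os.fsencode(path.name) + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()[:length]


def overlay_env(recipe_dir: Path, recipe: str, log_root: str, run_name: str) -> dict[str, str]:
    """Variables a generated overlay might set for the researcher pod (assumed)."""
    if not run_name:
        raise ValueError("run_name must be non-empty")
    return {
        "RECIPE": recipe,  # in-pod recipe path; supplied by the caller here
        "LOG_ROOT": log_root,
        "RUN_NAME": run_name,
        "SPEC_HASH": spec_hash(recipe_dir),
    }
```

Hashing file contents rather than timestamps keeps the value identical for byte-identical copies of a recipe, which is what a spec fingerprint needs.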
@@ -107,6 +111,14 @@ and calls the shared OpenRL/Tinker services.
 
 ## Cluster Run
 
+These manifests require the official Agent Sandbox CRD. Researchers run as
+`Sandbox` resources (`agents.x-k8s.io/v1alpha1`); this demo has no plain
+Kubernetes `Job` fallback. Verify the CRD before applying a recipe:
+
+```bash
+kubectl api-resources | grep -i sandbox
+```
+
 Create the API secret for agent-backed researcher pods:
 
 ```bash
@@ -156,7 +168,7 @@ agent starts only after those endpoints are reachable.
 ```text
 harness/cli.py # creates/applies a generated overlay for a recipe dir
 harness/agent.py # prepares git, records baseline, launches Gemini
-harness/attempt.py # runs one measured attempt and writes attempt.json
+harness/attempt.py # runs one measured attempt and writes metadata.json
 harness/serve.py # read-only UI server over researcher/attempt manifests
 harness/utils.py # shared JSON, git, hashing, process helpers
 k8s/base/ # reusable Sandbox/UI resources
@@ -171,13 +183,14 @@ workspace at `RECIPE`'s parent and committed as the run baseline. That lets the
 image stay stable while recipe files come from shared storage.
 
 `harness.attempt` runs recipe code and writes artifacts. The UI reads
-`LOG_ROOT/researchers/*/researcher.json`,
-`LOG_ROOT/researchers/*/attempts/*/attempt.json`, and fixed artifact filenames
-next to those manifests. Clearing `LOG_ROOT` resets the visible run.
-
-The launcher records the unmodified default config as `000-baseline`, then
-passes the recipe-adjacent `program.md` to Gemini as the prompt. That program
-tells the agent to edit only the declared target, commit the attempt, run
+`LOG_ROOT/RUN_NAME/researchers/*/metadata.json`,
+`LOG_ROOT/RUN_NAME/researchers/*/attempts/*/metadata.json`, and fixed artifact
+filenames next to those manifests. Clearing `LOG_ROOT/RUN_NAME` resets the
+visible run.
+
+The launcher records the unmodified default config as attempt `000`, then passes
+the recipe-adjacent `program.md` to Gemini as the prompt. That program tells the
+agent to edit only the declared target, commit the attempt, run
 `eval "${RUN_ATTEMPT_COMMAND}"`, record the metric, and reset if the metric did
 not improve.
 
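A rough sketch of the manifest discovery the UI performs, relying only on the `LOG_ROOT/RUN_NAME/researchers/*/metadata.json` and `.../attempts/*/metadata.json` patterns; `load_run` is a hypothetical helper that treats each manifest as an opaque JSON object. Because attempt directories use zero-padded names, a plain sort lists the `000` baseline first, and a cleared `LOG_ROOT/RUN_NAME` simply yields an empty result.

```python
import json
from pathlib import Path


def load_run(log_root: str, run_name: str) -> dict[str, dict]:
    """Collect researcher and attempt manifests under LOG_ROOT/RUN_NAME."""
    run_root = Path(log_root) / run_name
    researchers: dict[str, dict] = {}
    for manifest in sorted(run_root.glob("researchers/*/metadata.json")):
        researcher_dir = manifest.parent
        attempts = {
            attempt.parent.name: json.loads(attempt.read_text())
            # Zero-padded names ("000", "001", ...) sort numerically as strings.
            for attempt in sorted(researcher_dir.glob("attempts/*/metadata.json"))
        }
        researchers[researcher_dir.name] = {
            "metadata": json.loads(manifest.read_text()),
            "attempts": attempts,
        }
    return researchers
```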
@@ -189,7 +202,7 @@ Copy one existing recipe directory and update:
 - `autoresearch.toml`
 - the command target, if you keep one
 - the editable target
-- `kustomization.yaml` settings: `RECIPE`, `LOG_ROOT`, and
+- `kustomization.yaml` settings: `RECIPE`, `LOG_ROOT`, `RUN_NAME`, and
   `ATTEMPT_TIMEOUT_MINUTES`
 - optionally `RECIPE_DIR`, if Kubernetes should use a recipe uploaded to shared
   storage instead of the recipe already in the image
@@ -222,7 +235,8 @@ To also clear shared run data:
 
 ```bash
 DELETE_ARTIFACTS=1 \
-LOG_ROOT=/mnt/shared/open-rl/autoresearch/text_sql \
+LOG_ROOT=/mnt/shared/open-rl/autoresearch \
+RUN_NAME=text-sql \
 OVERLAY=examples/autoresearch/recipes/text_sql \
   examples/autoresearch/cleanup_research_session.sh
 ```
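With `LOG_ROOT` and `RUN_NAME` now separate, the directory that `DELETE_ARTIFACTS=1` clears is presumably `LOG_ROOT/RUN_NAME`, so an empty or malformed `RUN_NAME` could resolve to `LOG_ROOT` itself (every run) or to a path outside it. The sketch below shows one defensive way to compute the deletion target; `artifact_dir_to_delete` is hypothetical and does not describe what `cleanup_research_session.sh` actually checks.

```python
from pathlib import Path


def artifact_dir_to_delete(log_root: str, run_name: str) -> Path:
    """Return LOG_ROOT/RUN_NAME, refusing values that would widen the deletion."""
    # Reject empty names, "." and "..", and anything containing a separator;
    # each would otherwise point at LOG_ROOT itself or somewhere outside it.
    if not run_name or run_name in {".", ".."} or "/" in run_name:
        raise ValueError(f"unsafe RUN_NAME: {run_name!r}")
    root = Path(log_root).resolve()
    target = (root / run_name).resolve()
    # resolve() follows symlinks, so confirm the final path is still a direct child.
    if target.parent != root:
        raise ValueError(f"{target} is not directly under {root}")
    return target
```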