feat(judge-ft): update program.md + config.yaml for 6-field policy #106

## Context

Child of #90. Blocked by #104 (`prepare.py`), so that the policy guidance can reference the right metrics.

`program.md` is the task guidance the autoresearch-rl LLM policy reads to decide what hyperparameters to try and what code diffs to propose. `config.yaml` is the autoresearch-rl experiment config.

## Scope

### 1. `program.md` rewrite

Update `autoresearch-rl/examples/security-judge/program.md` to:

- State the new objective: maximise `eval_score` (composite of 7 components; weights in `eval_protocol.json`).
- List every component with its weight, so the LLM knows how to trade them off.
- Provide **code-diff guidance** per component — examples:
  - JSON compliance tanking → tighten response_format prompt, lower temperature, add format-penalty reward
  - is_threat accuracy lagging category accuracy → emphasise binary reward component
  - Reasoning length runaway → length-penalty past 400 chars
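As a starting point for the code-diff guidance, the length-penalty and format-penalty ideas above could be sketched as below. This is a minimal illustration, not the project's reward code: the function names, the `scale` and `cap` values, and the set of required keys are all assumptions. Only the 400-character threshold comes from this issue.

```python
import json

MAX_REASONING_CHARS = 400  # threshold named in the guidance above


def length_penalty(reasoning: str, scale: float = 0.001, cap: float = 0.5) -> float:
    """Return 0.0 up to the threshold, then a linearly growing negative penalty.

    `scale` and `cap` are illustrative values, not tuned settings.
    """
    excess = max(0, len(reasoning) - MAX_REASONING_CHARS)
    if excess == 0:
        return 0.0
    return -min(cap, excess * scale)


def format_penalty(completion: str, required_keys: frozenset) -> float:
    """Return -1.0 unless the completion is a JSON object holding every required key."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        return -1.0
    if not isinstance(parsed, dict) or not required_keys.issubset(parsed):
        return -1.0
    return 0.0


# Hypothetical subset of the 6-field schema; the full field list is not given here.
REQUIRED = frozenset({"is_threat", "category", "reasoning"})
```

Both functions return values at or below zero, so they can be added to an existing reward without changing its upper bound.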

The existing `program.md` has a good skeleton; the substance needs rewriting for the 6-field policy.
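For reference when writing the trade-off section, the composite objective can be pictured as a weighted sum of component scores. The sketch below assumes a plain weighted sum and a flat name-to-weight mapping; the actual aggregation and the layout of `eval_protocol.json` may differ, and the component names in the example are placeholders.

```python
def composite_score(components: dict, weights: dict) -> float:
    """Weighted sum of component scores (assumed form of `eval_score`)."""
    missing = weights.keys() - components.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Placeholder names and weights, for illustration only.
example_weights = {"json_compliance": 0.5, "is_threat_accuracy": 0.5}
example_scores = {"json_compliance": 1.0, "is_threat_accuracy": 0.5}
score = composite_score(example_scores, example_weights)
```

Under this assumption, a component with a larger weight moves `eval_score` more per unit of improvement, which is the trade-off the policy needs to see spelled out.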

### 2. `config.yaml` updates

- `policy.params`:
  ```yaml
  learning_rate: [5e-5, 1e-4, 3e-4]
  max_steps: [50, 100, 150]       # 6-field needs more steps; was [30, 50, 80]
  num_generations: [3, 4]
  temperature: [0.6, 0.8]         # lower default; helps schema compliance
  lora_rank: [8, 16, 32]          # higher baseline; was [4, 8, 16]
  ```
- `target.basilica.setup_cmd`: ensure `openai` (or compatible HTTP client) is installed if the reasoning distillation script needs to be re-run on the runner.
- `objective.metric`: still `eval_score`, direction `max`. Unchanged.
- `stop.no_improve_streak`: raise to 20 (was 15); the 6-field policy is harder to train and needs more exploration.
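To get a feel for the size of the proposed search space, the grid under `policy.params` can be enumerated directly. This sketch only counts combinations. It assumes nothing about how autoresearch-rl samples from the space, and `enumerate_grid` is a hypothetical helper.

```python
from itertools import product

# Values copied from the proposed `policy.params` block.
search_space = {
    "learning_rate": [5e-5, 1e-4, 3e-4],
    "max_steps": [50, 100, 150],
    "num_generations": [3, 4],
    "temperature": [0.6, 0.8],
    "lora_rank": [8, 16, 32],
}


def enumerate_grid(space: dict) -> list:
    """Return every combination of parameter values as a list of dicts."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


grid = enumerate_grid(search_space)
print(len(grid))  # 3 * 3 * 2 * 2 * 3 = 108
```

With 108 combinations, a `no_improve_streak` of 20 stops the search long before the grid is exhausted if no run improves on the best score.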

## Acceptance

- [ ] `program.md` explicitly lists all 7 reward components with their weights.
- [ ] `config.yaml` param grid updated.
- [ ] `autoresearch-rl run config.yaml --dry-run` (or equivalent) reports the new search space correctly.
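The first acceptance item could be checked mechanically with something like the sketch below. It assumes the component weights are available as a flat name-to-weight mapping and that `program.md` mentions each component by its exact name. `missing_components` is a hypothetical helper, not part of autoresearch-rl.

```python
def missing_components(program_text: str, weights: dict) -> list:
    """Return the component names from `weights` that never appear in the guidance text."""
    return sorted(name for name in weights if name not in program_text)


# Placeholder text and names, for illustration only.
sample_text = "json_compliance (weight 0.3): keep output parseable."
sample_weights = {"json_compliance": 0.3, "category_accuracy": 0.2}
print(missing_components(sample_text, sample_weights))  # ['category_accuracy']
```

An empty result means every component is at least named. Checking that the listed weights match would need a stricter parse of `program.md`.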
