feat(judge-ft): update program.md + config.yaml for 6-field policy #106

@epappas

Description

Context

Child of #90. Blocked by #104 (prepare.py), so that the policy guidance can reference the right metrics.

program.md is the task guidance the autoresearch-rl LLM policy reads to decide what hyperparameters to try and what code diffs to propose. config.yaml is the autoresearch-rl experiment config.

Scope

1. program.md rewrite

Update autoresearch-rl/examples/security-judge/program.md to:

  • State the new objective: maximise eval_score (composite of 7 components; weights in eval_protocol.json).
  • List every component with its weight, so the LLM knows how to trade them off.
  • Provide code-diff guidance per component — examples:
    • JSON compliance tanking → tighten response_format prompt, lower temperature, add format-penalty reward
    • is_threat accuracy lagging category accuracy → emphasise binary reward component
    • Reasoning length runaway → length-penalty past 400 chars
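As a rough illustration of the kind of code diff the guidance above could point the policy toward, here is a minimal sketch of a format penalty and a length penalty past 400 characters. The function names, the `reasoning` field name, and the penalty magnitudes are all assumptions made for the example; they are not taken from the existing training script.

```python
import json

MAX_REASONING_CHARS = 400  # threshold named in the guidance above


def length_penalty(reasoning: str, limit: int = MAX_REASONING_CHARS,
                   per_char: float = 0.001, cap: float = 0.5) -> float:
    """Penalty that grows linearly with characters past `limit`, capped at `cap`."""
    overflow = max(0, len(reasoning) - limit)
    return min(cap, overflow * per_char)


def format_penalty(completion: str, penalty: float = 1.0) -> float:
    """Flat penalty when the completion does not parse as a JSON object."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        return penalty
    return 0.0 if isinstance(parsed, dict) else penalty


def shaped_reward(base_reward: float, completion: str) -> float:
    """Subtract the format penalty, then the length penalty if the JSON parsed."""
    penalty = format_penalty(completion)
    if penalty == 0.0:
        # "reasoning" is a hypothetical field name for the free-text field.
        reasoning = json.loads(completion).get("reasoning", "")
        if isinstance(reasoning, str):
            penalty += length_penalty(reasoning)
    return base_reward - penalty
```

Capping the length penalty keeps a very long but otherwise correct answer from scoring below a malformed one; whether that ordering is desirable is a design choice for program.md to state.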

The existing program.md has a good skeleton; the substance needs rewriting for the 6-field policy.
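Because program.md has to list every component with its weight, it may help to generate that list from the weights rather than type it by hand. The sketch below assumes the weights are available as a flat name-to-weight mapping and that eval_score is a normalised weighted mean; both are assumptions, since neither the layout of eval_protocol.json nor the exact composite formula is shown here. The helper names are hypothetical.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of component scores; every weighted component must be present."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total


def render_weight_list(weights: dict[str, float]) -> str:
    """Bullet lines, heaviest component first, for pasting into program.md."""
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return "\n".join(f"- {name}: {weight:.2f}" for name, weight in ordered)
```

Sorting by weight puts the components that move eval_score most at the top of the list, which is the order in which the policy should usually consider trading them off.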

2. config.yaml updates

  • policy.params:
    learning_rate: [5e-5, 1e-4, 3e-4]
    max_steps: [50, 100, 150]       # 6-field needs more steps; was [30, 50, 80]
    num_generations: [3, 4]
    temperature: [0.6, 0.8]         # lower default; helps schema compliance
    lora_rank: [8, 16, 32]          # higher baseline; was [4, 8, 16]
  • target.basilica.setup_cmd: ensure openai (or compatible HTTP client) is installed if the reasoning distillation script needs to be re-run on the runner.
  • objective.metric: still eval_score, direction max. Unchanged.
  • stop.no_improve_streak: raise to 20 (was 15); the 6-field task is harder and needs more exploration.
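To get a feel for how the new grid and the stop rule interact, the following sketch enumerates the parameter values listed above and models a no-improvement streak. It is an illustration only: how autoresearch-rl actually samples from policy.params and how it counts the streak are not specified here, so `expand_grid` and `should_stop` are hypothetical helpers.

```python
from itertools import product

# Values copied from the policy.params grid above.
PARAM_GRID = {
    "learning_rate": [5e-5, 1e-4, 3e-4],
    "max_steps": [50, 100, 150],
    "num_generations": [3, 4],
    "temperature": [0.6, 0.8],
    "lora_rank": [8, 16, 32],
}


def expand_grid(grid):
    """Return every combination of the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def should_stop(scores, no_improve_streak=20):
    """True once the trailing run of non-improving scores reaches the threshold."""
    best = float("-inf")
    streak = 0
    for score in scores:
        if score > best:
            best, streak = score, 0
        else:
            streak += 1
    return streak >= no_improve_streak
```

Under these assumptions the grid holds 3 × 3 × 2 × 2 × 3 = 108 combinations, so a streak of 20 lets the run give up after sampling under a fifth of the space without improvement.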

Acceptance

  • program.md explicitly lists all 7 reward components with their weights.
  • config.yaml param grid updated.
  • autoresearch-rl run config.yaml --dry-run (or equivalent) reports the new search space correctly.
