Context
Child of #90. Blocks on #104, #105, #106.
This issue owns the actual training execution: spending GPU time, iterating on autoresearch-rl's hybrid policy, and getting the model to clear the acceptance gates. It's the most open-ended of the children because completion is defined by the metrics: "we're done when the numbers say so".
Scope
1. Run setup
- Target: Basilica GPU cloud (A100 or H100; config already present in examples/security-judge/config.yaml).
- Runtime budget: 12 hours of GPU time max per try before escalating to hyperparameter review.
- Experiment name: security-judge-6field-v1 (increment per major retry).
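A minimal sketch of the "increment per major retry" naming rule, assuming names always end in `-v<N>`. The helper `next_experiment_name` is hypothetical and not part of autoresearch-rl.

```python
import re


def next_experiment_name(name: str) -> str:
    """Bump the trailing -v<N> suffix of an experiment name by one."""
    match = re.fullmatch(r"(.*-v)(\d+)", name)
    if match is None:
        raise ValueError(f"expected a name ending in -v<N>, got {name!r}")
    prefix, version = match.groups()
    return f"{prefix}{int(version) + 1}"


# Name for the first major retry after security-judge-6field-v1.
retry_name = next_experiment_name("security-judge-6field-v1")
```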
2. Hybrid-mode progression
autoresearch-rl's hybrid policy does param search until no_improve_streak hits stall_threshold, then switches to code diffs. Expected progression:
- Iter 0–5: random param draws to seed the history.
- Iter 5–20: LLM-guided param tuning.
- Iter 20+: if param mode stalls, the LLM proposes diffs to train.py reward weights and generation logic.
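The switching rule above could be sketched roughly as follows. Only `no_improve_streak` and `stall_threshold` come from this issue; the `HybridState` class, the proposer labels, the default threshold of 5, and treating an equal score as "no improvement" are assumptions for illustration, not autoresearch-rl's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class HybridState:
    """Hypothetical tracker for the param-search -> code-diff switch."""

    stall_threshold: int = 5   # assumed default; the real value lives in config
    seed_iters: int = 5        # iterations of random draws before LLM guidance
    best_score: float = float("-inf")
    no_improve_streak: int = 0
    mode: str = "param"

    def record(self, score: float) -> None:
        """Update the streak after one iteration; switch modes on a stall."""
        if score > self.best_score:
            self.best_score = score
            self.no_improve_streak = 0
        else:
            self.no_improve_streak += 1
        if self.mode == "param" and self.no_improve_streak >= self.stall_threshold:
            self.mode = "diff"

    def proposer(self, iteration: int) -> str:
        """Return which kind of proposal the given iteration would use."""
        if self.mode == "diff":
            return "llm_diff"
        return "random_params" if iteration < self.seed_iters else "llm_params"
```

In this sketch the switch depends only on the streak, so the "Iter 20+" boundary is an expectation and not a hard rule.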
3. Monitoring
Check-in cadence:
- Live: autoresearch-rl dashboard (iteration history + metrics per run).
- Daily: summary post in team channel — best score so far, current params, what the LLM is trying.
- On regression: if any component metric regresses by more than 10% between iterations, flag it in the issue thread.
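One way the regression check could be automated, as a sketch. It assumes the 10% is relative to the previous iteration's value and that `brier_score` is the only lower-is-better component; the function name and the metric dictionaries are hypothetical.

```python
def regressed_components(
    prev: dict[str, float],
    curr: dict[str, float],
    threshold: float = 0.10,
    lower_is_better: frozenset[str] = frozenset({"brier_score"}),
) -> list[str]:
    """Return the component metrics that got worse by more than `threshold`."""
    flagged = []
    for name, before in prev.items():
        after = curr.get(name)
        if after is None or before == 0:
            continue  # no comparable value, or relative change undefined
        change = (after - before) / abs(before)
        if name in lower_is_better:
            change = -change  # an increase is a regression for these metrics
        if change < -threshold:
            flagged.append(name)
    return flagged
```

Anything returned by this check would be posted to the issue thread by hand or by whatever reporting hook the team already uses.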
4. Stop conditions
Stop training (claim success) when all of:
- eval_score ≥ 0.80 on held-out val set.
- json_compliance ≥ 0.98.
- is_threat_acc ≥ 0.87 (within 5 points of DeBERTa-protectai-v2 on the same data).
- brier_score ≤ 0.15 (confidence calibrated).
Or: stop and re-scope when any of:
- Three consecutive 12-hour GPU runs show no improvement in best score.
- Total GPU cost exceeds $200 (approx; check with finance).
- A fundamental reward-hacking pathology is identified (model emits valid JSON but is clearly gaming one component).
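The two stop branches above can be expressed as a small decision function. This is a sketch under stated assumptions: the thresholds come from this issue, but the function names, the metrics dictionary shape, and the choice to let a reward-hacking flag override passing gates are assumptions. `brier_score` uses the standard definition (mean squared error between predicted probability and binary outcome), assuming the confidence is attached to the binary is_threat label.

```python
MIN_GATES = {"eval_score": 0.80, "json_compliance": 0.98, "is_threat_acc": 0.87}
MAX_GATES = {"brier_score": 0.15}
MAX_STALLED_RUNS = 3        # consecutive 12-hour runs with no new best score
MAX_GPU_COST_USD = 200.0    # approximate cap from this issue


def brier_score(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def failed_gates(metrics: dict[str, float]) -> list[str]:
    """Return the names of the success gates that are not yet cleared."""
    failed = [k for k, v in MIN_GATES.items() if metrics[k] < v]
    failed += [k for k, v in MAX_GATES.items() if metrics[k] > v]
    return failed


def decide(metrics: dict[str, float], stalled_runs: int,
           gpu_cost_usd: float, reward_hacking: bool = False) -> str:
    """Return 'success', 'rescope', or 'continue' for the current state."""
    if reward_hacking:
        return "rescope"  # assumption: gamed metrics never count as success
    if not failed_gates(metrics):
        return "success"
    if stalled_runs >= MAX_STALLED_RUNS or gpu_cost_usd > MAX_GPU_COST_USD:
        return "rescope"
    return "continue"
```

The reward-hacking flag is a human judgment call in this sketch; nothing here tries to detect the pathology automatically.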
5. Failure branch
If stop-and-rescope: decide between (a) change base model to Qwen2.5-1.5B or 3B, (b) change teacher (regenerate reasoning with Claude-Haiku), (c) add curriculum learning (train on is_threat only first, then add category, etc.). Each rescope is its own sub-issue.
Acceptance
- autoresearch-rl/artifacts/security-judge-6field-v1/versions/best/ with weights + training config + metrics.
- TRAINING_LOG.md committed under the same directory.
Reference issue for rescope options
Open at time of first stop-and-rescope.