README.md (3 additions & 1 deletion)
@@ -251,7 +251,9 @@ Runtime storage, adapter configuration, and worker retention are described in [`
Future learning-runtime boundaries are designed in [`docs/learning-runtime-boundaries.md`](docs/learning-runtime-boundaries.md). That document connects existing `WorkflowState` and RunContract projection responsibilities to a future `RunState` execution envelope while separating completion authority, runtime readiness authority, and advisory evaluation/learning signals.
-The domain `PolicySelector` in `src/domain/policy-selector.ts` is an advisory pre-dispatch primitive. It generates a fixed initial policy set, simulates objective-weighted candidate outcomes, records a selected policy plus rejected alternatives, and emits prediction ids that reward-ledger entries can later calibrate. It does not launch agents, mutate workflow state, or hard-block execution from simulated score alone.
+The domain `PolicySelector` in `src/domain/policy-selector.ts` is an advisory pre-dispatch primitive. It generates a fixed initial policy set across conservative, balanced, aggressive, high-assurance, and learning-exploration strategies; simulates objective-weighted candidate outcomes from task complexity, expected module touch count, dependency depth, adapter mix, isolation mode, verification depth, historical success, and recent reward calibration; records estimator outputs for conflict risk, regression risk, repair likelihood, elapsed/tool cost, review burden, learning value, confidence, and utility; and emits prediction ids that reward-ledger entries can later calibrate. Human overrides are explicit (`selector: "human"` plus reason) and remain bounded by exploration/conflict/regression safety caps. The selector does not launch agents, mutate workflow state, or hard-block execution from simulated score alone.
+
+The objective/reward domain in `src/domain/objective.ts` converts evidence-backed `EvaluationResult` records into append-only `reward-ledger.v1` events. Reward records include outcome status, prediction-vs-actual delta, penalty taxonomy, anti-Goodhart checks tied to [`docs/runcontract-harness-evaluator.md`](docs/runcontract-harness-evaluator.md), advisory `PolicyHint` values, and calibration metadata. Reward data may inform future `PolicySelection` records, but it must not silently mutate objective weights, selected policy, worker count, adapter choices, or completion authority; human-approved objective calibration must be recorded explicitly.
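The advisory selection behaviour described in the README hunk above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: apart from the `selector: "human"` field and the strategy names, every type, field, weight, and cap below is hypothetical, and the real `src/domain/policy-selector.ts` is not shown in this diff. Rejecting an over-cap human override by throwing is likewise one possible reading of "bounded by safety caps", not a confirmed behaviour.

```typescript
// Illustrative sketch only: these types are hypothetical and are not taken
// from src/domain/policy-selector.ts, which this diff does not show.
type Strategy =
  | "conservative"
  | "balanced"
  | "aggressive"
  | "high-assurance"
  | "learning-exploration";

interface CandidateEstimate {
  strategy: Strategy;
  conflictRisk: number; // assumed 0..1, lower is better
  regressionRisk: number; // assumed 0..1, lower is better
  learningValue: number; // assumed 0..1, higher is better
}

interface ObjectiveWeights {
  conflictRisk: number;
  regressionRisk: number;
  learningValue: number;
}

interface SafetyCaps {
  maxConflictRisk: number;
  maxRegressionRisk: number;
}

interface HumanOverride {
  selector: "human";
  strategy: Strategy;
  reason: string;
}

interface PolicySelection {
  selector: "simulated" | "human";
  selected: CandidateEstimate;
  rejected: CandidateEstimate[];
  utility: number;
  reason?: string;
}

// Objective-weighted score: reward learning value, penalise the two risks.
function utilityOf(c: CandidateEstimate, w: ObjectiveWeights): number {
  return (
    w.learningValue * c.learningValue -
    w.conflictRisk * c.conflictRisk -
    w.regressionRisk * c.regressionRisk
  );
}

function selectPolicy(
  candidates: CandidateEstimate[],
  weights: ObjectiveWeights,
  caps: SafetyCaps,
  override?: HumanOverride
): PolicySelection {
  if (candidates.length === 0) throw new Error("at least one candidate is required");

  // Default path: recommend the highest-utility candidate.
  let selected = candidates.reduce((best, c) =>
    utilityOf(c, weights) > utilityOf(best, weights) ? c : best
  );
  let reason: string | undefined;

  if (override) {
    const { strategy, reason: why } = override;
    if (!why.trim()) throw new Error("human override requires a reason");
    const chosen = candidates.find((c) => c.strategy === strategy);
    if (!chosen) throw new Error(`no candidate for strategy ${strategy}`);
    // Assumed rule: an override may not exceed the conflict/regression caps.
    if (
      chosen.conflictRisk > caps.maxConflictRisk ||
      chosen.regressionRisk > caps.maxRegressionRisk
    ) {
      throw new Error("human override exceeds safety caps");
    }
    selected = chosen;
    reason = why;
  }

  // The result is a record only: nothing is launched or mutated here.
  return {
    selector: override ? "human" : "simulated",
    selected,
    rejected: candidates.filter((c) => c !== selected),
    utility: utilityOf(selected, weights),
    ...(reason !== undefined ? { reason } : {}),
  };
}
```

The function returns a plain record with the selected and rejected candidates, which mirrors the "advisory" framing: a caller is free to log it, attach a prediction id, or ignore it.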
 if (!Number.isFinite(input.actualOutcome.elapsedSeconds) || input.actualOutcome.elapsedSeconds < 0) throw new Error("elapsedSeconds must be non-negative");
 if (!Number.isInteger(input.actualOutcome.repairCount) || input.actualOutcome.repairCount < 0) throw new Error("repairCount must be a non-negative integer");
-for (const penalty of input.penalties ?? []) if (!Number.isFinite(penalty.amount) || penalty.amount < 0) throw new Error(`penalty ${penalty.id} amount must be non-negative`);
+for (const penalty of input.penalties ?? []) {
+  if (!penalty.id.trim()) throw new Error("penalty id is required");
+  if (!Number.isFinite(penalty.amount) || penalty.amount < 0) throw new Error(`penalty ${penalty.id} amount must be non-negative`);
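For readers who want to try the changed checks in isolation, the hunk above can be wrapped in a small standalone function. This is a sketch, not the project's code: the `Penalty` and `RewardInput` interfaces and the function name `validateRewardInput` are assumptions inferred from the field names visible in the diff, and the loop's closing brace is added here only so the example compiles.

```typescript
// Hypothetical shapes inferred from the field names used in the diff.
interface Penalty {
  id: string;
  amount: number;
}

interface RewardInput {
  actualOutcome: { elapsedSeconds: number; repairCount: number };
  penalties?: Penalty[];
}

// Hypothetical wrapper name; the enclosing function is not shown in the diff.
function validateRewardInput(input: RewardInput): void {
  // Number.isFinite rejects NaN and +/-Infinity as well as non-numbers.
  if (!Number.isFinite(input.actualOutcome.elapsedSeconds) || input.actualOutcome.elapsedSeconds < 0) {
    throw new Error("elapsedSeconds must be non-negative");
  }
  if (!Number.isInteger(input.actualOutcome.repairCount) || input.actualOutcome.repairCount < 0) {
    throw new Error("repairCount must be a non-negative integer");
  }
  for (const penalty of input.penalties ?? []) {
    // New in this change: a blank or whitespace-only id is rejected first,
    // so the amount error message below never interpolates an empty id.
    if (!penalty.id.trim()) throw new Error("penalty id is required");
    if (!Number.isFinite(penalty.amount) || penalty.amount < 0) {
      throw new Error(`penalty ${penalty.id} amount must be non-negative`);
    }
  }
}
```

Checking the id before the amount means a penalty with both a blank id and a bad amount reports the id problem, which is usually the more actionable message.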