Skip to content

feat(runtime): EffortPolicy — one effort dial over the existing knobs, eco→max tiers, fleet-parallelism cap #268

@drewstone

Description

@drewstone

Motivation

Every compute/optimization knob in the runtime already exists and is individually switchable — but they're scattered across five surfaces with no bundle, no named tiers, and no single seam where a product (or a paying user) picks how hard the system runs. Two real user shapes, neither served today:

The product promise is one dial (with every individual knob still overridable): effort: 'eco' | 'standard' | 'thorough' | 'max'.

Knob inventory (what exists, where, default)

Knob Surface Default
maxIterations / maxTokens / maxUsd / deadlineMs Budget (conserved pool, supervise/types.ts) caller-set at root
fanout (children per round) ShapeBudget (personify/persona.ts) 3
innerTurns (depth per shot) AgenticOptions (strategy.ts) 4
strategy (depth refine vs breadth sample) Strategy (strategy.ts) depth
analyst on/off + instruction + model createScopeAnalyst / AgenticOptions.analystInstruction on when wired
cross-run learning AgenticOptions.corpus + corpusTags off (undefined)
progressive widening WidenGate (supervise/types.ts) flat
task-level parallelism BenchmarkConfig.concurrency 3
temperature / per-shot maxTokens AgenticOptions 0.7 / provider
lifecycle maintenance (drift-watch, refresh, dedupe cadence) #267 defineAgent.lifecycles (proposed)

Missing entirely: a fleet-capacity knob ("use N concurrent sandboxes / the whole allotment"). concurrency exists per-benchmark; nothing expresses "max parallelism over the Tangle fleet" as a runtime policy.

Design: EffortPolicy as pure data

type EffortPolicy = {
  budget: Budget                      // the conserved pool — 'max' just hands it a big one
  shape: { fanout: number; widen: boolean }
  depth: { innerTurns: number; strategy: 'depth' | 'breadth' }
  analysts: { enabled: boolean; instruction?: string; model?: string }
  learning: { corpus: boolean; tags?: string[] }
  lifecycle: { driftCheckCadence?: string; refresh: boolean }   // feeds #267's lifecycles config
  parallelism: { taskConcurrency: number; fleetSlots?: number | 'max' }
}
  • Consumed at the runPersonified / runAgentic / defineAgent seam — one place, then fanned to the existing surfaces. No new execution model; this is a config composer over knobs that all exist (except fleetSlots).
  • Tiers are named presets of this object, nothing more. Every field individually overridable on top of a tier. Off-by-default stays off-by-default at eco.
  • The conserved budget pool is what makes max safe: spend accounting and equal-compute hold by construction regardless of tier; max is just a bigger root reservation plus wider shape plus all optional loops enabled.

Defaults must be evidence-driven, not vibes

runBenchmark already emits the (score, $) Pareto frontier — that frontier IS the tier table. The ledger says compute reliably buys score (compute-alone +12.2pp on HumanEval; depth+analyst +16.4pp on EOPS) but some knobs are domain-conditional (within-run steering null on closed-form domains, positive on agentic ones). So:

  • Tier presets should be per-domain-shape (agentic/workspace vs closed-form/answer), seeded from existing gate results.
  • max is the one tier allowed to turn on knobs without positive evidence — the user opted into exhaustiveness; everything still lands on the Pareto report so the spend is auditable.

Relationship to #267

Orthogonal axes that compose: #267 = what the system maintains (skills/tools/MCPs lifecycle); this = how hard it runs (compute, loops, analysts, parallelism). #267's lifecycles cadence (driftCheck: '7d', refresh, dedupe) should read from the effort policy rather than carry its own dial — at eco the lifecycle idles, at max it churns continuously.

Phasing

  1. EffortPolicy type + the composer that fans it onto Budget/ShapeBudget/AgenticOptions/Strategy/analyst wiring. Tiers as data, per-domain-shape defaults from the existing gate ledger.
  2. fleetSlots — the executor-layer concurrency cap (sandbox fleet allotment), the one genuinely new knob.
  3. Wire feat(lifecycle): profile-artifact lifecycle as a runtime primitive — generate/eval/promote/maintain skills, tools, MCPs, hooks, subagents for free #267's lifecycles cadence to the policy when that lands.
  4. Pareto-frontier report per tier (runBenchmark reuse) so each tier ships with its measured score/$ point — the receipt that makes the dial honest.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions