Every compute/optimization knob in the runtime already exists and is individually switchable — but they're scattered across five surfaces with no bundle, no named tiers, and no single seam where a product (or a paying user) picks how hard the system runs. Two real user shapes, neither served today:
Cost-sensitive: single shot, no analyst, no corpus, no loops, fanout 1. Possible today only by knowing all five surfaces and zeroing each one by hand.
Missing entirely: a fleet-capacity knob ("use N concurrent sandboxes / the whole allotment"). concurrency exists per-benchmark; nothing expresses "max parallelism over the Tangle fleet" as a runtime policy.
Design: EffortPolicy as pure data
    type EffortPolicy = {
      budget: Budget // the conserved pool; 'max' just hands it a big one
      shape: { fanout: number; widen: boolean }
      depth: { innerTurns: number; strategy: 'depth' | 'breadth' }
      analysts: { enabled: boolean; instruction?: string; model?: string }
      learning: { corpus: boolean; tags?: string[] }
      lifecycle: { driftCheckCadence?: string; refresh: boolean } // feeds #267's lifecycles config
      parallelism: { taskConcurrency: number; fleetSlots?: number | 'max' }
    }
Consumed at the runPersonified / runAgentic / defineAgent seam — one place, then fanned to the existing surfaces. No new execution model; this is a config composer over knobs that all exist (except fleetSlots).
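One way the composer could look, as a minimal sketch rather than the runtime's actual wiring. The target types are stand-ins that model only the fields named in this issue; `composeEffort`, `Composed`, `analystEnabled`, and the `fleetAllotment` parameter are hypothetical names introduced for illustration.

```typescript
// Stand-in for the real Budget (assumption: just the four caps named in this issue).
type Budget = { maxIterations: number; maxTokens: number; maxUsd: number; deadlineMs: number };

type EffortPolicy = {
  budget: Budget;
  shape: { fanout: number; widen: boolean };
  depth: { innerTurns: number; strategy: 'depth' | 'breadth' };
  analysts: { enabled: boolean; instruction?: string; model?: string };
  learning: { corpus: boolean; tags?: string[] };
  lifecycle: { driftCheckCadence?: string; refresh: boolean };
  parallelism: { taskConcurrency: number; fleetSlots?: number | 'max' };
};

// Stand-ins for the existing surfaces; the real types carry more fields.
type ShapeBudget = { fanout: number };
type AgenticOptions = {
  innerTurns: number;
  analystInstruction?: string;
  corpus: boolean;
  corpusTags?: string[];
};

// Hypothetical bundle the seam would hand to each surface.
type Composed = {
  budget: Budget;
  shape: ShapeBudget;
  widen: boolean;
  strategy: 'depth' | 'breadth';
  agentic: AgenticOptions;
  analystEnabled: boolean;
  concurrency: number; // what BenchmarkConfig.concurrency carries today
  fleetSlots: number; // the one new knob, resolved to a concrete cap
};

function composeEffort(policy: EffortPolicy, fleetAllotment: number): Composed {
  // 'max' means the whole allotment; a number is clamped to it; unset means one sandbox.
  const requested = policy.parallelism.fleetSlots ?? 1;
  const fleetSlots = requested === 'max' ? fleetAllotment : Math.min(requested, fleetAllotment);
  return {
    budget: { ...policy.budget },
    shape: { fanout: policy.shape.fanout },
    widen: policy.shape.widen,
    strategy: policy.depth.strategy,
    agentic: {
      innerTurns: policy.depth.innerTurns,
      // Optional loops only receive their settings when they are switched on.
      analystInstruction: policy.analysts.enabled ? policy.analysts.instruction : undefined,
      corpus: policy.learning.corpus,
      corpusTags: policy.learning.corpus ? policy.learning.tags : undefined,
    },
    analystEnabled: policy.analysts.enabled,
    concurrency: policy.parallelism.taskConcurrency,
    fleetSlots,
  };
}

// Placeholder values, not measured defaults.
const composed = composeEffort(
  {
    budget: { maxIterations: 8, maxTokens: 200_000, maxUsd: 5, deadlineMs: 600_000 },
    shape: { fanout: 3, widen: true },
    depth: { innerTurns: 4, strategy: 'depth' },
    analysts: { enabled: false, instruction: 'ignored while disabled' },
    learning: { corpus: true, tags: ['agentic'] },
    lifecycle: { refresh: false },
    parallelism: { taskConcurrency: 4, fleetSlots: 'max' },
  },
  8, // hypothetical fleet allotment
);
```

Because the composer is a pure function over data, each tier can be snapshot-tested without starting a run.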
Tiers are named presets of this object, nothing more. Every field individually overridable on top of a tier. Off-by-default stays off-by-default at eco.
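A minimal sketch of tiers as data, assuming the `EffortPolicy` shape from this issue. The preset numbers are placeholders rather than measured defaults, `Budget` is reduced to a stand-in, and `TIERS`, `PolicyOverrides`, and `resolveEffort` are hypothetical names.

```typescript
// Stand-in for the real Budget (assumption: just the four caps named in this issue).
type Budget = { maxIterations: number; maxTokens: number; maxUsd: number; deadlineMs: number };

type EffortPolicy = {
  budget: Budget;
  shape: { fanout: number; widen: boolean };
  depth: { innerTurns: number; strategy: 'depth' | 'breadth' };
  analysts: { enabled: boolean; instruction?: string; model?: string };
  learning: { corpus: boolean; tags?: string[] };
  lifecycle: { driftCheckCadence?: string; refresh: boolean };
  parallelism: { taskConcurrency: number; fleetSlots?: number | 'max' };
};

type EffortTier = 'eco' | 'standard' | 'thorough' | 'max';

// Every group is optional, and every field inside a group is optional.
type PolicyOverrides = { [K in keyof EffortPolicy]?: Partial<EffortPolicy[K]> };

// Placeholder presets: eco keeps every optional loop off, max turns everything on.
const TIERS: Record<EffortTier, EffortPolicy> = {
  eco: {
    budget: { maxIterations: 1, maxTokens: 20_000, maxUsd: 0.1, deadlineMs: 60_000 },
    shape: { fanout: 1, widen: false },
    depth: { innerTurns: 1, strategy: 'depth' },
    analysts: { enabled: false },
    learning: { corpus: false },
    lifecycle: { refresh: false },
    parallelism: { taskConcurrency: 1 },
  },
  standard: {
    budget: { maxIterations: 4, maxTokens: 100_000, maxUsd: 1, deadlineMs: 300_000 },
    shape: { fanout: 2, widen: false },
    depth: { innerTurns: 2, strategy: 'depth' },
    analysts: { enabled: false },
    learning: { corpus: true },
    lifecycle: { driftCheckCadence: '7d', refresh: false },
    parallelism: { taskConcurrency: 2 },
  },
  thorough: {
    budget: { maxIterations: 8, maxTokens: 400_000, maxUsd: 5, deadlineMs: 900_000 },
    shape: { fanout: 4, widen: true },
    depth: { innerTurns: 4, strategy: 'depth' },
    analysts: { enabled: true },
    learning: { corpus: true },
    lifecycle: { driftCheckCadence: '7d', refresh: true },
    parallelism: { taskConcurrency: 4 },
  },
  max: {
    budget: { maxIterations: 32, maxTokens: 2_000_000, maxUsd: 50, deadlineMs: 3_600_000 },
    shape: { fanout: 8, widen: true },
    depth: { innerTurns: 8, strategy: 'breadth' },
    analysts: { enabled: true },
    learning: { corpus: true },
    lifecycle: { driftCheckCadence: '1d', refresh: true },
    parallelism: { taskConcurrency: 8, fleetSlots: 'max' },
  },
};

// Start from the preset, then lay individual overrides on top, group by group.
function resolveEffort(tier: EffortTier, overrides: PolicyOverrides = {}): EffortPolicy {
  const preset = TIERS[tier];
  return {
    budget: { ...preset.budget, ...overrides.budget },
    shape: { ...preset.shape, ...overrides.shape },
    depth: { ...preset.depth, ...overrides.depth },
    analysts: { ...preset.analysts, ...overrides.analysts },
    learning: { ...preset.learning, ...overrides.learning },
    lifecycle: { ...preset.lifecycle, ...overrides.lifecycle },
    parallelism: { ...preset.parallelism, ...overrides.parallelism },
  };
}

// eco with one knob turned up; everything else stays at the preset.
const ecoWithFanout = resolveEffort('eco', { shape: { fanout: 3 } });
```

Spreading into fresh objects keeps the presets immutable, so an override on one run cannot leak into the next.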
The conserved budget pool is what makes max safe: spend accounting and equal-compute hold by construction regardless of tier; max is just a bigger root reservation plus wider shape plus all optional loops enabled.
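To illustrate why a conserved pool keeps any tier safe, here is a minimal sketch of a reservation tree tracking USD only. `BudgetPool` and its methods are hypothetical and are not the `Budget` type in `supervise/types.ts`; the point is only that a child can never spend more than it reserved, and a parent can never hand out more than it holds.

```typescript
class BudgetPool {
  private availableUsd: number; // not yet spent and not reserved by a child
  private spentUsd = 0; // spend by this pool and all of its descendants
  private readonly parent: BudgetPool | undefined;

  constructor(capUsd: number, parent?: BudgetPool) {
    this.availableUsd = capUsd;
    this.parent = parent;
  }

  // Carve a child pool out of this one; the amount leaves the parent immediately.
  reserve(usd: number): BudgetPool {
    if (usd < 0 || usd > this.availableUsd) throw new Error('reservation exceeds pool');
    this.availableUsd -= usd;
    return new BudgetPool(usd, this);
  }

  spend(usd: number): void {
    if (usd < 0 || usd > this.availableUsd) throw new Error('spend exceeds pool');
    this.availableUsd -= usd;
    // Roll the spend up so every ancestor sees it.
    for (let p: BudgetPool | undefined = this; p; p = p.parent) p.spentUsd += usd;
  }

  // Hand whatever is left back to the parent; this pool can no longer spend.
  release(): void {
    if (this.parent) this.parent.availableUsd += this.availableUsd;
    this.availableUsd = 0;
  }

  get remaining(): number {
    return this.availableUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }
}

const root = new BudgetPool(10); // a 'max' tier would simply start with a larger number
const child = root.reserve(4); // root.remaining is now 6
child.spend(1);
child.release(); // the 3 unspent USD flow back, so root.remaining is 9
```

At every step, root spend plus root remainder plus live child remainders equals the root cap, which is the sense in which accounting holds by construction.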
Defaults must be evidence-driven, not vibes
runBenchmark already emits the (score, $) Pareto frontier — that frontier IS the tier table. The ledger says compute reliably buys score (compute-alone +12.2pp on HumanEval; depth+analyst +16.4pp on EOPS) but some knobs are domain-conditional (within-run steering null on closed-form domains, positive on agentic ones). So:
Tier presets should be per-domain-shape (agentic/workspace vs closed-form/answer), seeded from existing gate results.
max is the one tier allowed to turn on knobs without positive evidence — the user opted into exhaustiveness; everything still lands on the Pareto report so the spend is auditable.
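The step from benchmark output to tier table can be sketched as below. This is an illustration under assumptions: `RunPoint`, `paretoFrontier`, and `bestWithin` are hypothetical names, the data points are synthetic, and nothing here reflects what `runBenchmark` actually returns.

```typescript
// One benchmark configuration and what it achieved (synthetic shape).
type RunPoint = { label: string; score: number; usd: number };

// Keep only points that no cheaper-or-equal point matches or beats on score.
function paretoFrontier(points: RunPoint[]): RunPoint[] {
  // Cost ascending; on equal cost the higher score comes first.
  const sorted = [...points].sort((a, b) => a.usd - b.usd || b.score - a.score);
  const frontier: RunPoint[] = [];
  let bestScore = -Infinity;
  for (const p of sorted) {
    if (p.score > bestScore) {
      frontier.push(p);
      bestScore = p.score;
    }
  }
  return frontier;
}

// On a frontier, cost and score rise together, so the last affordable point is the best one.
function bestWithin(frontier: RunPoint[], maxUsd: number): RunPoint | undefined {
  let pick: RunPoint | undefined;
  for (const p of frontier) {
    if (p.usd <= maxUsd) pick = p;
  }
  return pick;
}

// Synthetic numbers for illustration only.
const runs: RunPoint[] = [
  { label: 'single-shot', score: 0.5, usd: 1 },
  { label: 'fanout-2', score: 0.6, usd: 2 },
  { label: 'fanout-2+corpus', score: 0.55, usd: 3 }, // dominated by fanout-2
  { label: 'depth+analyst', score: 0.7, usd: 5 },
  { label: 'depth+analyst+widen', score: 0.7, usd: 6 }, // same score, more spend
];

const frontier = paretoFrontier(runs);
const standardPick = bestWithin(frontier, 4); // the best point costing at most 4 USD
```

Seeding each tier from a cost ceiling against the frontier means a knob only enters a preset when it actually moved the frontier for that domain shape.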
Relationship to #267
Orthogonal axes that compose: #267 = what the system maintains (skills/tools/MCPs lifecycle); this = how hard it runs (compute, loops, analysts, parallelism). #267's lifecycles cadence (driftCheck: '7d', refresh, dedupe) should read from the effort policy rather than carry its own dial: at eco the lifecycle idles, at max it churns continuously.
Phasing
EffortPolicy type + the composer that fans it onto Budget/ShapeBudget/AgenticOptions/Strategy/analyst wiring. Tiers as data, per-domain-shape defaults from the existing gate ledger.
fleetSlots — the executor-layer concurrency cap (sandbox fleet allotment), the one genuinely new knob.
[…] lifecycles cadence to the policy when that lands.
[…] (runBenchmark reuse) so each tier ships with its measured score/$ point: the receipt that makes the dial honest.
Motivation
The product promise is one dial (with every individual knob still overridable):
effort: 'eco' | 'standard' | 'thorough' | 'max'.
Knob inventory (what exists, where, default)
maxIterations / maxTokens / maxUsd / deadlineMs: Budget (conserved pool, supervise/types.ts)
fanout (children per round): ShapeBudget (personify/persona.ts)
innerTurns (depth per shot): AgenticOptions (strategy.ts)
Strategy (strategy.ts)
createScopeAnalyst / AgenticOptions.analystInstruction
AgenticOptions.corpus + corpusTags
WidenGate (supervise/types.ts)
BenchmarkConfig.concurrency
maxTokens: AgenticOptions
defineAgent.lifecycles (proposed)