Resolve orchestration default models to latest-within-class via config table#561

Merged
lukstafi merged 5 commits into main from ludics/task-c48b7beb-s1/root
Jun 6, 2026
Conversation

@lukstafi

@lukstafi lukstafi commented Jun 6, 2026

Owner

Summary

Orchestration sessions that start with no explicit model previously inherited hardcoded minor-version pins in src/adapters/t3code.ts (DEFAULT_MODEL = "gpt-5.4", CLAUDE_OPUS_MODEL = "claude-opus-4-6", fallback tokens coder:codex:gpt-5.4 / reviewer:codex:gpt-5.4). These went stale as new minors shipped. This implements approach B from the proposal: a single per-class model_classes table in config.yaml under mag.orchestration is the source of truth, and the code resolves each unspecified default by reading it — so bumping a class is a config edit, not a code change.

Implements docs/proposals/resolve-default-models-latest-within-class.md (task-c48b7beb).

What changed

  • New src/orchestration/model-defaults.ts: TRACKED_MODEL_CLASSES, resolveModelClass(table, cls) (throws loudly on missing/blank/non-string, with no silent fallback), classForProvider.
  • src/adapters/t3code.ts — removed the three pinned constants + two fallback tokens; classModel/providerDefaultModel read orchCfg?.model_classes (the lint:config-reference adapter-read seam); resolveAgentModel's lowest tier resolves the table (below every task-1fbd4edf override); selectOrchestrationFlags sources the claude-code suffix from the table; models are pre-resolved before createWorktrees so a missing class throws before any side effect.
  • src/adapters/tmux-adapter.ts — shares the resolver via providerDefaultModel; same pre-resolve ordering.
  • templates/config.reference.yaml + templates/harness/config.yaml — documented model_classes table (codex→gpt-5.5, claude-opus→claude-opus-4-8, claude-sonnet→claude-sonnet-4-6) + a "how to bump" comment.
  • Single-thread carve-out (codex review follow-up, c8537d3) — the loud missing-table throw is reserved for orchestration resolution; the non-orchestration slot start path uses singleThreadCodexModel(), which degrades to the provider default (never throws, no hardcoded version) so minimal/legacy configs without the table still open.
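
As a rough illustration of the loud-failure contract described in the list above, a resolver of this shape rejects a missing table as well as a blank, whitespace-only, or non-string entry. This is a minimal sketch, not the code in src/orchestration/model-defaults.ts: the ModelClass type, the error messages, and the trimming of the returned value are assumptions. Only the class names and model values come from the template table.

```typescript
// Hypothetical sketch; the real model-defaults.ts may be shaped differently.
type ModelClass = "codex" | "claude-opus" | "claude-sonnet";

const TRACKED_MODEL_CLASSES: readonly ModelClass[] = [
  "codex",
  "claude-opus",
  "claude-sonnet",
];

// Resolve one class from the config table, throwing instead of falling back.
function resolveModelClass(table: unknown, cls: ModelClass): string {
  if (typeof table !== "object" || table === null) {
    throw new Error(`model_classes table is missing; cannot resolve "${cls}"`);
  }
  const value = (table as Record<string, unknown>)[cls];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`model_classes.${cls} must be a non-empty string`);
  }
  return value.trim(); // trimming is an assumption of this sketch
}

// Usage with the values documented in the config templates.
const exampleTable = {
  codex: "gpt-5.5",
  "claude-opus": "claude-opus-4-8",
  "claude-sonnet": "claude-sonnet-4-6",
};
const resolvedDefaults = TRACKED_MODEL_CLASSES.map((cls) =>
  resolveModelClass(exampleTable, cls),
);
```

Throwing here, rather than returning a built-in default, is what keeps a stale pin from creeping back in: a config without the table fails at resolution time instead of silently running an old model.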

Precedence (unchanged, no regression)

adapter-arg flag (--coder-model/--reviewer-model) > explicit token model > coder_model/reviewer_model config > model_classes table > provider default.
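
The ladder above can be sketched as a first-non-blank scan over the explicit tiers, with the table lookup kept lazy so that its throw can only fire when no higher tier supplied a model. This is an illustrative sketch under assumptions: the ModelTiers interface and resolveByPrecedence are hypothetical names, not the adapters' resolveAgentModel. The final "provider default" tier is left out because the PR text does not say how the adapters reach it.

```typescript
// Hypothetical sketch of the precedence ladder; not the adapters' real API.
interface ModelTiers {
  flag?: string; // --coder-model / --reviewer-model
  token?: string; // model named in an explicit token
  config?: string; // coder_model / reviewer_model from config
  classTable: () => string; // model_classes lookup; may throw, so kept lazy
}

function resolveByPrecedence(tiers: ModelTiers): string {
  // Higher tiers are checked in order; blank strings count as "not set".
  const explicit = [tiers.flag, tiers.token, tiers.config];
  for (const candidate of explicit) {
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate.trim();
    }
  }
  // Lowest tier covered by this sketch: the config table.
  return tiers.classTable();
}
```

Passing the table tier as a function rather than a value means a session with an explicit flag, token, or config model never touches the table, so a missing model_classes block cannot break it.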

Tests

  • src/orchestration/model-defaults.test.ts — per-class loud-failure enumeration (missing/blank/whitespace/non-string).
  • src/adapters/t3code.test.ts — precedence ladder, runtime-path throw, table-sourced suffix, empty fallback tokens, single-thread non-throwing carve-out.
  • src/adapters/tmux-adapter.test.ts — shared resolver parity + loud failure.
  • scripts/lint-config-reference.test.ts: model_classes covered (not inert) against the live repo.
  • Full suite: 2878 pass / 0 fail; all lints + typecheck + build green. Stale-version and constants-name greps clean in non-test src/.

Verification

orch status evidence for a no-explicit-model session resolving from the config table: coder claude-code → claude-opus-4-8, reviewer codex → gpt-5.5 (tiny → claude-sonnet-4-6).

Note: the live state-repo config.yaml (~/<state-repo>/harness/config.yaml) must carry the model_classes block for the running harness; the deployed config in this environment was updated accordingly.

🤖 Generated with Claude Code

@lukstafi

lukstafi commented Jun 6, 2026

Owner Author

@codex review Focus on bugs, correctness issues, and edge cases. Do not check adherence to a spec or plan.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af8a2f3385

Comment thread src/adapters/t3code.ts Outdated
@lukstafi

lukstafi commented Jun 6, 2026

Owner Author

note: branch state has drifted since this body was written (baseline: 4 commits at 2026-06-06T20:13:58Z, current: 5 commits). consider gh pr edit https://github.com/lukstafi/ludics/pull/561 to refresh.

@lukstafi

lukstafi commented Jun 6, 2026

Owner Author

note: branch state has drifted since this body was written (baseline: 4 commits at 2026-06-06T20:14:11Z, current: 5 commits). consider gh pr edit https://github.com/lukstafi/ludics/pull/561 to refresh.

lukstafi and others added 5 commits June 6, 2026 22:33
…sses table

Replace the pinned minor-version constants in src/adapters/t3code.ts
(DEFAULT_MODEL=gpt-5.4, DEFAULT_CLAUDE_MODEL, CLAUDE_OPUS_MODEL=claude-opus-4-6)
and the two "coder:codex:gpt-5.4"/"reviewer:codex:gpt-5.4" fallback tokens with
a config-driven latest-within-class resolver. The per-class latest mapping is
the single source of truth under mag.orchestration.model_classes in config.yaml;
bumping a class is a config edit, not a code change.

- New src/orchestration/model-defaults.ts: TRACKED_MODEL_CLASSES,
  resolveModelClass (throws loudly on missing/blank/non-string class — AC3),
  classForProvider.
- t3code.ts: classModel/providerDefaultModel read orchCfg?.model_classes (the
  lint:config-reference Direction-4 adapter-read seam); resolveAgentModel's
  lowest tier resolves the class table (below all task-1fbd4edf overrides);
  selectOrchestrationFlags sources the claude-code suffix from the table;
  models are pre-resolved BEFORE createWorktrees so a missing class throws
  before any side effect.
- templates/config.reference.yaml + templates/harness/config.yaml: documented
  model_classes table + how-to-bump comment (AC2/AC6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e + tests

- tmux-adapter.ts: resolveAgentModel's lowest tier calls the shared
  providerDefaultModel (exported from t3code.ts); models pre-resolved before
  createWorktrees / tmux sessions so a missing class throws before side effects.
  Exported resolveAgentModel as a test seam.
- tmux-adapter.test.ts: bare-codex resolves to the table value, reviewer_model
  config still wins, missing class throws loudly.
- orchestration-defaults.test.ts: the two default_coder:null fakeConfigs now
  carry model_classes (claude-code resolution reads the table).
- lint-config-reference.test.ts: world-sanity guard that model_classes is
  covered (not inert) against the live repo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
slotStart auto-fill drives selectOrchestrationFlagsForTask with the default
claude-code coder, which now resolves through the config table; the test
writeConfig() helper must carry mag.orchestration.model_classes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…clean

The bottom-tier comment named the removed DEFAULT_MODEL/DEFAULT_CLAUDE_MODEL
constants verbatim, so a constants-name grep still hit it. Reword to describe
them generically; no behaviour change.

Also (outside this repo) added mag.orchestration.model_classes to the live
state-repo config ~/self-improve/harness/config.yaml so new no-explicit-model
orchestration sessions resolve latest-within-class via the live loadConfigSync
path (AC1/AC2 deploy step).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-thread starts

Codex review (PR #561, P2): after routing defaults through the config table, a
non-orchestration `slot start` with no --model reached the codex-default
fallback in startSingleThread/ensureThread, which threw when
mag.orchestration.model_classes was absent — breaking plain t3code starts on
minimal/legacy configs that previously opened with the built-in codex default.

Reserve the loud missing-table throw for orchestration default resolution
(resolveAgentModel / selectOrchestrationFlags). Add singleThreadCodexModel():
returns the table's codex value when present, else "" (provider/CLI picks its
own default) — never throws, no hardcoded version string (AC2 preserved).

Regression test in t3code.test.ts: absent table → "" (no throw), present →
configured value, with a contrast assertion that orchestration classModel()
still throws on the same table-less config.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lukstafi lukstafi force-pushed the ludics/task-c48b7beb-s1/root branch from c8537d3 to 47a4509 Compare June 6, 2026 20:34
@lukstafi lukstafi merged commit 0024d59 into main Jun 6, 2026
1 check passed
@lukstafi lukstafi deleted the ludics/task-c48b7beb-s1/root branch June 6, 2026 20:34
@lukstafi

lukstafi commented Jun 6, 2026

Owner Author

Suggest Refactor — task-c48b7beb (coder retrospective)

What I'd do differently next time on this change (and adjacent cleanups worth a follow-up).

Process / would-do-differently

  1. Grep a removed constant's consumers, not just the AC's symbols. The
    biggest miss this round: DEFAULT_MODEL fed both the orchestration defaults
    (the AC's target) and the non-orchestration single-thread t3code start. Routing
    everything through the throw-on-missing resolver regressed the single-thread
    path — caught by codex (P2), not by my plan. The plan's occurrence list
    enumerated the constant's definition sites but didn't classify each caller
    by whether the new loud-failure contract was appropriate. Next time: for every
    removed shared constant, list call sites and decide per-caller (orchestration →
    throw; non-orchestration → degrade), before writing code.

  2. Run the FULL suite before claiming green, every round. Targeted tests
    passed while src/slots/index.test.ts auto-fill tests failed — only the full
    bun test surfaced them. Under a pure-throw resolver, any test whose config
    lacks the new key flips red; the blast radius is wider than the files you edit.

  3. Don't over-claim AC coverage. Round 2 wrote "All 7 satisfied" while AC1's
    running-slot evidence was hand-wavy ("manual post-deploy"). The reviewer treats
    a hand-wavy verification line as blocking, which cost a full round. Either
    produce the evidence or explicitly mark the AC not-fully-verified with a reason.

  4. For "world-state" ACs, find the cheap faithful probe early. AC1 ("verified
    by a running slot's orch status") looked like it needed a live slot restart.
    The faithful-but-cheap path — drive the real resolution chain → persistState
    into a scratch LUDICS_HARNESS_DIR → run the real CLI — should have been the
    round-1 plan, not a round-3 discovery. Saved a live-slot restart and two rounds.

Code-shape refactors worth a follow-up (not done here — out of scope)

  1. resolveAgentModel is still duplicated in t3code.ts and
    tmux-adapter.ts (the override-precedence ladder, not just the class tier).
    I shared only the bottom tier (providerDefaultModel). The two copies can drift
    on the flag/token/config tiers. A single shared resolveAgentModel in
    src/orchestration/model-defaults.ts (taking provider/role/overrides/orchCfg)
    consumed by both adapters would remove the duplication — but it touches both
    adapters' hot paths and deserves its own task with focused tests.

  2. The single-thread vs orchestration model-default split is now implicit.
    singleThreadCodexModel (never throws) vs classModel/providerDefaultModel
    (throws) encode two contracts in adjacent helpers; a future reader could pick
    the wrong one. Worth a short doc note or a naming convention
    (*OrThrow / *OrProviderDefault) if more call sites appear.

  3. mag stays freeform (Record<string, unknown>). The reviewer's plan
    floated a typed MagOrchestrationConfig; I kept it freeform to avoid rippling
    through many freeform consumers, and the lint:config-reference adapter-read
    path covers the new key. A broader follow-up could type the whole
    mag.orchestration block once, but that's a cross-cutting change, not this task.
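
To make item 2 concrete, the naming convention could look roughly like the pair below, where the suffix states the contract at the call site. This is only a sketch: codexModelOrThrow, codexModelOrProviderDefault, and the OrchCfg type are hypothetical names, not the existing classModel or singleThreadCodexModel. The empty string standing for "let the provider or CLI choose" follows the single-thread commit message.

```typescript
// Hypothetical helpers illustrating the *OrThrow / *OrProviderDefault naming.
type OrchCfg = { model_classes?: Record<string, unknown> } | undefined;

// Shared read of the codex entry; returns undefined when unusable.
function readCodexEntry(orchCfg: OrchCfg): string | undefined {
  const value = orchCfg?.model_classes?.["codex"];
  return typeof value === "string" && value.trim() !== "" ? value : undefined;
}

// Orchestration contract: a missing or blank entry is a hard error.
function codexModelOrThrow(orchCfg: OrchCfg): string {
  const value = readCodexEntry(orchCfg);
  if (value === undefined) {
    throw new Error("model_classes.codex is missing or blank");
  }
  return value;
}

// Single-thread contract: degrade to "" so the provider picks its default.
function codexModelOrProviderDefault(orchCfg: OrchCfg): string {
  return readCodexEntry(orchCfg) ?? "";
}
```

With both helpers built on one reader, the only difference a caller has to reason about is the one spelled out in the name.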

What worked (keep doing)

  • Pre-resolving models before createWorktrees so a missing-class throw
    precedes side effects.
  • Keeping the orchCfg?.model_classes literal read in an adapter file so
    lint:config-reference Direction-4 sees coverage (a shared-module move would
    have made the key report inert).
  • The config table as single source of truth with a loud test — no built-in
    fallback constant that would re-introduce the staleness this task removed.
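
The first point in the list above, resolving before any side effect, comes down to call ordering. The sketch below shows the idea with injected callbacks; startOrchestration and ResolvedModels are hypothetical names chosen for illustration, and the real adapters do considerably more than this.

```typescript
// Hypothetical sketch: resolve every model first, then perform side effects.
interface ResolvedModels {
  coder: string;
  reviewer: string;
}

function startOrchestration(
  resolveModels: () => ResolvedModels,
  createWorktrees: (models: ResolvedModels) => void,
): ResolvedModels {
  // If resolution throws (for example on a missing class), nothing has been
  // created yet, so there is nothing to clean up.
  const models = resolveModels();
  createWorktrees(models);
  return models;
}
```

If the order were reversed, a missing class would leave worktrees behind that a later retry would have to detect and remove.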
