refactor(models): promote claude-opus-4-8 to canonical default (closes #5084) #5405

Open
Deftera186 wants to merge 5 commits into code-yeongyu:dev from Deftera186:feat/promote-claude-opus-4-8-canonical-default

Deftera186 commented Jun 18, 2026

Summary

Promotes claude-opus-4-8 to canonical Anthropic Opus default across the codebase, satisfying the model-catalog gate documented in docs/guide/agent-model-matching.md ("promote Opus 4.8 to chain defaults once those land in the model catalog").

Closes #5084.

Approach

Mirrors the structure of the merged precedent #3486 (4-6 → 4-7), authored and reviewed by @code-yeongyu, in five logical commits:

  1. Source + snapshot + model-core tests — flips defaults in category-model-requirements, agent-model-requirements, claude-model-mapper, model-capability-aliases (added claude-opus-4.8 exact alias and widened legacy thinking-pattern to canonicalize 4-7-thinking → 4-8), event-model-fallback, cli-program, tui-install-prompts, delegate-task/anthropic-categories, model-availability, and think-mode/switcher HIGH_VARIANT_MAP (added 4-8 entry, kept 4-7).
  2. Bulk test bump — 90 test files updated where 4-7 represented the canonical default, with carve-outs preserved for files that legitimately test 4-7-specific contracts (model-family-detectors, format-normalizer, error-classifier, migration paths, the dedicated 4-7 Sisyphus prompt, the snapshot).
  3. Migration: MODEL_VERSION_MAP chains 4-4 → 4-8 and adds 4-7 → 4-8; MODEL_TO_CATEGORY_MAP registers 4-8 alongside 4-7 (kept per the #3486 commit 5478bab review pattern); migrations-sidecar doc breadcrumb updated.
  4. Docs — READMEs (en/ko/ja/zh-cn/ru), docs/guide/*, docs/reference/*, docs/examples/*.jsonc, agent AGENTS.md files, hyperplan + debugging skill references. The "Opus 4.7 still appears as default" disclaimer in agent-model-matching.md is rewritten now that the snapshot ships 4-8 entries (K2.7 disclaimer preserved). Priority table keeps 4-7 as legacy fallback alongside 4-8 (max).
  5. Test fixes — Surgical fixes for the bulk-sed test failures across 7 semantic categories (migration fixtures, provider-model-id-transform expected dotted-form, Atlas variant-resolver extension to Path A reuse, think-mode dotted-form inputs, fuzzyMatchModel github-copilot available-set, model-fallback expected chains, runtime-fallback equivalence-pair revert). Also extended prompts-core/variant-resolver to use isClaudeOpus47OrLaterModel for the opus-4-7 variant so 4-8 reuses the tuned opus-4-7 Atlas prompt (Path A consistency with the Sisyphus prompt strategy).

Snapshot Strategy: Manual Patch (Path B)

bun run build:model-capabilities against current models.dev produces an unrelated 40k-line schema migration (~2,500 models) — out of scope for a defaults flip. Instead, this PR ships a surgical snapshot patch: 7 mirrored 4-8 entries (anthropic/, bare, 4 bedrock regional, vertex @default) added alongside existing 4-7 entries (+147 lines clean). This preserves the 4-7 legacy fallback contract while making 4-8 a first-class chain target.
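
The mirroring step described above could be sketched roughly as follows. This is only an illustration: the `CapabilityEntry` shape, the `mirrorEntries` helper, and the numeric limits are assumptions made for the example, not the repository's actual snapshot schema or tooling.

```typescript
// Hypothetical entry shape; the real snapshot schema may differ.
interface CapabilityEntry {
  id: string;
  reasoning: boolean;
  temperature: boolean;
  toolCall: boolean;
  limit: { context: number; output: number };
}

type Snapshot = Record<string, CapabilityEntry>;

// Copy every entry whose key contains `from` to a sibling key that uses `to`,
// leaving the original entries untouched so the legacy version still resolves.
function mirrorEntries(snapshot: Snapshot, from: string, to: string): Snapshot {
  const result: Snapshot = { ...snapshot };
  for (const [key, entry] of Object.entries(snapshot)) {
    if (!key.includes(from)) continue;
    const mirroredKey = key.split(from).join(to);
    if (mirroredKey in result) continue; // never overwrite an existing entry
    result[mirroredKey] = {
      ...entry,
      id: entry.id.split(from).join(to),
      limit: { ...entry.limit },
    };
  }
  return result;
}
```

Because the helper only adds keys, the resulting diff stays additive, which is the property the manual patch relies on.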

Backward Compat (per #3486 review pattern)

Legacy 4-7 references intentionally preserved:

  • claude-opus-4.7 exact dotted alias
  • HIGH_VARIANT_MAP 4-7 entry
  • MODEL_TO_CATEGORY_MAP 4-7 entry
  • Snapshot 4-7 entries (still resolvable)
  • prompts-core/prompts/atlas/opus-4-7.md (dedicated Atlas 4-7 prompt, now reused for 4-8 via Path A extension)
  • agents/sisyphus/claude-opus-4-7.ts (dedicated 4-7 Sisyphus prompt, reused for 4-8 via existing isClaudeOpus47OrLaterModel check in the factory)
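
To illustrate why an "or later" check lets 4-8 reuse the 4-7 prompts, here is a minimal sketch. The names `isOpus47OrLater` and `pickAtlasPrompt` and the `default.md` fallback are hypothetical stand-ins; the repository's `isClaudeOpus47OrLaterModel` may be implemented quite differently.

```typescript
// Hypothetical stand-in for an "or later" detector; not the repo's implementation.
function isOpus47OrLater(modelID: string): boolean {
  // Accept hyphenated (4-8) and dotted (4.8) forms, with or without a provider prefix.
  // The minor version is limited to 1-2 digits so date suffixes are not misread as versions.
  const match = /claude-opus-(\d+)[-.](\d{1,2})(?!\d)/.exec(modelID);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  return major > 4 || (major === 4 && minor >= 7);
}

// Route any 4-7-or-later Opus model to the tuned 4-7 prompt file.
function pickAtlasPrompt(modelID: string): string {
  return isOpus47OrLater(modelID) ? "opus-4-7.md" : "default.md";
}
```

With a check of this kind, a future minor version would also pick up the tuned prompt without another source change, which appears to be the intent of the reuse strategy.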

Test Plan

bun test shows 0 new failures introduced by this PR.

The 44 remaining baseline failures are all pre-existing on upstream/dev tip b949c3478 and orthogonal to a model defaults bump: codex CLI bundling (lazycodex executor verify CLI, start-work continuation CLI, codex ultrawork directive source), runtime wrappers (bun absent + node fallback), plugin metadata entrypoints, Windows Git Bash bundled rule, lsp-daemon MCP proxy, findProjectRoot, package metadata, ast-grep manifest pins, createRuntimeTmuxConfig, CodeGraph SessionStart hook, etc.

Verified pre-existing nature by re-running individual failing test files against the upstream/dev baseline (e.g. auto-retry-dispatch.test.ts fails on baseline too with identical "Expected: 1, Received: 0" assertion).

Migration smoke (verified by passing tests): configs containing anthropic/claude-opus-4-7 resolve via MODEL_VERSION_MAP to anthropic/claude-opus-4-8 on next load; both versions resolve through the bundled snapshot for variant + context-window correctness.
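
The migration behaviour described above can be pictured with a small sketch. The `VERSION_MAP` contents, the key format for the 4-4 entry, and the `migrateModel` helper are assumptions for illustration only; the real `MODEL_VERSION_MAP` and its loader live in the repository.

```typescript
// Illustrative map only; key format and contents are assumptions.
const VERSION_MAP = new Map<string, string>([
  ["anthropic/claude-opus-4-4", "anthropic/claude-opus-4-8"],
  ["anthropic/claude-opus-4-7", "anthropic/claude-opus-4-8"],
]);

interface MigrationResult {
  model: string;
  migrated: boolean;
}

// Each legacy key points straight at the canonical default,
// so a single lookup is enough (no multi-hop walk).
function migrateModel(model: string): MigrationResult {
  const target = VERSION_MAP.get(model);
  if (target === undefined) return { model, migrated: false };
  return { model: target, migrated: true };
}
```

A config already pinned to `anthropic/claude-opus-4-8` falls through unchanged in this sketch, which matches the "already-canonical, no migration triggered" case mentioned in the test fixes.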

References

…+snapshot+model-core tests)

Bumps the canonical Anthropic Opus model from 4-7 to 4-8 across the
Sisyphus / Oracle / Prometheus / Metis / Momus agent fallback chains,
the unspecified-high / visual-engineering / ultrabrain / deep / artistry
category fallback chains, the Claude Code alias map, the think-mode
HIGH_VARIANT_MAP, the event-model-fallback default, the
delegate-task anthropic-categories fallback, and the CLI / TUI prompt
hints.

Patches the bundled model-capabilities snapshot to mirror the seven
existing claude-opus-4-7 entries (anthropic/claude-opus-4-8,
claude-opus-4-8, four bedrock regional variants, the vertex @default
form). The patch reuses the 4-7 capability shape verbatim
(reasoning/temperature/toolCall flags, modalities, 1M context, 128k
output) which matches the live entry on models.dev. A full snapshot
regen was attempted first but pulled in an unrelated upstream schema
migration affecting ~2500 unrelated models; the manual 7-entry patch
keeps this commit focused on the bump.

Updates model-capability-aliases:
- Adds claude-opus-4.8 to the dotted-version exact alias list
  (claude-opus-4.7 entry preserved for legacy compat).
- Widens the legacy thinking pattern alias to match both
  claude-opus-4-7-thinking and claude-opus-4-8-thinking, canonicalizing
  both to claude-opus-4-8 (mirrors PR code-yeongyu#3486 commit 5478bab review-fix
  pattern).
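
As a rough sketch of the two alias mechanisms listed above (an exact dotted alias list plus a widened legacy thinking pattern), assuming dotted aliases map to the hyphenated ID of the same version. `EXACT_ALIASES`, `LEGACY_THINKING_PATTERN`, and `canonicalize` are hypothetical names, not the module's real exports.

```typescript
// Exact dotted aliases; the hyphenated targets are an assumption for this sketch.
const EXACT_ALIASES = new Map<string, string>([
  ["claude-opus-4.7", "claude-opus-4-7"], // kept for legacy compat
  ["claude-opus-4.8", "claude-opus-4-8"],
]);

// Widened pattern: both legacy thinking IDs canonicalize to the new default.
const LEGACY_THINKING_PATTERN = /^claude-opus-4-[78]-thinking$/;

function canonicalize(modelID: string): string {
  const exact = EXACT_ALIASES.get(modelID);
  if (exact !== undefined) return exact;
  if (LEGACY_THINKING_PATTERN.test(modelID)) return "claude-opus-4-8";
  return modelID; // unknown IDs pass through unchanged
}
```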

Adds claude-opus-4-8 to HIGH_VARIANT_MAP alongside 4-7 so /think can
escalate the new canonical to its high-reasoning variant. The 4-7 entry
is intentionally kept so legacy pinned configs still escalate.
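
A minimal sketch of the escalation lookup, assuming the map goes from a base model ID to a `-high` variant and that dotted input is normalized first. `HIGH_VARIANTS` and `toHighVariant` are illustrative names; the real `HIGH_VARIANT_MAP` may hold more entries and different value forms.

```typescript
// Illustrative subset of a base-model to high-variant map.
const HIGH_VARIANTS = new Map<string, string>([
  ["claude-opus-4-7", "claude-opus-4-7-high"], // legacy entry kept
  ["claude-opus-4-8", "claude-opus-4-8-high"], // new canonical default
]);

// Normalize a dotted version (4.8) to the hyphenated form (4-8), then look it up.
function toHighVariant(modelID: string): string | undefined {
  const normalized = modelID.replace(/(\d)\.(\d)/, "$1-$2");
  return HIGH_VARIANTS.get(normalized);
}
```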

Updates 9 model-core test files to assert the new 4-8 default while
leaving 4-7-specific contract tests intact. Adds a claude-opus-4-8
mock entry to model-capabilities.test.ts so the legacy-thinking-alias
canonicalization path still resolves through the bundled snapshot.

Refs code-yeongyu#5084. Path B (forked 4.8 prompt) chosen by maintainer in
97bcdb4; this commit completes the canonical-default flip.
…oss test expectations

Bulk-updates 90 test files where 'claude-opus-4-7' represents the
canonical default Sisyphus / Opus model. Carve-outs preserved for
files that legitimately test 4-7-specific contracts:
- model-family-detectors (tests isClaudeOpus47Model returns true for 4-7)
- model-format-normalizer / model-error-classifier / model-resolver
  (legacy 4-7 strings)
- migration paths (4-4->4-7, 4-7 entries kept for legacy compat per
  PR code-yeongyu#3486 commit 5478bab review-fix pattern)
- prompts-core/variant-resolver and the dedicated 4-7 Sisyphus prompt
- the bundled snapshot (4-7 entries kept alongside the new 4-8 mirrors)

Known remaining failures requiring nuanced fixes (NOT addressed in
this WIP commit, follow-up needed):
- provider-model-id-transform tests where sed turned a 4-7->4.7
  dotted-transform test into a 4-8->4.7 mismatch; source transformer
  may also need a 4-8->4.8 mapping added.
- think-mode/switcher tests with dotted-form 4.7 input still expecting
  4-7-high (correct legacy contract) but sed bumped expected to 4-8-high.
- atlas/prompt-routing test asserting 4-7 routes to opus-4-7.md (still
  correct) but sed bumped the input to 4-8.
- fallback chain ordering tests (runtime-fallback, model-fallback hook)
  that assert a specific model order in the chain.

Refs code-yeongyu#5084.
…RSION_MAP

- MODEL_VERSION_MAP: bump 4-4 chain target to 4-8, add 4-7 -> 4-8 hop so
  legacy configs land on the new canonical default in one migration pass.
- agent-category MODEL_TO_CATEGORY_MAP: register 4-8 -> unspecified-high
  alongside the existing 4-7 entry (kept per code-yeongyu#3486 review pattern for
  legacy compat).
- migrations-sidecar: update example breadcrumb to reflect the new hop.

Refs code-yeongyu#5084.
…facing docs

Updates README (en, ko, ja, zh-cn, ru), docs/guide/*, docs/reference/*,
docs/examples/*.jsonc, agent AGENTS.md files, and the hyperplan +
debugging skill references to reflect claude-opus-4-8 as the canonical
default.

In docs/guide/agent-model-matching.md, the 'Current top tier vs the
auto-resolution chain' explainer is rewritten: now that the bundled
capability snapshot ships 4-8 entries, the disclaimer about Opus 4.7
being the snapshot-backed floor no longer applies. The Kimi K2.7
disclaimer is preserved (K2.7 is still pending in the catalog).

The Claude Family priority table keeps claude-opus-4-7 as a legacy
fallback entry alongside the new claude-opus-4-8 (max) primary, mirroring
the precedent from PR code-yeongyu#3486 commit 5478bab review-fix pattern.

Carve-outs (intentionally kept):
- packages/prompts-core/prompts/atlas/opus-4-7.md (dedicated 4-7
  Atlas prompt file, still selected by getAtlasPromptSource)
- dist/ build artifacts (regenerated at build time)

Refs code-yeongyu#5084.
github-actions bot added labels Jun 18, 2026: claude-code-compat-core, shared-skills, utils, model-core, skills-loader-core (each marking changes under the matching packages/ directory), and opencode (OpenCode edition: packages/omo-opencode)
Surgical fixes for the 20 test failures introduced by the bulk-sed in
commit efcfacb and the source flips in fd36857. All resolved tests
fall into clear semantic categories:

1. **Migration test fixtures** (~16 tests in
   packages/utils/src/migration/* and
   packages/omo-opencode/src/shared/migration*):
   Bumped MIGRATION_KEY and assertion strings from
   'anthropic/claude-opus-4-7' to 'anthropic/claude-opus-4-8' to match
   the new MODEL_VERSION_MAP chain (4-4 -> 4-8, 4-7 -> 4-8). Historical
   sidecar entries (4-5 -> 4-7 and 4-6 -> 4-7) preserved as legacy
   breadcrumbs - they're test data for sidecar storage semantics, not
   live migration map references. The orphan-lsp fixture model bumped
   to 4-8 (already-canonical, no migration triggered, no rewrite
   expected).

2. **Provider-model-id-transform tests** (model-core + omo-opencode
   mirror): Bumped expected dotted-form output from 'claude-opus-4.7'
   to 'claude-opus-4.8' since the input is now '-4-8' and the source
   transformer correctly produces '-4.8' for github-copilot/vercel
   providers.

3. **Atlas prompt routing**: Extended variant-resolver to use
   isClaudeOpus47OrLaterModel (was isClaudeOpus47Model) so the
   dedicated opus-4-7.md prompt is now reused for 4-8 as well, matching
   the Path A strategy chosen in the PR comment on code-yeongyu#5084. Atlas test
   now passes asserting 4-8 routes to opus-4-7 prompt.

4. **think-mode/switcher dotted-form tests**: Bumped dotted-form input
   from 'claude-opus-4.7' to 'claude-opus-4.8' to match the new
   canonical default. The dotted-normalization contract is unchanged;
   the test just uses the newer model as its example.

5. **fuzzyMatchModel github-copilot dotted-form test**: Bumped
   available-set entry from 'github-copilot/claude-opus-4.7' to
   'github-copilot/claude-opus-4.8' so the hyphen-vs-dot matching
   contract is tested against the new canonical default version.

6. **model-fallback / generate-omo-config tests**: Updated expected
   hardcoded chain output for github-copilot to include
   'github-copilot/claude-opus-4.8' (the actual dotted form produced
   by the github-copilot provider transformer after the bump) and
   momus fallback chain expected dotted version bumped to 4.8.

7. **runtime-fallback/index.test.ts equivalence-pair chain entries**
   (7 chain entries + 1 failing-model line): Reverted the bulk-sed
   bump back to 'anthropic/claude-opus-4-7' for the specific chain
   slots that are paired with 'github-copilot/claude-opus-4.7' as
   equivalence-skip test setups. These tests assert that
   areRuntimeFallbackModelsEquivalent() correctly skips a chain
   candidate when its canonical modelID matches the current model -
   they need both sides of the pair at the same version. Bumping only
   one half broke the equivalence-skip semantic. The L2805
   failing-model line in 'preserve resolved agent during auto-retry'
   reverted for the same reason (chain[0] is still 4.7-copilot).
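
The equivalence-skip contract from item 7 can be sketched as below. This is an assumption-laden illustration: `canonicalModelID`, `modelsEquivalent`, and `nextFallback` are hypothetical helpers, and the real `areRuntimeFallbackModelsEquivalent()` may compare models differently.

```typescript
// Strip the provider prefix and normalize dotted versions (4.7 -> 4-7).
function canonicalModelID(model: string): string {
  const slash = model.indexOf("/");
  const bare = slash >= 0 ? model.slice(slash + 1) : model;
  return bare.replace(/(\d)\.(\d)/g, "$1-$2");
}

// Two provider-qualified models are equivalent when their canonical IDs match.
function modelsEquivalent(a: string, b: string): boolean {
  return canonicalModelID(a) === canonicalModelID(b);
}

// Pick the first chain candidate that is not just the current model under another provider.
function nextFallback(current: string, chain: string[]): string | undefined {
  return chain.find((candidate) => !modelsEquivalent(candidate, current));
}
```

Under this sketch, `anthropic/claude-opus-4-8` and `github-copilot/claude-opus-4.7` are not equivalent, so a test that bumps only one half of the pair no longer exercises the skip path. That is consistent with the revert described above.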

After these fixes the test suite shows 0 new failures introduced by
this PR. The 44 remaining failures are all pre-existing on
upstream/dev tip b949c34 and orthogonal to a model defaults bump:
codex CLI bundling, runtime wrappers, plugin metadata entrypoints,
Windows Git Bash bundled rule, lsp-daemon MCP proxy, findProjectRoot,
package metadata, ast-grep manifest pins, etc.

Refs code-yeongyu#5084.
Deftera186 marked this pull request as ready for review June 18, 2026 20:02
github-actions bot added the prompts-core label (changes under packages/prompts-core) Jun 18, 2026
Development

Successfully merging this pull request may close these issues.

[Feature]: Promote claude-opus-4-8 to canonical default for Sisyphus / Prometheus / high-tier categories
