refactor(models): promote claude-opus-4-8 to canonical default (closes #5084)#5405
Open
Deftera186 wants to merge 5 commits into
Open
refactor(models): promote claude-opus-4-8 to canonical default (closes #5084)#5405Deftera186 wants to merge 5 commits into
Deftera186 wants to merge 5 commits into
Conversation
…+snapshot+model-core tests) Bumps the canonical Anthropic Opus model from 4-7 to 4-8 across the Sisyphus / Oracle / Prometheus / Metis / Momus agent fallback chains, the unspecified-high / visual-engineering / ultrabrain / deep / artistry category fallback chains, the Claude Code alias map, the think-mode HIGH_VARIANT_MAP, the event-model-fallback default, the delegate-task anthropic-categories fallback, and the CLI / TUI prompt hints. Patches the bundled model-capabilities snapshot to mirror the seven existing claude-opus-4-7 entries (anthropic/claude-opus-4-8, claude-opus-4-8, four bedrock regional variants, the vertex @default form). The patch reuses the 4-7 capability shape verbatim (reasoning/temperature/toolCall flags, modalities, 1M context, 128k output) which matches the live entry on models.dev. A full snapshot regen was attempted first but pulled in an unrelated upstream schema migration affecting ~2500 unrelated models; the manual 7-entry patch keeps this commit focused on the bump. Updates model-capability-aliases: - Adds claude-opus-4.8 to the dotted-version exact alias list (claude-opus-4.7 entry preserved for legacy compat). - Widens the legacy thinking pattern alias to match both claude-opus-4-7-thinking and claude-opus-4-8-thinking, canonicalizing both to claude-opus-4-8 (mirrors PR code-yeongyu#3486 commit 5478bab review-fix pattern). Adds claude-opus-4-8 to HIGH_VARIANT_MAP alongside 4-7 so /think can escalate the new canonical to its high-reasoning variant. The 4-7 entry is intentionally kept so legacy pinned configs still escalate. Updates 9 model-core test files to assert the new 4-8 default while leaving 4-7-specific contract tests intact. Adds a claude-opus-4-8 mock entry to model-capabilities.test.ts so the legacy-thinking-alias canonicalization path still resolves through the bundled snapshot. Refs code-yeongyu#5084. Path B (forked 4.8 prompt) chosen by maintainer in 97bcdb4; this commit completes the canonical-default flip.
…oss test expectations Bulk-updates 90 test files where 'claude-opus-4-7' represents the canonical default Sisyphus / Opus model. Carve-outs preserved for files that legitimately test 4-7-specific contracts: - model-family-detectors (tests isClaudeOpus47Model returns true for 4-7) - model-format-normalizer / model-error-classifier / model-resolver (legacy 4-7 strings) - migration paths (4-4->4-7, 4-7 entries kept for legacy compat per PR code-yeongyu#3486 commit 5478bab review-fix pattern) - prompts-core/variant-resolver and the dedicated 4-7 Sisyphus prompt - the bundled snapshot (4-7 entries kept alongside the new 4-8 mirrors) Known remaining failures requiring nuanced fixes (NOT addressed in this WIP commit, follow-up needed): - provider-model-id-transform tests where sed turned a 4-7->4.7 dotted-transform test into a 4-8->4.7 mismatch; source transformer may also need a 4-8->4.8 mapping added. - think-mode/switcher tests with dotted-form 4.7 input still expecting 4-7-high (correct legacy contract) but sed bumped expected to 4-8-high. - atlas/prompt-routing test asserting 4-7 routes to opus-4-7.md (still correct) but sed bumped the input to 4-8. - fallback chain ordering tests (runtime-fallback, model-fallback hook) that assert a specific model order in the chain. Refs code-yeongyu#5084.
…RSION_MAP - MODEL_VERSION_MAP: bump 4-4 chain target to 4-8, add 4-7 -> 4-8 hop so legacy configs land on the new canonical default in one migration pass. - agent-category MODEL_TO_CATEGORY_MAP: register 4-8 -> unspecified-high alongside the existing 4-7 entry (kept per code-yeongyu#3486 review pattern for legacy compat). - migrations-sidecar: update example breadcrumb to reflect the new hop. Refs code-yeongyu#5084.
…facing docs Updates README (en, ko, ja, zh-cn, ru), docs/guide/*, docs/reference/*, docs/examples/*.jsonc, agent AGENTS.md files, and the hyperplan + debugging skill references to reflect claude-opus-4-8 as the canonical default. In docs/guide/agent-model-matching.md, the 'Current top tier vs the auto-resolution chain' explainer is rewritten: now that the bundled capability snapshot ships 4-8 entries, the disclaimer about Opus 4.7 being the snapshot-backed floor no longer applies. The Kimi K2.7 disclaimer is preserved (K2.7 is still pending in the catalog). The Claude Family priority table keeps claude-opus-4-7 as a legacy fallback entry alongside the new claude-opus-4-8 (max) primary, mirroring the precedent from PR code-yeongyu#3486 commit 5478bab review-fix pattern. Carve-outs (intentionally kept): - packages/prompts-core/prompts/atlas/opus-4-7.md (dedicated 4-7 Atlas prompt file, still selected by getAtlasPromptSource) - dist/ build artifacts (regenerated at build time) Refs code-yeongyu#5084.
7 tasks
Surgical fixes for the 20 test failures introduced by the bulk-sed in commit efcfacb and the source flips in fd36857. All resolved tests fall into clear semantic categories: 1. **Migration test fixtures** (~16 tests in packages/utils/src/migration/* and packages/omo-opencode/src/shared/migration*): Bumped MIGRATION_KEY and assertion strings from 'anthropic/claude-opus-4-7' to 'anthropic/claude-opus-4-8' to match the new MODEL_VERSION_MAP chain (4-4 -> 4-8, 4-7 -> 4-8). Historical sidecar entries (4-5 -> 4-7 and 4-6 -> 4-7) preserved as legacy breadcrumbs - they're test data for sidecar storage semantics, not live migration map references. The orphan-lsp fixture model bumped to 4-8 (already-canonical, no migration triggered, no rewrite expected). 2. **Provider-model-id-transform tests** (model-core + omo-opencode mirror): Bumped expected dotted-form output from 'claude-opus-4.7' to 'claude-opus-4.8' since the input is now '-4-8' and the source transformer correctly produces '-4.8' for github-copilot/vercel providers. 3. **Atlas prompt routing**: Extended variant-resolver to use isClaudeOpus47OrLaterModel (was isClaudeOpus47Model) so the dedicated opus-4-7.md prompt is now reused for 4-8 as well, matching the Path A strategy chosen in the PR comment on code-yeongyu#5084. Atlas test now passes asserting 4-8 routes to opus-4-7 prompt. 4. **think-mode/switcher dotted-form tests**: Bumped dotted-form input from 'claude-opus-4.7' to 'claude-opus-4.8' to match the new canonical default. The dotted-normalization contract is unchanged; the test just uses the newer model as its example. 5. **fuzzyMatchModel github-copilot dotted-form test**: Bumped available-set entry from 'github-copilot/claude-opus-4.7' to 'github-copilot/claude-opus-4.8' so the hyphen-vs-dot matching contract is tested against the new canonical default version. 6. **model-fallback / generate-omo-config tests**: Updated expected hardcoded chain output for github-copilot to include 'github-copilot/claude-opus-4.8' (the actual dotted form produced by the github-copilot provider transformer after the bump) and momus fallback chain expected dotted version bumped to 4.8. 7. **runtime-fallback/index.test.ts equivalence-pair chain entries** (7 chain entries + 1 failing-model line): Reverted the bulk-sed bump back to 'anthropic/claude-opus-4-7' for the specific chain slots that are paired with 'github-copilot/claude-opus-4.7' as equivalence-skip test setups. These tests assert that areRuntimeFallbackModelsEquivalent() correctly skips a chain candidate when its canonical modelID matches the current model - they need both sides of the pair at the same version. Bumping only one half broke the equivalence-skip semantic. The L2805 failing-model line in 'preserve resolved agent during auto-retry' reverted for the same reason (chain[0] is still 4.7-copilot). After these fixes the test suite shows 0 new failures introduced by this PR. The 44 remaining failures are all pre-existing on upstream/dev tip b949c34 and orthogonal to a model defaults bump: codex CLI bundling, runtime wrappers, plugin metadata entrypoints, Windows Git Bash bundled rule, lsp-daemon MCP proxy, findProjectRoot, package metadata, ast-grep manifest pins, etc. Refs code-yeongyu#5084.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Promotes
claude-opus-4-8to canonical Anthropic Opus default across the codebase, satisfying the model-catalog gate documented indocs/guide/agent-model-matching.md("promote Opus 4.8 to chain defaults once those land in the model catalog").Closes #5084.
Approach
Mirrors the structure of the merged precedent #3486 (4-6 → 4-7), authored and reviewed by @code-yeongyu, with four logical commits:
category-model-requirements,agent-model-requirements,claude-model-mapper,model-capability-aliases(addedclaude-opus-4.8exact alias and widened legacy thinking-pattern to canonicalize 4-7-thinking → 4-8),event-model-fallback,cli-program,tui-install-prompts,delegate-task/anthropic-categories,model-availability, andthink-mode/switcherHIGH_VARIANT_MAP(added 4-8 entry, kept 4-7).MODEL_VERSION_MAPchains4-4 → 4-8and adds4-7 → 4-8;MODEL_TO_CATEGORY_MAPregisters 4-8 alongside 4-7 (kept per refactor(models): bump claude-opus-4-6 to claude-opus-4-7 across fallback chains, categories, and docs #3486 commit5478babreview pattern);migrations-sidecardoc breadcrumb updated.docs/guide/*,docs/reference/*,docs/examples/*.jsonc, agent AGENTS.md files, hyperplan + debugging skill references. The "Opus 4.7 still appears as default" disclaimer inagent-model-matching.mdis rewritten now that the snapshot ships 4-8 entries (K2.7 disclaimer preserved). Priority table keeps 4-7 as legacy fallback alongside 4-8 (max).prompts-core/variant-resolverto useisClaudeOpus47OrLaterModelfor theopus-4-7variant so 4-8 reuses the tuned opus-4-7 Atlas prompt (Path A consistency with the Sisyphus prompt strategy).Snapshot Strategy: Manual Patch (Path B)
bun run build:model-capabilitiesagainst current models.dev produces an unrelated 40k-line schema migration (~2,500 models) — out of scope for a defaults flip. Instead, this PR ships a surgical snapshot patch: 7 mirrored 4-8 entries (anthropic/, bare, 4 bedrock regional, vertex@default) added alongside existing 4-7 entries (+147 lines clean). This preserves the 4-7 legacy fallback contract while making 4-8 a first-class chain target.Backward Compat (per #3486 review pattern)
Legacy 4-7 references intentionally preserved:
claude-opus-4.7exact dotted aliasHIGH_VARIANT_MAP4-7 entryMODEL_TO_CATEGORY_MAP4-7 entryprompts-core/prompts/atlas/opus-4-7.md(dedicated Atlas 4-7 prompt, now reused for 4-8 via Path A extension)agents/sisyphus/claude-opus-4-7.ts(dedicated 4-7 Sisyphus prompt, reused for 4-8 via existingisClaudeOpus47OrLaterModelcheck in the factory)Test Plan
bun testshows 0 new failures introduced by this PR.The 44 remaining baseline failures are all pre-existing on
upstream/devtipb949c3478and orthogonal to a model defaults bump: codex CLI bundling (lazycodex executor verify CLI, start-work continuation CLI, codex ultrawork directive source), runtime wrappers (bun absent + node fallback), plugin metadata entrypoints, Windows Git Bash bundled rule, lsp-daemon MCP proxy, findProjectRoot, package metadata, ast-grep manifest pins, createRuntimeTmuxConfig, CodeGraph SessionStart hook, etc.Verified pre-existing nature by re-running individual failing test files against the upstream/dev baseline (e.g.
auto-retry-dispatch.test.tsfails on baseline too with identical "Expected: 1, Received: 0" assertion).Migration smoke (verified by passing tests): configs containing
anthropic/claude-opus-4-7resolve viaMODEL_VERSION_MAPtoanthropic/claude-opus-4-8on next load; both versions resolve through the bundled snapshot for variant + context-window correctness.References
docs/guide/agent-model-matching.mdL184-189 (rewritten in this PR)