refactor(models): promote claude-opus-4-8 to canonical default (closes #5084) by Deftera186 · Pull Request #5405 · code-yeongyu/oh-my-openagent

Deftera186 · 2026-06-18T19:37:54Z

Summary

Promotes claude-opus-4-8 to canonical Anthropic Opus default across the codebase, satisfying the model-catalog gate documented in docs/guide/agent-model-matching.md ("promote Opus 4.8 to chain defaults once those land in the model catalog").

Closes #5084.

Approach

Mirrors the structure of the merged precedent #3486 (4-6 → 4-7), authored and reviewed by @code-yeongyu, with five logical commits:

1. Source + snapshot + model-core tests: flips defaults in category-model-requirements, agent-model-requirements, claude-model-mapper, model-capability-aliases (added the claude-opus-4.8 exact alias and widened the legacy thinking pattern so that both 4-7-thinking and 4-8-thinking canonicalize to 4-8), event-model-fallback, cli-program, tui-install-prompts, delegate-task/anthropic-categories, model-availability, and think-mode/switcher HIGH_VARIANT_MAP (added 4-8 entry, kept 4-7).
2. Bulk test bump: 90 test files updated where 4-7 represented the canonical default, with carve-outs preserved for files that legitimately test 4-7-specific contracts (model-family-detectors, format-normalizer, error-classifier, migration paths, the dedicated 4-7 Sisyphus prompt, the snapshot).
3. Migration: MODEL_VERSION_MAP chains 4-4 → 4-8 and adds 4-7 → 4-8; MODEL_TO_CATEGORY_MAP registers 4-8 alongside 4-7 (kept per the #3486 commit 5478bab review pattern); the migrations-sidecar doc breadcrumb is updated.
4. Docs: READMEs (en/ko/ja/zh-cn/ru), docs/guide/*, docs/reference/*, docs/examples/*.jsonc, agent AGENTS.md files, hyperplan + debugging skill references. The "Opus 4.7 still appears as default" disclaimer in agent-model-matching.md is rewritten now that the snapshot ships 4-8 entries (K2.7 disclaimer preserved). The priority table keeps 4-7 as a legacy fallback alongside 4-8 (max).
5. Test fixes: surgical fixes for the bulk-sed test failures across 7 semantic categories (migration fixtures, provider-model-id-transform expected dotted-form, Atlas variant-resolver extension to Path A reuse, think-mode dotted-form inputs, fuzzyMatchModel github-copilot available-set, model-fallback expected chains, runtime-fallback equivalence-pair revert). Also extended prompts-core/variant-resolver to use isClaudeOpus47OrLaterModel for the opus-4-7 variant so 4-8 reuses the tuned opus-4-7 Atlas prompt (Path A consistency with the Sisyphus prompt strategy).
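For reviewers who want a mental model of the alias change in model-capability-aliases, here is a minimal sketch. It assumes an exact-alias table plus one legacy pattern; `EXACT_ALIASES`, `LEGACY_THINKING_PATTERN`, and `canonicalizeModelId` are illustrative names and shapes, not the repository's actual code.

```typescript
// Hypothetical sketch: names and data shapes are assumptions for illustration.

// Exact dotted-form aliases. The 4.7 entry stays for legacy compatibility.
const EXACT_ALIASES = new Map<string, string>([
  ["claude-opus-4.8", "claude-opus-4-8"],
  ["claude-opus-4.7", "claude-opus-4-7"],
]);

// Legacy "-thinking" IDs: both 4-7 and 4-8 collapse to the new canonical ID.
const LEGACY_THINKING_PATTERN = /^claude-opus-4-[78]-thinking$/;

function canonicalizeModelId(modelId: string): string {
  const id = modelId.trim().toLowerCase();

  // 1. Exact aliases win first.
  const exact = EXACT_ALIASES.get(id);
  if (exact !== undefined) return exact;

  // 2. Then the widened legacy thinking pattern.
  if (LEGACY_THINKING_PATTERN.test(id)) return "claude-opus-4-8";

  // 3. Anything else passes through unchanged.
  return id;
}

console.log(canonicalizeModelId("claude-opus-4-7-thinking"));
```

Note the asymmetry this sketch makes visible: a dotted `claude-opus-4.7` keeps resolving to 4-7, while the legacy `-thinking` form of 4-7 is folded into 4-8.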

Snapshot Strategy: Manual Patch (Path B)

bun run build:model-capabilities against current models.dev produces an unrelated 40k-line schema migration (~2,500 models) — out of scope for a defaults flip. Instead, this PR ships a surgical snapshot patch: 7 mirrored 4-8 entries (anthropic/, bare, 4 bedrock regional, vertex @default) added alongside existing 4-7 entries (+147 lines clean). This preserves the 4-7 legacy fallback contract while making 4-8 a first-class chain target.
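The manual patch amounts to copying each existing 4-7 entry under a 4-8 key. A rough sketch of that idea follows, assuming the snapshot is a flat map from model ID to a capability record. The `ModelCapability` fields, the sample keys, and `mirrorEntries` are illustrative assumptions, and only three of the seven entries are shown.

```typescript
// Hypothetical sketch: the real snapshot schema and keys may differ.
interface ModelCapability {
  reasoning: boolean;
  temperature: boolean;
  toolCall: boolean;
  contextWindow: number;
  maxOutput: number;
}

type Snapshot = Record<string, ModelCapability>;

// Copies every entry whose key contains `from` to a key with `to` substituted.
// Existing entries are never overwritten or removed.
function mirrorEntries(snapshot: Snapshot, from: string, to: string): Snapshot {
  const patched: Snapshot = { ...snapshot };
  for (const [key, capability] of Object.entries(snapshot)) {
    if (!key.includes(from)) continue;
    const mirroredKey = key.replace(from, to);
    if (!(mirroredKey in patched)) {
      patched[mirroredKey] = { ...capability };
    }
  }
  return patched;
}

// Illustrative capability shape (1M context, 128k output, as described above).
const opus47: ModelCapability = {
  reasoning: true,
  temperature: true,
  toolCall: true,
  contextWindow: 1_000_000,
  maxOutput: 128_000,
};

const baseSnapshot: Snapshot = {
  "anthropic/claude-opus-4-7": opus47,
  "claude-opus-4-7": opus47,
  "claude-opus-4-7@default": opus47, // assumed spelling of the vertex form
};

const patchedSnapshot = mirrorEntries(baseSnapshot, "claude-opus-4-7", "claude-opus-4-8");
console.log(Object.keys(patchedSnapshot));
```

Because the function only adds keys, the 4-7 entries remain resolvable, which is the legacy fallback contract described above.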

Backward Compat (per #3486 review pattern)

Legacy 4-7 references intentionally preserved:

- claude-opus-4.7 exact dotted alias
- HIGH_VARIANT_MAP 4-7 entry
- MODEL_TO_CATEGORY_MAP 4-7 entry
- Snapshot 4-7 entries (still resolvable)
- prompts-core/prompts/atlas/opus-4-7.md (dedicated Atlas 4-7 prompt, now reused for 4-8 via Path A extension)
- agents/sisyphus/claude-opus-4-7.ts (dedicated 4-7 Sisyphus prompt, reused for 4-8 via existing isClaudeOpus47OrLaterModel check in the factory)
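Two of the preserved contracts can be illustrated together: an "or later" version check that lets 4-8 reuse the 4-7 prompts, and a high-variant map that keeps both versions. The sketch below is an assumption about how such helpers could look; `parseOpusVersion`, `isOpus47OrLater`, `HIGH_VARIANTS`, and `escalateToHigh` are hypothetical stand-ins, not the actual isClaudeOpus47OrLaterModel or HIGH_VARIANT_MAP implementations.

```typescript
// Hypothetical sketch: not the repository's real detector or variant map.

// Extracts [major, minor] from IDs such as "anthropic/claude-opus-4-8" or
// "claude-opus-4.7". The minor part is limited to one or two digits so that
// date-stamped IDs are not misread as a huge minor version.
function parseOpusVersion(modelId: string): [number, number] | null {
  const match = /claude-opus-(\d+)[-.](\d{1,2})(?!\d)/.exec(modelId);
  if (match === null) return null;
  return [Number(match[1]), Number(match[2])];
}

// True for Opus 4.7 and anything newer, so 4-8 takes the same prompt path.
function isOpus47OrLater(modelId: string): boolean {
  const version = parseOpusVersion(modelId);
  if (version === null) return false;
  const [major, minor] = version;
  return major > 4 || (major === 4 && minor >= 7);
}

// Both versions keep a high-reasoning variant, so pinned 4-7 configs still escalate.
const HIGH_VARIANTS = new Map<string, string>([
  ["claude-opus-4-8", "claude-opus-4-8-high"],
  ["claude-opus-4-7", "claude-opus-4-7-high"],
]);

// Returns the high variant if one is registered, otherwise the input unchanged.
function escalateToHigh(modelId: string): string {
  return HIGH_VARIANTS.get(modelId) ?? modelId;
}

console.log(isOpus47OrLater("anthropic/claude-opus-4-8"), escalateToHigh("claude-opus-4-7"));
```

A range check like this avoids adding a new branch per release, at the cost of implicitly opting future versions into the 4-7 prompt until a dedicated one exists.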

Test Plan

bun test shows 0 new failures introduced by this PR.

The 44 remaining baseline failures are all pre-existing on upstream/dev tip b949c3478 and orthogonal to a model defaults bump: codex CLI bundling (lazycodex executor verify CLI, start-work continuation CLI, codex ultrawork directive source), runtime wrappers (bun absent + node fallback), plugin metadata entrypoints, Windows Git Bash bundled rule, lsp-daemon MCP proxy, findProjectRoot, package metadata, ast-grep manifest pins, createRuntimeTmuxConfig, CodeGraph SessionStart hook, etc.

Confirmed that these failures are pre-existing by re-running individual failing test files against the upstream/dev baseline (e.g. auto-retry-dispatch.test.ts also fails on the baseline with the identical "Expected: 1, Received: 0" assertion).

Migration smoke (verified by passing tests): configs containing anthropic/claude-opus-4-7 resolve via MODEL_VERSION_MAP to anthropic/claude-opus-4-8 on next load; both versions resolve through the bundled snapshot for variant + context-window correctness.
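The single-pass migration behaviour can be sketched as a flat lookup in which every legacy ID points directly at the canonical target. This is a simplified illustration: `VERSION_MAP`, `isSingleHop`, `migrateAgentModels`, and the key format are assumptions, and the real MODEL_VERSION_MAP likely has more entries and a different config shape.

```typescript
// Hypothetical sketch: a simplified stand-in for the real migration map.
const VERSION_MAP: ReadonlyMap<string, string> = new Map<string, string>([
  ["anthropic/claude-opus-4-4", "anthropic/claude-opus-4-8"],
  ["anthropic/claude-opus-4-7", "anthropic/claude-opus-4-8"],
]);

// A map is single-hop when no target is itself a key, so one pass suffices.
function isSingleHop(map: ReadonlyMap<string, string>): boolean {
  return Array.from(map.values()).every((target) => !map.has(target));
}

interface MigrationResult {
  models: Record<string, string>;
  rewritten: string[]; // agent names whose model was changed
}

// Rewrites each agent's model through the map; unknown or already-canonical
// models are left untouched and do not count as rewritten.
function migrateAgentModels(agentModels: Record<string, string>): MigrationResult {
  const models: Record<string, string> = {};
  const rewritten: string[] = [];
  for (const [agent, model] of Object.entries(agentModels)) {
    const target = VERSION_MAP.get(model);
    models[agent] = target ?? model;
    if (target !== undefined) rewritten.push(agent);
  }
  return { models, rewritten };
}

const migration = migrateAgentModels({
  sisyphus: "anthropic/claude-opus-4-7",
  oracle: "anthropic/claude-opus-4-8",
});
console.log(migration.models, migration.rewritten);
```

Pointing 4-4 straight at 4-8, rather than chaining 4-4 → 4-7 → 4-8, is what keeps the migration to one pass.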

References

Issue: [Feature]: Promote claude-opus-4-8 to canonical default for Sisyphus / Prometheus / high-tier categories #5084
Precedent PR: refactor(models): bump claude-opus-4-6 to claude-opus-4-7 across fallback chains, categories, and docs #3486 (4-6 → 4-7, self-merged by maintainer)
Earlier 4-8 contributor PRs: fix(model-core): recognize claude-opus-4-8 as GA 1M-context model (fixes #5070) #5080, fix(model-core): add claude-fable-5, claude-mythos-5, and claude-opus-4-8 to hasGA1MContext #5209
Maintainer's gating doc: docs/guide/agent-model-matching.md L184-189 (rewritten in this PR)

…+snapshot+model-core tests)

Bumps the canonical Anthropic Opus model from 4-7 to 4-8 across the Sisyphus / Oracle / Prometheus / Metis / Momus agent fallback chains, the unspecified-high / visual-engineering / ultrabrain / deep / artistry category fallback chains, the Claude Code alias map, the think-mode HIGH_VARIANT_MAP, the event-model-fallback default, the delegate-task anthropic-categories fallback, and the CLI / TUI prompt hints.

Patches the bundled model-capabilities snapshot to mirror the seven existing claude-opus-4-7 entries (anthropic/claude-opus-4-8, claude-opus-4-8, four bedrock regional variants, the vertex @default form). The patch reuses the 4-7 capability shape verbatim (reasoning/temperature/toolCall flags, modalities, 1M context, 128k output) which matches the live entry on models.dev. A full snapshot regen was attempted first but pulled in an unrelated upstream schema migration affecting ~2500 unrelated models; the manual 7-entry patch keeps this commit focused on the bump.

Updates model-capability-aliases:
- Adds claude-opus-4.8 to the dotted-version exact alias list (claude-opus-4.7 entry preserved for legacy compat).
- Widens the legacy thinking pattern alias to match both claude-opus-4-7-thinking and claude-opus-4-8-thinking, canonicalizing both to claude-opus-4-8 (mirrors PR code-yeongyu#3486 commit 5478bab review-fix pattern).

Adds claude-opus-4-8 to HIGH_VARIANT_MAP alongside 4-7 so /think can escalate the new canonical to its high-reasoning variant. The 4-7 entry is intentionally kept so legacy pinned configs still escalate.

Updates 9 model-core test files to assert the new 4-8 default while leaving 4-7-specific contract tests intact. Adds a claude-opus-4-8 mock entry to model-capabilities.test.ts so the legacy-thinking-alias canonicalization path still resolves through the bundled snapshot.

Refs code-yeongyu#5084. Path B (forked 4.8 prompt) chosen by maintainer in 97bcdb4; this commit completes the canonical-default flip.

…oss test expectations

Bulk-updates 90 test files where 'claude-opus-4-7' represents the canonical default Sisyphus / Opus model.

Carve-outs preserved for files that legitimately test 4-7-specific contracts:
- model-family-detectors (tests isClaudeOpus47Model returns true for 4-7)
- model-format-normalizer / model-error-classifier / model-resolver (legacy 4-7 strings)
- migration paths (4-4->4-7, 4-7 entries kept for legacy compat per PR code-yeongyu#3486 commit 5478bab review-fix pattern)
- prompts-core/variant-resolver and the dedicated 4-7 Sisyphus prompt
- the bundled snapshot (4-7 entries kept alongside the new 4-8 mirrors)

Known remaining failures requiring nuanced fixes (NOT addressed in this WIP commit, follow-up needed):
- provider-model-id-transform tests where sed turned a 4-7->4.7 dotted-transform test into a 4-8->4.7 mismatch; source transformer may also need a 4-8->4.8 mapping added.
- think-mode/switcher tests with dotted-form 4.7 input still expecting 4-7-high (correct legacy contract) but sed bumped expected to 4-8-high.
- atlas/prompt-routing test asserting 4-7 routes to opus-4-7.md (still correct) but sed bumped the input to 4-8.
- fallback chain ordering tests (runtime-fallback, model-fallback hook) that assert a specific model order in the chain.

Refs code-yeongyu#5084.

…RSION_MAP

- MODEL_VERSION_MAP: bump 4-4 chain target to 4-8, add 4-7 -> 4-8 hop so legacy configs land on the new canonical default in one migration pass.
- agent-category MODEL_TO_CATEGORY_MAP: register 4-8 -> unspecified-high alongside the existing 4-7 entry (kept per code-yeongyu#3486 review pattern for legacy compat).
- migrations-sidecar: update example breadcrumb to reflect the new hop.

Refs code-yeongyu#5084.

…facing docs

Updates README (en, ko, ja, zh-cn, ru), docs/guide/*, docs/reference/*, docs/examples/*.jsonc, agent AGENTS.md files, and the hyperplan + debugging skill references to reflect claude-opus-4-8 as the canonical default.

In docs/guide/agent-model-matching.md, the 'Current top tier vs the auto-resolution chain' explainer is rewritten: now that the bundled capability snapshot ships 4-8 entries, the disclaimer about Opus 4.7 being the snapshot-backed floor no longer applies. The Kimi K2.7 disclaimer is preserved (K2.7 is still pending in the catalog).

The Claude Family priority table keeps claude-opus-4-7 as a legacy fallback entry alongside the new claude-opus-4-8 (max) primary, mirroring the precedent from PR code-yeongyu#3486 commit 5478bab review-fix pattern.

Carve-outs (intentionally kept):
- packages/prompts-core/prompts/atlas/opus-4-7.md (dedicated 4-7 Atlas prompt file, still selected by getAtlasPromptSource)
- dist/ build artifacts (regenerated at build time)

Refs code-yeongyu#5084.

Surgical fixes for the 20 test failures introduced by the bulk-sed in commit efcfacb and the source flips in fd36857. All resolved tests fall into clear semantic categories:

1. **Migration test fixtures** (~16 tests in packages/utils/src/migration/* and packages/omo-opencode/src/shared/migration*): Bumped MIGRATION_KEY and assertion strings from 'anthropic/claude-opus-4-7' to 'anthropic/claude-opus-4-8' to match the new MODEL_VERSION_MAP chain (4-4 -> 4-8, 4-7 -> 4-8). Historical sidecar entries (4-5 -> 4-7 and 4-6 -> 4-7) preserved as legacy breadcrumbs: they're test data for sidecar storage semantics, not live migration map references. The orphan-lsp fixture model bumped to 4-8 (already-canonical, no migration triggered, no rewrite expected).
2. **Provider-model-id-transform tests** (model-core + omo-opencode mirror): Bumped expected dotted-form output from 'claude-opus-4.7' to 'claude-opus-4.8' since the input is now '-4-8' and the source transformer correctly produces '-4.8' for github-copilot/vercel providers.
3. **Atlas prompt routing**: Extended variant-resolver to use isClaudeOpus47OrLaterModel (was isClaudeOpus47Model) so the dedicated opus-4-7.md prompt is now reused for 4-8 as well, matching the Path A strategy chosen in the PR comment on code-yeongyu#5084. Atlas test now passes asserting 4-8 routes to opus-4-7 prompt.
4. **think-mode/switcher dotted-form tests**: Bumped dotted-form input from 'claude-opus-4.7' to 'claude-opus-4.8' to match the new canonical default. The dotted-normalization contract is unchanged; the test just uses the newer model as its example.
5. **fuzzyMatchModel github-copilot dotted-form test**: Bumped available-set entry from 'github-copilot/claude-opus-4.7' to 'github-copilot/claude-opus-4.8' so the hyphen-vs-dot matching contract is tested against the new canonical default version.
6. **model-fallback / generate-omo-config tests**: Updated expected hardcoded chain output for github-copilot to include 'github-copilot/claude-opus-4.8' (the actual dotted form produced by the github-copilot provider transformer after the bump) and momus fallback chain expected dotted version bumped to 4.8.
7. **runtime-fallback/index.test.ts equivalence-pair chain entries** (7 chain entries + 1 failing-model line): Reverted the bulk-sed bump back to 'anthropic/claude-opus-4-7' for the specific chain slots that are paired with 'github-copilot/claude-opus-4.7' as equivalence-skip test setups. These tests assert that areRuntimeFallbackModelsEquivalent() correctly skips a chain candidate when its canonical modelID matches the current model, so they need both sides of the pair at the same version. Bumping only one half broke the equivalence-skip semantic. The L2805 failing-model line in 'preserve resolved agent during auto-retry' reverted for the same reason (chain[0] is still 4.7-copilot).

After these fixes the test suite shows 0 new failures introduced by this PR. The 44 remaining failures are all pre-existing on upstream/dev tip b949c34 and orthogonal to a model defaults bump: codex CLI bundling, runtime wrappers, plugin metadata entrypoints, Windows Git Bash bundled rule, lsp-daemon MCP proxy, findProjectRoot, package metadata, ast-grep manifest pins, etc.

Refs code-yeongyu#5084.
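Categories 2 and 7 both hinge on the hyphen-versus-dot convention. The following sketch shows, under simplifying assumptions, why a one-sided bump breaks an equivalence pair. `toProviderModelId`, `canonicalModelId`, and `areEquivalent` are hypothetical helpers written for illustration; they are not the repository's transformer or its areRuntimeFallbackModelsEquivalent().

```typescript
// Hypothetical sketch: simplified versions of the dotted-form transform and
// the equivalence check, written only to illustrate the test contract.

// Providers assumed to expect "4.8" rather than "4-8".
const DOTTED_PROVIDERS = new Set<string>(["github-copilot", "vercel"]);

// "claude-opus-4-8" -> "claude-opus-4.8" for dotted providers.
function toProviderModelId(provider: string, modelId: string): string {
  if (!DOTTED_PROVIDERS.has(provider)) return modelId;
  return modelId.replace(/^(claude-[a-z]+-\d+)-(\d{1,2})(?!\d)/, "$1.$2");
}

// Strips the provider prefix and normalizes the dotted form back to hyphens.
function canonicalModelId(qualifiedId: string): string {
  const slash = qualifiedId.indexOf("/");
  const bare = slash === -1 ? qualifiedId : qualifiedId.slice(slash + 1);
  return bare.replace(/^(claude-[a-z]+-\d+)\.(\d{1,2})(?!\d)/, "$1-$2");
}

// Two provider-qualified IDs are equivalent when their canonical IDs match.
function areEquivalent(a: string, b: string): boolean {
  return canonicalModelId(a) === canonicalModelId(b);
}

// Same version on both sides: the fallback candidate would be skipped.
console.log(areEquivalent("anthropic/claude-opus-4-7", "github-copilot/claude-opus-4.7"));
// Only one side bumped: no longer equivalent, so the skip does not happen.
console.log(areEquivalent("anthropic/claude-opus-4-8", "github-copilot/claude-opus-4.7"));
```

This is why the equivalence-pair fixtures were reverted as a unit rather than bumped: either both halves move to 4-8 or both stay at 4-7.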

Deftera186 added 4 commits June 18, 2026 22:24

Deftera186 mentioned this pull request Jun 18, 2026

[Feature]: Promote claude-opus-4-8 to canonical default for Sisyphus / Prometheus / high-tier categories #5084

Open

7 tasks

Deftera186 marked this pull request as ready for review June 18, 2026 20:02

github-actions Bot added the prompts-core Changes under packages/prompts-core label Jun 18, 2026

Deftera186 wants to merge 5 commits into code-yeongyu:dev from Deftera186:feat/promote-claude-opus-4-8-canonical-default
