feat(agents): add DeepSeek V4 Pro/Flash Sisyphus prompt routing by EvangelosMoschou · Pull Request #5403 · code-yeongyu/oh-my-openagent

EvangelosMoschou · 2026-06-18T14:58:03Z

What

Adds DeepSeek V4 Pro/Flash model detection, Sisyphus prompt routing, a research-backed Sisyphus prompt with V4-specific guardrails, and an earlier harness-level compaction threshold to prevent tool call drift.

Why

DeepSeek V4 Pro is a strong agentic coding model (80.6% SWE-bench, 73.6 MCPAtlas, 1M context) at 7x lower cost than Claude. Previously, V4 models fell through to the generic GPT fallback prompt. This PR gives them a dedicated prompt optimized for their strengths and aware of their weaknesses, plus a harness improvement that keeps context under the drift threshold.

Research-backed guardrails

Based on community findings (deepseek-ai/DeepSeek-V3#1244, Maestro, OpenSymphony, akitaonrails benchmark, advance-minimax-m3-cursor-rules):

| Known issue | Source | Mitigation |
| --- | --- | --- |
| Tool call format drift after ~40 tools | Issue #1244 | Prompt: "use tool_calls field ONLY" + Harness: compaction at 35% (vs 78%) to stay under drift threshold |
| High hallucination rate (94%) | TechJacks | VERIFICATION REQUIRED section + subagent verification: "A subagent report is a lead, not evidence" |
| tool_choice rejection in thinking mode | Issues #1376, #836 | Prompt: prefer "auto", avoid forced tool calls |
| Language bleed (Chinese tokens) | NV forums | Explicit language constraint |
| Vague spec fabrication | Maestro project | Precise spec delegation: "Give each delegate a PRECISE spec: file paths, acceptance criteria, scope boundaries" |
| Pro vs Flash coherence | DeepSeek docs | Variant-specific capability descriptions |

Changes (9 files, +350 lines)

Prompt + Detection (7 files)

Detectors (model-family-detectors.ts): isDeepSeekV4Model, isDeepSeekV4ProModel, isDeepSeekV4FlashModel
Types barrel (agents/types.ts): re-export new detectors
Routing (sisyphus-agent-factory.ts): V4 models routed via buildGptSisyphusAgentConfig after Kimi, before GPT-5.5
Prompt (sisyphus/deepseek-v4.ts, new): 8-block prompt with 7 V4-specific guardrails + 3 research-backed improvements (context management, clarify-first, precise spec delegation)
Sisyphus barrel (sisyphus/index.ts): export buildDeepSeekV4SisyphusPrompt
Detectors test (model-family-detectors.test.ts): 9 new assertions
Routing test (sisyphus-agent-factory.test.ts): 2 new routing test cases
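
As a rough illustration of the three detectors listed above, the sketch below matches on the model ID after dropping an optional provider prefix. Only the function names come from this PR; the `normalizeModelId` helper and the regular expressions are assumptions made for the example and may differ from what model-family-detectors.ts actually does.

```typescript
// Hypothetical helper (not from the PR): drop an optional "provider/" prefix
// and lowercase the ID so "deepseek/deepseek-v4-pro" becomes "deepseek-v4-pro".
function normalizeModelId(model: string): string {
  const slash = model.lastIndexOf("/");
  return (slash >= 0 ? model.slice(slash + 1) : model).toLowerCase();
}

// Assumed pattern: any ID that starts with "deepseek-v4".
function isDeepSeekV4Model(model: string): boolean {
  return /^deepseek-v4(?:[-.]|$)/.test(normalizeModelId(model));
}

// Assumed pattern: the Pro variant carries a "pro" segment after "deepseek-v4".
function isDeepSeekV4ProModel(model: string): boolean {
  return /^deepseek-v4[-.]pro(?:[-.]|$)/.test(normalizeModelId(model));
}

// Assumed pattern: the Flash variant carries a "flash" segment after "deepseek-v4".
function isDeepSeekV4FlashModel(model: string): boolean {
  return /^deepseek-v4[-.]flash(?:[-.]|$)/.test(normalizeModelId(model));
}
```

Keeping the family check separate from the variant checks lets routing ask "is this V4 at all?" while the prompt builder asks "which variant?" for the capability description.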

Harness Improvement (2 files)

Compaction trigger (preemptive-compaction-trigger.ts): V4 models get 35% compaction threshold (vs 78% default) to prevent tool call drift
Compaction test (preemptive-compaction.test.ts): 2 new tests (V4 triggers at 36%, non-V4 does not at 36%)
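
The threshold selection described above can be sketched as follows. The 35% and 78% values come from this PR; the function names, the substring-based V4 check, and the `>=` comparison are assumptions for illustration and are not taken from preemptive-compaction-trigger.ts.

```typescript
// Thresholds stated in the PR description, as fractions of the context window.
const DEFAULT_COMPACTION_THRESHOLD = 0.78;
const DEEPSEEK_V4_COMPACTION_THRESHOLD = 0.35;

// Hypothetical stand-in for the real detector: a plain substring check.
function looksLikeDeepSeekV4(model: string): boolean {
  return model.toLowerCase().includes("deepseek-v4");
}

// Pick the usage ratio at which compaction should fire for this model.
function compactionThresholdFor(model: string): number {
  return looksLikeDeepSeekV4(model)
    ? DEEPSEEK_V4_COMPACTION_THRESHOLD
    : DEFAULT_COMPACTION_THRESHOLD;
}

// Assumed trigger rule: compact once usage reaches the model's threshold.
function shouldCompact(model: string, usedTokens: number, contextLimit: number): boolean {
  if (contextLimit <= 0) return false; // no usable limit, nothing to compare against
  return usedTokens / contextLimit >= compactionThresholdFor(model);
}
```

Under these assumptions, a V4 session at 36% usage compacts while any other model at 36% does not, which mirrors the two new tests.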

TDD Evidence

Detectors: RED → GREEN (9/9 tests pass)
Routing: 2 new cases assert Pro/Flash route to V4 prompt
Compaction: RED (V4 test failed at 36%) → GREEN (V4 triggers at 36%, non-V4 preserved at 78%)
All 18 existing preemptive-compaction tests still pass
All 20 routing/detector tests still pass
CI: green on all platforms

Rebase note

Branch was rebased onto upstream/dev. One conflict in model-family-detectors.test.ts (upstream added Claude Fable/Mythos tests at the same location) was resolved by keeping both test blocks.

Usage

{
  "agents": {
    "sisyphus": {
      "model": "deepseek/deepseek-v4-pro"
    }
  }
}

The plugin will automatically detect the model family, route to the DeepSeek V4 prompt, apply the V4-specific guardrails, and compact context at 35% to prevent tool call drift.
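
To make the routing order concrete, here is a minimal first-match-wins sketch. The ordering (Kimi, then DeepSeek V4, then GPT-5.5, then the generic fallback) follows the PR description; the `PromptKind` type, the `routes` table, and the regexes are hypothetical and do not reflect the real sisyphus-agent-factory.ts.

```typescript
// Hypothetical labels for the prompt families mentioned in this PR.
type PromptKind = "kimi" | "deepseek-v4" | "gpt-5.5" | "generic";

interface Route {
  kind: PromptKind;
  matches: (model: string) => boolean;
}

// Order matters: the first matching route wins, so V4 is checked after Kimi
// and before GPT-5.5. The patterns here are assumptions for illustration.
const routes: Route[] = [
  { kind: "kimi", matches: (m) => /kimi/i.test(m) },
  { kind: "deepseek-v4", matches: (m) => /deepseek-v4/i.test(m) },
  { kind: "gpt-5.5", matches: (m) => /gpt-5\.5/i.test(m) },
];

// Return the first matching prompt family, or the generic fallback.
function selectPromptKind(model: string): PromptKind {
  return routes.find((r) => r.matches(model))?.kind ?? "generic";
}
```

With the config above, `selectPromptKind("deepseek/deepseek-v4-pro")` resolves to `"deepseek-v4"` instead of falling through to the generic prompt.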

…aunch paths

Kimi K2 (k2p5/k2p6/k2p7) is especially prone to follow the most recent/repeated instruction (recency bias). The launch return text and BACKGROUND_TASK_DESCRIPTION both advertised `Use \`background_output\` to get results` / `Use \`background_output\` with task_id="..." to check`. This CTA overrides the `Do NOT call background_output now` anti-polling line, so the model immediately calls background_output instead of waiting for the <system-reminder> completion notification.

Removed the polling CTAs from all 6 launch/description paths and reworded them to lead with the no-polling instruction:

- packages/omo-opencode/src/tools/background-task/constants.ts (BACKGROUND_TASK_DESCRIPTION)
- packages/omo-opencode/src/tools/background-task/create-background-task.ts (launch return)
- packages/omo-opencode/src/tools/call-omo-agent/background-agent-executor.ts (launch return)
- packages/omo-opencode/src/tools/call-omo-agent/background-executor.ts (launch return)
- packages/omo-opencode/src/tools/delegate-task/background-task.ts (launch return)
- packages/omo-opencode/src/tools/delegate-task/background-continuation.ts (continuation return)

Tests (TDD, RED -> GREEN): Added 5 regression tests, one per affected file. Each asserts the launch/continuation return no longer contains the `Use \`background_output\` with task_id=` CTA while still carrying the anti-polling instruction.

Note: The regex fix for isKimiK2Model (k2p7 detection) is already in upstream via 80ae810, so this PR is scoped to the CTA removal only.

Refs code-yeongyu#5221
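
The shape of the regression check described in this commit message might look like the sketch below. The two marker strings are quoted from the message; `checkLaunchText` and the sample return texts are hypothetical and are not the repository's actual tests or launch strings.

```typescript
// Marker strings quoted in the commit message.
const POLLING_CTA = "Use `background_output` with task_id=";
const ANTI_POLLING = "Do NOT call background_output now";

// Hypothetical checker: list what is wrong with a launch/continuation return text.
function checkLaunchText(text: string): string[] {
  const problems: string[] = [];
  if (text.includes(POLLING_CTA)) problems.push("contains polling CTA");
  if (!text.includes(ANTI_POLLING)) problems.push("missing anti-polling instruction");
  return problems;
}

// Made-up sample texts, only to exercise the checker.
const beforeFix =
  'Task launched. Do NOT call background_output now. Use `background_output` with task_id="abc" to check.';
const afterFix =
  "Do NOT call background_output now. You will be notified when the task completes.";

console.log(checkLaunchText(beforeFix)); // one problem: the polling CTA is present
console.log(checkLaunchText(afterFix)); // no problems
```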

- Add isDeepSeekV4Model, isDeepSeekV4ProModel, isDeepSeekV4FlashModel detectors to model-family-detectors.ts
- Route DeepSeek V4 models to a dedicated action-oriented Sisyphus prompt via buildGptSisyphusAgentConfig (OpenAI-compatible API format)
- New deepseek-v4.ts prompt with 8-block architecture: identity, constraints, intent, explore, execution_loop, delegation, tasks, style
- Re-export new detectors through agents/types.ts barrel
- Tests: detector unit tests + factory routing test cases

Based on community research (deepseek-ai/DeepSeek-V3#1244, openclaw#72044, NousResearch/hermes-agent#17400, nvidia forums, pydantic-ai#5193):

- TOOL CALL FORMAT: explicit instruction to use tool_calls field only, never serialize tool calls as raw content text (mitigates Issue code-yeongyu#1244)
- VERIFICATION REQUIRED: warns about 94% hallucination rate, mandates verification of file paths, tool args, and results
- THINKING MODE: guidance on Think High vs Non-think, warns that tool_choice="required" fails with HTTP 400 in thinking mode
- LANGUAGE BLEED: warns about Chinese token leakage into English output
- Pro vs Flash variants: different capability descriptions

Restored the original model-family-detectors.ts from upstream (which had isGlmModel and isGeminiModel that were accidentally deleted). Added DeepSeek V4 functions correctly after isMiniMaxModel. Fixed TS error in deepseek-v4.ts: buildTaskManagementSection takes 1 argument (useTaskSystem boolean), not 2.

…odels

DeepSeek V4 suffers from tool call format drift after ~40 tool calls. The default 78% compaction threshold allows context to grow too large before compaction fires, by which point V4 has already drifted. This adds a V4-specific threshold of 35% that triggers compaction earlier, keeping context under the drift threshold. Non-V4 models retain the 78% threshold unchanged.

TDD: 2 new tests (V4 triggers at 36%, non-V4 does not at 36%). All 18 existing preemptive-compaction tests still pass.

Adds 3 new research-backed improvements to the V4 Sisyphus prompt:

1. Context management guardrail: notes that the harness compacts at 35% for V4 models, so the model can rely on recent context being accurate.
2. Clarify-first instruction: ask only on real forks (2x+ effort difference), after inspecting first. From advance-minimax-m3-cursor-rules.
3. Precise spec delegation + subagent verification: V4 Pro is competent with clear specs but fabricates when specs are vague (Maestro project). Subagent reports are leads, not evidence; verify by inspecting files and rerunning checks (OpenSymphony experiment).

All 20 routing/detector tests still pass.

github-actions Bot added model-core Changes under packages/model-core opencode OpenCode edition: packages/omo-opencode labels Jun 18, 2026

EvangelosMoschou added 4 commits June 19, 2026 16:35

EvangelosMoschou force-pushed the feat/deepseek-v4-sisyphus branch from d8a7eb8 to bb5375b Compare June 19, 2026 13:38

EvangelosMoschou added 4 commits June 19, 2026 16:43

ci: re-trigger pipeline after windows test failure

a00134c

ci: re-trigger pipeline (attempt 4)

2e41aa8

This was referenced Jun 19, 2026

feat(hooks): add V4 verification gate for DeepSeek V4 subagent results #5437

Open

feat(hooks): add V4 checkpoint writer for long DeepSeek V4 sessions #5438

Open

EvangelosMoschou wants to merge 8 commits into code-yeongyu:dev from EvangelosMoschou:feat/deepseek-v4-sisyphus
