
feat(agents): add DeepSeek V4 Pro/Flash Sisyphus prompt routing #5403

Open
EvangelosMoschou wants to merge 8 commits into code-yeongyu:dev from EvangelosMoschou:feat/deepseek-v4-sisyphus

EvangelosMoschou commented Jun 18, 2026

What

Adds DeepSeek V4 Pro/Flash model detection, Sisyphus prompt routing, a research-backed Sisyphus prompt with V4-specific guardrails, and a harness-level compaction threshold that fires earlier for V4 models to prevent tool call format drift.

Why

DeepSeek V4 Pro is a strong agentic coding model (80.6% on SWE-bench, 73.6 on MCPAtlas, 1M-token context) at roughly 7x lower cost than Claude. Previously, V4 models fell through to the generic GPT fallback prompt. This PR gives them a dedicated prompt tuned to their strengths and guarded against their known weaknesses, plus a harness change that keeps context below the drift threshold.

Research-backed guardrails

Based on community findings (deepseek-ai/DeepSeek-V3#1244, Maestro, OpenSymphony, akitaonrails benchmark, advance-minimax-m3-cursor-rules):

| Known issue | Source | Mitigation |
|---|---|---|
| Tool call format drift after ~40 tool calls | Issue #1244 | Prompt: "use tool_calls field ONLY"; harness: compaction at 35% context usage (vs the 78% default) to stay under the drift threshold |
| High hallucination rate (94%) | TechJacks | VERIFICATION REQUIRED section + subagent verification: "A subagent report is a lead, not evidence" |
| tool_choice rejection in thinking mode | Issues #1376, #836 | Prompt: prefer "auto", avoid forced tool calls |
| Language bleed (Chinese tokens) | NVIDIA forums | Explicit language constraint |
| Vague spec fabrication | Maestro project | Precise spec delegation: "Give each delegate a PRECISE spec: file paths, acceptance criteria, scope boundaries" |
| Pro vs Flash coherence | DeepSeek docs | Variant-specific capability descriptions |

Changes (9 files, +350 lines)

Prompt + Detection (7 files)

  1. Detectors (model-family-detectors.ts): isDeepSeekV4Model, isDeepSeekV4ProModel, isDeepSeekV4FlashModel
  2. Types barrel (agents/types.ts): re-export new detectors
  3. Routing (sisyphus-agent-factory.ts): V4 models are routed through buildGptSisyphusAgentConfig; the V4 check runs after the Kimi check and before the GPT-5.5 check
  4. Prompt (sisyphus/deepseek-v4.ts, new): 8-block prompt with 7 V4-specific guardrails + 3 research-backed improvements (context management, clarify-first, precise spec delegation)
  5. Sisyphus barrel (sisyphus/index.ts): export buildDeepSeekV4SisyphusPrompt
  6. Detectors test (model-family-detectors.test.ts): 9 new assertions
  7. Routing test (sisyphus-agent-factory.test.ts): 2 new routing test cases
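For illustration, the three detectors could look roughly like the sketch below. It assumes model IDs of the form `deepseek/deepseek-v4-pro` (as in the Usage config) or a bare `deepseek-v4-flash`; the `normalizeModelId` helper and the regex patterns are hypothetical, and the real functions in model-family-detectors.ts may accept other spellings.

```typescript
// Hypothetical sketch, not the code in model-family-detectors.ts.
// Assumes IDs like "deepseek/deepseek-v4-pro" or "deepseek-v4-flash".

/** Drop any "provider/" prefix and lowercase, so "DeepSeek/DeepSeek-V4-Pro" matches. */
function normalizeModelId(modelId: string): string {
  const slash = modelId.lastIndexOf("/");
  return (slash >= 0 ? modelId.slice(slash + 1) : modelId).toLowerCase();
}

/** Any DeepSeek V4 variant; the (?:-|$) guard rejects e.g. "deepseek-v40". */
export function isDeepSeekV4Model(modelId: string): boolean {
  return /^deepseek-v4(?:-|$)/.test(normalizeModelId(modelId));
}

/** The Pro variant only. */
export function isDeepSeekV4ProModel(modelId: string): boolean {
  return /^deepseek-v4-pro(?:-|$)/.test(normalizeModelId(modelId));
}

/** The Flash variant only. */
export function isDeepSeekV4FlashModel(modelId: string): boolean {
  return /^deepseek-v4-flash(?:-|$)/.test(normalizeModelId(modelId));
}
```

Anchoring the patterns to the start of the normalized ID keeps a V3 or unrelated model from matching just because "v4" appears somewhere in its name.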

Harness Improvement (2 files)

  1. Compaction trigger (preemptive-compaction-trigger.ts): V4 models get a 35% compaction threshold (vs the 78% default) to prevent tool call drift
  2. Compaction test (preemptive-compaction.test.ts): 2 new tests (V4 triggers at 36%, non-V4 does not at 36%)
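A minimal sketch of a per-model threshold check, assuming the trigger compares the used fraction of the context window against a fixed ratio. `compactionThresholdFor`, `shouldCompact`, the `>=` comparison, and the simplified V4 regex are assumptions for illustration, not the code in preemptive-compaction-trigger.ts.

```typescript
// Hypothetical sketch of a model-dependent compaction threshold.
const DEFAULT_COMPACTION_THRESHOLD = 0.78; // default ratio quoted in the PR
const DEEPSEEK_V4_COMPACTION_THRESHOLD = 0.35; // V4 ratio quoted in the PR

// Simplified stand-in for isDeepSeekV4Model (assumed ID format).
function looksLikeDeepSeekV4(modelId: string): boolean {
  return /(^|\/)deepseek-v4(-|$)/i.test(modelId);
}

export function compactionThresholdFor(modelId: string): number {
  return looksLikeDeepSeekV4(modelId)
    ? DEEPSEEK_V4_COMPACTION_THRESHOLD
    : DEFAULT_COMPACTION_THRESHOLD;
}

export function shouldCompact(
  usedTokens: number,
  contextLimit: number,
  modelId: string,
): boolean {
  if (contextLimit <= 0) return false; // unknown window size: never trigger
  return usedTokens / contextLimit >= compactionThresholdFor(modelId);
}
```

With a 1M-token window, the 35% threshold fires at about 350k tokens, while the 78% default would wait until about 780k; at 36% usage only the V4 model compacts, which is what the two new tests check.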

TDD Evidence

  • Detectors: RED → GREEN (9/9 tests pass)
  • Routing: 2 new cases assert Pro/Flash route to V4 prompt
  • Compaction: RED (V4 test failed at 36%) → GREEN (V4 triggers at 36%, non-V4 preserved at 78%)
  • All 18 existing preemptive-compaction tests still pass
  • All 20 routing/detector tests still pass
  • CI: green on all platforms

Rebase note

Branch was rebased onto upstream/dev. One conflict in model-family-detectors.test.ts (upstream added Claude Fable/Mythos tests at the same location) was resolved by keeping both test blocks.

Usage

```json
{
  "agents": {
    "sisyphus": {
      "model": "deepseek/deepseek-v4-pro"
    }
  }
}
```

The plugin will automatically detect the model family, route to the DeepSeek V4 prompt, apply the V4-specific guardrails, and compact context at 35% to prevent tool call drift.

github-actions added the model-core (Changes under packages/model-core) and opencode (OpenCode edition: packages/omo-opencode) labels on Jun 18, 2026
…aunch paths

Kimi K2 (k2p5/k2p6/k2p7) is especially prone to following the most
recent or repeated instruction (recency bias). The launch return text
and BACKGROUND_TASK_DESCRIPTION both advertised
'Use `background_output` to get results' /
'Use `background_output` with task_id="..." to check'.

This call-to-action (CTA) overrides the `Do NOT call background_output now` anti-polling
line, so the model calls background_output immediately instead of
waiting for the <system-reminder> completion notification.

Removed the polling CTAs from all 6 launch/description paths and
reworded them to lead with the no-polling instruction:

- packages/omo-opencode/src/tools/background-task/constants.ts (BACKGROUND_TASK_DESCRIPTION)
- packages/omo-opencode/src/tools/background-task/create-background-task.ts (launch return)
- packages/omo-opencode/src/tools/call-omo-agent/background-agent-executor.ts (launch return)
- packages/omo-opencode/src/tools/call-omo-agent/background-executor.ts (launch return)
- packages/omo-opencode/src/tools/delegate-task/background-task.ts (launch return)
- packages/omo-opencode/src/tools/delegate-task/background-continuation.ts (continuation return)

Tests (TDD, RED -> GREEN):
Added 5 regression tests, one per affected launch/continuation return
path. Each asserts that the return text no longer contains the
'Use `background_output` with task_id=' CTA while still carrying
the anti-polling instruction.
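A minimal sketch of the shape of this change, assuming a plain string-building helper: the launch return leads with the no-polling instruction and omits the polling CTA, and a check mirrors what the regression tests assert. `buildLaunchReturn`, `assertNoPollingCta`, and all wording other than the quoted CTA and the anti-polling line are hypothetical.

```typescript
// Hypothetical sketch; the real launch-return strings live in the files listed above.
const POLLING_CTA = "Use `background_output` with task_id=";
const ANTI_POLLING = "Do NOT call background_output now";

function buildLaunchReturn(taskId: string): string {
  // Lead with the no-polling instruction so it is the most recent directive
  // the model sees, and do not advertise background_output at all.
  return [
    `${ANTI_POLLING}.`,
    "You will be notified via <system-reminder> when the task completes.",
    `Task ID: ${taskId}`,
  ].join("\n");
}

// Same two properties the regression tests check.
function assertNoPollingCta(text: string): void {
  if (text.includes(POLLING_CTA)) {
    throw new Error("launch return still advertises background_output polling");
  }
  if (!text.includes(ANTI_POLLING)) {
    throw new Error("launch return lost the anti-polling instruction");
  }
}

assertNoPollingCta(buildLaunchReturn("bg_123"));
```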

Note: The regex fix for isKimiK2Model (k2p7 detection) is already in
upstream via 80ae810, so this PR is scoped to the CTA removal only.

Refs code-yeongyu#5221

- Add isDeepSeekV4Model, isDeepSeekV4ProModel, isDeepSeekV4FlashModel
  detectors to model-family-detectors.ts
- Route DeepSeek V4 models to a dedicated action-oriented Sisyphus prompt
  via buildGptSisyphusAgentConfig (OpenAI-compatible API format)
- New deepseek-v4.ts prompt with 8-block architecture: identity,
  constraints, intent, explore, execution_loop, delegation, tasks, style
- Re-export new detectors through agents/types.ts barrel
- Tests: detector unit tests + factory routing test cases
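The routing order (Kimi, then DeepSeek V4, then GPT-5.5, then the generic fallback) can be sketched as a chain of predicate checks. The predicates are simplified stand-ins for the real detectors, and `selectSisyphusPrompt` and its return labels are hypothetical; in the PR, the V4 branch calls buildGptSisyphusAgentConfig rather than returning a label, and other model families are not covered here.

```typescript
// Hypothetical sketch of the branch order only.
type SisyphusPromptKind =
  | "kimi"
  | "deepseek-v4-pro"
  | "deepseek-v4-flash"
  | "gpt-5.5"
  | "gpt-fallback";

// Simplified stand-ins for the detectors in model-family-detectors.ts.
const isKimiK2 = (id: string): boolean => /kimi-k2|k2p\d/i.test(id);
const isDeepSeekV4Pro = (id: string): boolean => /deepseek-v4-pro/i.test(id);
const isDeepSeekV4Flash = (id: string): boolean => /deepseek-v4-flash/i.test(id);
const isGpt55 = (id: string): boolean => /gpt-5\.5/i.test(id);

function selectSisyphusPrompt(modelId: string): SisyphusPromptKind {
  if (isKimiK2(modelId)) return "kimi"; // checked first
  if (isDeepSeekV4Pro(modelId)) return "deepseek-v4-pro"; // new: after Kimi
  if (isDeepSeekV4Flash(modelId)) return "deepseek-v4-flash"; // new: after Kimi
  if (isGpt55(modelId)) return "gpt-5.5"; // checked after V4
  return "gpt-fallback"; // where V4 models used to land
}
```

Because the checks short-circuit, placing the V4 predicates before the fallback is what stops V4 IDs from reaching the generic GPT prompt.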

Based on community research (deepseek-ai/DeepSeek-V3#1244, openclaw#72044,
NousResearch/hermes-agent#17400, NVIDIA forums, pydantic-ai#5193):

- TOOL CALL FORMAT: explicit instruction to use the tool_calls field only,
  never serializing tool calls as raw content text (mitigates deepseek-ai/DeepSeek-V3#1244)
- VERIFICATION REQUIRED: warns about a 94% hallucination rate and mandates
  verification of file paths, tool args, and results
- THINKING MODE: guidance on Think High vs Non-think, warns that
  tool_choice="required" fails with HTTP 400 in thinking mode
- LANGUAGE BLEED: warns about Chinese token leakage into English output
- Pro vs Flash variants: different capability descriptions
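The 8-block layout and the Pro/Flash split might be assembled along these lines. This is a hypothetical sketch: the block names come from the commit message above, but `buildV4PromptSketch`, the XML-style tags, and every line of block text are placeholders, not the contents of deepseek-v4.ts (which exports buildDeepSeekV4SisyphusPrompt).

```typescript
// Hypothetical sketch of an 8-block prompt with variant-specific wording.
type V4Variant = "pro" | "flash";

const BLOCK_ORDER = [
  "identity", "constraints", "intent", "explore",
  "execution_loop", "delegation", "tasks", "style",
] as const;

type BlockName = (typeof BLOCK_ORDER)[number];

function blockBody(name: BlockName, variant: V4Variant): string {
  switch (name) {
    case "identity":
      // Variant-specific capability description (placeholder text).
      return variant === "pro"
        ? "You are running on DeepSeek V4 Pro."
        : "You are running on DeepSeek V4 Flash.";
    case "constraints":
      // Placeholder wording for the guardrails listed above.
      return [
        "Emit tool calls ONLY via the tool_calls field; never as raw text.",
        "VERIFICATION REQUIRED: confirm file paths, tool arguments, and results.",
        "Respond in English only.",
      ].join("\n");
    default:
      return `(${name} guidance)`; // stand-in for the remaining blocks
  }
}

function buildV4PromptSketch(variant: V4Variant): string {
  // Emit the blocks in a fixed order, each wrapped in a named tag.
  return BLOCK_ORDER
    .map((name) => `<${name}>\n${blockBody(name, variant)}\n</${name}>`)
    .join("\n\n");
}
```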

Restored the original model-family-detectors.ts from upstream, bringing
back isGlmModel and isGeminiModel, which had been accidentally deleted.
Added the DeepSeek V4 detectors after isMiniMaxModel.

Fixed a TypeScript error in deepseek-v4.ts: buildTaskManagementSection takes
one argument (the useTaskSystem boolean), not two.
EvangelosMoschou force-pushed the feat/deepseek-v4-sisyphus branch from d8a7eb8 to bb5375b on June 19, 2026 13:38
…odels

DeepSeek V4 suffers from tool call format drift after ~40 tool calls.
The default 78% compaction threshold allows context to grow too large
before compaction fires, by which point V4 has already drifted.

This adds a V4-specific threshold of 35% that triggers compaction
earlier, keeping context under the drift threshold. Non-V4 models
retain the 78% threshold unchanged.

TDD: 2 new tests (V4 triggers at 36%, non-V4 does not at 36%).
All 18 existing preemptive-compaction tests still pass.

Adds 3 new research-backed improvements to the V4 Sisyphus prompt:

1. Context management guardrail: notes that the harness compacts at 35%
   for V4 models, so the model can rely on recent context being accurate.

2. Clarify-first instruction: ask only at real forks (options whose effort
   differs by 2x or more), and only after inspecting first. From advance-minimax-m3-cursor-rules.

3. Precise spec delegation + subagent verification: V4 Pro is competent
   with clear specs but fabricates when specs are vague (Maestro project).
   Subagent reports are leads, not evidence; verify them by inspecting files
   and rerunning checks (OpenSymphony experiment).

All 20 routing/detector tests still pass.

Labels

model-core (Changes under packages/model-core), opencode (OpenCode edition: packages/omo-opencode)
