feat(agents): add DeepSeek V4 Pro/Flash Sisyphus prompt routing #5403
Open
EvangelosMoschou wants to merge 8 commits into
…aunch paths
Kimi K2 (k2p5/k2p6/k2p7) is especially prone to follow the most recent/repeated instruction (recency bias). The launch return text and BACKGROUND_TASK_DESCRIPTION both advertised `Use \`background_output\` to get results` / `Use \`background_output\` with task_id="..." to check`. This CTA overrides the `Do NOT call background_output now` anti-polling line, so the model immediately calls background_output instead of waiting for the <system-reminder> completion notification.
Removed the polling CTAs from all 6 launch/description paths and reworded them to lead with the no-polling instruction:
- packages/omo-opencode/src/tools/background-task/constants.ts (BACKGROUND_TASK_DESCRIPTION)
- packages/omo-opencode/src/tools/background-task/create-background-task.ts (launch return)
- packages/omo-opencode/src/tools/call-omo-agent/background-agent-executor.ts (launch return)
- packages/omo-opencode/src/tools/call-omo-agent/background-executor.ts (launch return)
- packages/omo-opencode/src/tools/delegate-task/background-task.ts (launch return)
- packages/omo-opencode/src/tools/delegate-task/background-continuation.ts (continuation return)
Tests (TDD, RED -> GREEN): Added 5 regression tests, one per affected file. Each asserts the launch/continuation return no longer contains the `Use \`background_output\` with task_id=` CTA while still carrying the anti-polling instruction.
Note: The regex fix for isKimiK2Model (k2p7 detection) is already in upstream via 80ae810, so this PR is scoped to the CTA removal only.
Refs code-yeongyu#5221
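The regression check described in this commit could look roughly like the sketch below. It is an illustration only: `buildLaunchMessage` and `hasPollingCta` are hypothetical names, and the message wording beyond the quoted `Do NOT call background_output now` line is an assumption rather than the text used in the repository.

```typescript
// Hypothetical launch-message builder. Only the "Do NOT call
// background_output now" line is taken from the commit message;
// the rest of the wording is assumed for illustration.
function buildLaunchMessage(taskId: string): string {
  return [
    "Do NOT call background_output now.",
    "Wait for the <system-reminder> completion notification.",
    `Task ID: ${taskId}`,
  ].join("\n");
}

// Returns true when the text still carries a polling call-to-action
// of the form: Use `background_output` ...
function hasPollingCta(text: string): boolean {
  return /Use `background_output`/.test(text);
}

const launchText = buildLaunchMessage("bg_123");
console.log(hasPollingCta(launchText)); // expected to be false for the reworded message
```

Leading with the prohibition and dropping the CTA entirely means there is no later instruction for a recency-biased model to latch onto.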
- Add isDeepSeekV4Model, isDeepSeekV4ProModel, isDeepSeekV4FlashModel detectors to model-family-detectors.ts
- Route DeepSeek V4 models to a dedicated action-oriented Sisyphus prompt via buildGptSisyphusAgentConfig (OpenAI-compatible API format)
- New deepseek-v4.ts prompt with 8-block architecture: identity, constraints, intent, explore, execution_loop, delegation, tasks, style
- Re-export new detectors through agents/types.ts barrel
- Tests: detector unit tests + factory routing test cases
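As a rough illustration of what the three detectors might do, the sketch below matches model IDs with regular expressions. The function names come from the commit message; the regex patterns and the single-string signature are assumptions, not the code in model-family-detectors.ts.

```typescript
// Assumed patterns: tolerate "-", "_", a space, or nothing between the
// name parts. The real detectors may normalize or match differently.
const DEEPSEEK_V4_PATTERN = /deepseek[-_ ]?v4/i;
const DEEPSEEK_V4_PRO_PATTERN = /deepseek[-_ ]?v4[-_ ]?pro/i;
const DEEPSEEK_V4_FLASH_PATTERN = /deepseek[-_ ]?v4[-_ ]?flash/i;

// True for any DeepSeek V4 variant (Pro, Flash, or an unqualified V4 ID).
function isDeepSeekV4Model(model: string): boolean {
  return DEEPSEEK_V4_PATTERN.test(model);
}

// True only for the Pro variant.
function isDeepSeekV4ProModel(model: string): boolean {
  return DEEPSEEK_V4_PRO_PATTERN.test(model);
}

// True only for the Flash variant.
function isDeepSeekV4FlashModel(model: string): boolean {
  return DEEPSEEK_V4_FLASH_PATTERN.test(model);
}

console.log(isDeepSeekV4ProModel("deepseek/deepseek-v4-pro"));
```

Keeping the family check separate from the variant checks lets routing depend only on `isDeepSeekV4Model`, while the prompt builder can use the variant checks to pick the Pro or Flash capability description.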
Based on community research (deepseek-ai/DeepSeek-V3#1244, openclaw#72044, NousResearch/hermes-agent#17400, nvidia forums, pydantic-ai#5193):
- TOOL CALL FORMAT: explicit instruction to use tool_calls field only, never serialize tool calls as raw content text (mitigates Issue code-yeongyu#1244)
- VERIFICATION REQUIRED: warns about 94% hallucination rate, mandates verification of file paths, tool args, and results
- THINKING MODE: guidance on Think High vs Non-think, warns that tool_choice="required" fails with HTTP 400 in thinking mode
- LANGUAGE BLEED: warns about Chinese token leakage into English output
- Pro vs Flash variants: different capability descriptions
Restored the original model-family-detectors.ts from upstream (which had isGlmModel and isGeminiModel that were accidentally deleted). Added DeepSeek V4 functions correctly after isMiniMaxModel. Fixed TS error in deepseek-v4.ts: buildTaskManagementSection takes 1 argument (useTaskSystem boolean), not 2.
d8a7eb8 to
bb5375b
Compare
…odels
DeepSeek V4 suffers from tool call format drift after ~40 tool calls. The default 78% compaction threshold allows context to grow too large before compaction fires, by which point V4 has already drifted.
This adds a V4-specific threshold of 35% that triggers compaction earlier, keeping context under the drift threshold. Non-V4 models retain the 78% threshold unchanged.
TDD: 2 new tests (V4 triggers at 36%, non-V4 does not at 36%). All 18 existing preemptive-compaction tests still pass.
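The threshold selection described here can be sketched as follows. The 35% and 78% values are from the commit message; the function names, the token-count parameters, the inline model check, and the use of `>=` at the boundary are assumptions made for illustration, not the contents of preemptive-compaction-trigger.ts.

```typescript
const DEFAULT_COMPACTION_THRESHOLD = 0.78; // default stated in the commit
const DEEPSEEK_V4_COMPACTION_THRESHOLD = 0.35; // V4-specific value stated in the commit

// Hypothetical helper: picks the usage ratio at which compaction fires.
// The regex is an assumed stand-in for the PR's isDeepSeekV4Model detector.
function compactionThresholdFor(modelID: string): number {
  return /deepseek[-_ ]?v4/i.test(modelID)
    ? DEEPSEEK_V4_COMPACTION_THRESHOLD
    : DEFAULT_COMPACTION_THRESHOLD;
}

// Hypothetical trigger: compares context usage against the per-model threshold.
function shouldCompact(modelID: string, usedTokens: number, contextLimit: number): boolean {
  if (contextLimit <= 0) return false; // guard against division by zero
  return usedTokens / contextLimit >= compactionThresholdFor(modelID);
}

// Mirrors the two new tests: 36% usage trips V4 but not other models.
console.log(shouldCompact("deepseek/deepseek-v4-pro", 36_000, 100_000));
console.log(shouldCompact("some-provider/other-model", 36_000, 100_000));
```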
Adds 3 new research-backed improvements to the V4 Sisyphus prompt:
1. Context management guardrail: notes that the harness compacts at 35% for V4 models, so the model can rely on recent context being accurate.
2. Clarify-first instruction: ask only on real forks (2x+ effort difference), after inspecting first. From advance-minimax-m3-cursor-rules.
3. Precise spec delegation + subagent verification: V4 Pro is competent with clear specs but fabricates when specs are vague (Maestro project). Subagent reports are leads, not evidence; verify by inspecting files and rerunning checks (OpenSymphony experiment).
All 20 routing/detector tests still pass.
This was referenced Jun 19, 2026
What
Adds DeepSeek V4 Pro/Flash model detection, Sisyphus prompt routing, a research-backed Sisyphus prompt with V4-specific guardrails, AND a harness-level earlier compaction threshold to prevent tool call drift.
Why
DeepSeek V4 Pro is a strong agentic coding model (80.6% SWE-bench, 73.6 MCPAtlas, 1M context) at 7x lower cost than Claude. Previously, V4 models fell through to the generic GPT fallback prompt. This PR gives them a dedicated prompt optimized for their strengths and aware of their weaknesses, plus a harness improvement that keeps context under the drift threshold.
Research-backed guardrails
Based on community findings (deepseek-ai/DeepSeek-V3#1244, Maestro, OpenSymphony, akitaonrails benchmark, advance-minimax-m3-cursor-rules):
Changes (9 files, +350 lines)
Prompt + Detection (7 files)
- model-family-detectors.ts: isDeepSeekV4Model, isDeepSeekV4ProModel, isDeepSeekV4FlashModel
- agents/types.ts: re-export new detectors
- sisyphus-agent-factory.ts: V4 models routed via buildGptSisyphusAgentConfig after Kimi, before GPT-5.5
- sisyphus/deepseek-v4.ts (new): 8-block prompt with 7 V4-specific guardrails + 3 research-backed improvements (context management, clarify-first, precise spec delegation)
- sisyphus/index.ts: export buildDeepSeekV4SisyphusPrompt
- model-family-detectors.test.ts: 9 new assertions
- sisyphus-agent-factory.test.ts: 2 new routing test cases
Harness Improvement (2 files)
- preemptive-compaction-trigger.ts: V4 models get 35% compaction threshold (vs 78% default) to prevent tool call drift
- preemptive-compaction.test.ts: 2 new tests (V4 triggers at 36%, non-V4 does not at 36%)
TDD Evidence
Rebase note
Branch was rebased onto upstream/dev. One conflict in model-family-detectors.test.ts (upstream added Claude Fable/Mythos tests at the same location) was resolved by keeping both test blocks.
Usage
{
  "agents": {
    "sisyphus": { "model": "deepseek/deepseek-v4-pro" }
  }
}
The plugin will automatically detect the model family, route to the DeepSeek V4 prompt, apply the V4-specific guardrails, and compact context at 35% to prevent tool call drift.
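The routing order stated in the change list (V4 checked after Kimi and before GPT-5.5, with the generic GPT prompt as the previous fall-through) can be pictured with the sketch below. It is a simplified illustration: `routeSisyphusPrompt`, the route labels, and all regex patterns are hypothetical, and the real sisyphus-agent-factory.ts handles more model families than shown here.

```typescript
// Hypothetical route labels, for illustration only.
type SisyphusPromptRoute = "kimi" | "deepseek-v4" | "gpt-5.5" | "generic-gpt-fallback";

// Checks run in order; the first match wins. Only the relative order
// (Kimi, then DeepSeek V4, then GPT-5.5) is taken from the PR description.
function routeSisyphusPrompt(modelID: string): SisyphusPromptRoute {
  if (/kimi/i.test(modelID)) return "kimi"; // assumed Kimi pattern
  if (/deepseek[-_ ]?v4/i.test(modelID)) return "deepseek-v4"; // assumed V4 pattern
  if (/gpt-5\.5/i.test(modelID)) return "gpt-5.5"; // assumed GPT-5.5 pattern
  return "generic-gpt-fallback"; // where V4 models landed before this PR
}

console.log(routeSisyphusPrompt("deepseek/deepseek-v4-pro"));
```

With the usage config above, a model ID of "deepseek/deepseek-v4-pro" would take the second branch in this sketch instead of falling through to the generic prompt.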