Trae Agent Architectural Overhaul - Multi-Agent Orchestration, Incremental CKG, Fuzzy Editing & I/O Resilience#413
Trae Agent Architectural Overhaul - Multi-Agent Orchestration, Incremental CKG, Fuzzy Editing & I/O Resilience#413BobcGn wants to merge 15 commits into
Conversation
Replace single-shot bash execution with persistent session management: auto-reconnect on timeout, structured error parsing, configurable HOME dir. Add comprehensive test coverage for session lifecycle and edge cases.
Replace brittle exact-match str_replace with difflib.SequenceMatcher-based fuzzy matching (0.85 threshold). Add full-file write command, line offset tracker for post-edit line number adjustment, atomic file writes via tempfile+os.replace, and whitespace normalization. Fix view_range -1 bug.
Expand AgentStepState with PLANNING/CODING/REVIEWING/WAITING/RETRYING. Add _compress_messages context compression and _reset_llm_client_history. Fix reflect_on_result returning None, fix _tool_call_handler for None tool_calls. Add OrchestratorAgent route in Agent facade, widen self.agent type to BaseAgent. Add per-phase system prompts (PLANNER/CODER/REVIEWER).
Add OrchestratorAgent with phase-isolated contexts, per-phase tool permissions, structured text handoff between phases, and phase completion detection (plan completed / task_done / review verdict). Add 42 tests covering phase constants, detection, tool isolation, context handoff, full execution, compression, and state changes.
Add .idea project configuration, architecture analysis docs, pain point documentation, and CLAUDE.md with code conventions and testing guidelines.
BobcGn
left a comment
There was a problem hiding this comment.
Overall Impression:
Incredible work on this PR! Taking Trae Agent from a flat ReAct loop to a resilient, multi-stage orchestrator is a massive leap forward. The implementations for Bash I/O resilience, the fuzzy editing engine, and the incremental CKG updates are exactly what this architecture needed to scale. A huge shoutout for including comprehensive documentation (docs/) and extensive unit tests (tests/agent/test_orchestrator_agent.py, etc.) alongside the changesets.
Action Required (Blocker):
Before we can merge this, there is one critical housekeeping issue that needs to be addressed:
• Remove .idea/ directory: I noticed that local IDE configuration files (.idea/inspectionProfiles, .idea/misc.xml, .idea/vcs.xml, etc.) have been committed. These are specific to your local JetBrains (PyCharm/IntelliJ) environment and should not be tracked in the global repository.
• Action: Please remove the .idea/ folder from this branch's Git tracking (git rm -r --cached .idea) and ensure .idea/ is listed in your local global .gitignore.
Architecture Notes:
• Phase 1 & 2 (I/O & Editing): The stall detection mechanisms and the fallback to fuzzy matching look highly robust.
• Phase 3 (CKG): The transition to incremental updates will significantly reduce latency.
• Phase 4 (Orchestration): The state machine additions in test_agent_basics.py and the context compression logic are well-structured.
Next Steps:
I am submitting this as Request Changes strictly due to the committed .idea/ files. Once those are removed from the commit history, I will gladly approve and merge this architectural overhaul. Excellent engineering overall!
Add DeepSeekClient via OpenAI-compatible base with default endpoint https://api.deepseek.com. Register DEEPSEEK provider in LLMProvider enum and LLMClient dispatch.
…alternation - Add reasoning_content to LLMMessage schema for DeepSeek R1/V4 chain-of-thought round-trip (capture from response, include in assistant payload on re-send) - Dynamically strip temperature/top_p and use max_completion_tokens for reasoning models (o1/o3/o4-mini/gpt-5) to avoid 400 errors - Enforce strict user/assistant alternation in Anthropic client via _normalize_alternation() — merge consecutive same-role messages - Fix _compress_messages tail boundary to avoid splitting tool_call/tool_result atomic pairs, preventing orphan tool results in OpenAI/DeepSeek providers
|
@chao-peng |
…oogle tests - Style: ruff format --fix on 8 files (base_agent, orchestrator_agent, edit_tool, tests) - Test: fix GoogleClient mock — use get_name()/get_description() return_value instead of attribute assignment; fix supports_tool_calling ModelConfig; fix init_with_env_key api_key source - Chore: add changesets for deepseek-provider, reasoning-content, anthropic-role-fix; add .changeset/config.json
…ion/global strategies - MicroCompressionStrategy: dual-trigger (SEMANTIC keyword + FORCED interval/errors) - SessionCompressionStrategy: phase-boundary context handoff - GlobalStateManager: persistent cross-phase state in WORKSPACE_STATE.md - Safe atomic cut: backtracking Algorithm B for tool_call/tool_result pair integrity - Lazy-load refs: tool output replacement with [lazy-ref:hash] for large content - FileBackend security: path traversal prevention + TOCTOU-safe read() - Markdown injection prevention: _escape_md_lines() for ## -prefixed lines - Add changeset: compression-refactor (minor)
…nd OrchestratorAgent - BaseAgent: delegate _compress_messages to shared MicroCompressionStrategy singleton with proper last_compression_step state tracking (方案 B) - BaseAgent: add _reset_llm_client_history() after compression for client consistency - BaseAgent: remove old manual HEAD/TAIL compression code (~50 lines) - OrchestratorAgent: per-phase micro-compression with dual-trigger model (SEMANTIC keyword + FORCED step interval / consecutive errors) - OrchestratorAgent: track last_assistant_message and consecutive_errors - OrchestratorAgent: add structured compression logging - ruff: fix I001 import ordering in orchestrator_agent.py
…r, context - test_phase2_compression.py (87 tests): find_safe_cut edge cases, from_markdown error recovery, MicroCompressionStrategy triggers, _escape_md_lines, SessionCompressionStrategy, GlobalStateManager - test_orchestrator_compression.py (5 tests TC-1~TC-5): step-interval trigger, consecutive error trigger, client history reset, semantic keyword trigger, no-trigger boundary condition - test_context_compression.py: adapt to unified MicroCompressionStrategy format - test_google_client.py: fix test method rename - test_phase1_smoke.py: compression module import smoke test - .gitignore: add /review/ directory
|
@chao-peng |
- 新增 ResolveLazyRefTool,支持 [lazy-ref:<hash>] 占位符的前缀匹配与消歧义 - 注册到 tools_registry 和 TraeAgentToolNames - 修正 LazyRef TypeAlias 文档与实际格式对齐 ([lazy-ref:<hash>])
- 新增 SkillsRegistry 动态技能引擎:项目探针 + 元架构模板注入 - 单数据源 LANGUAGE_DETECTION_PRIORITY 自动派生 LANGUAGE_DETECTORS - 四角色 Prompt 全面重写:XML 零逃逸契约、Tool-first 正面锚定 - Coder 闭环验证:Do NOT call task_done until ALL tests pass - Reviewer CI/CD MUST 强制执行 + resolve_lazy_ref 工具声明 - 压缩感知四角色对齐
…r tracking - Planner 阶段完成检测增强:XML 闭合 AND 信号双校验 - Reviewer CI/CD 强制执行:_reviewer_executed_bash() 运行时检测 - PHASE_TOOL_NAMES set 化 + resolve_lazy_ref 全阶段可用 - lazy-ref 集成:register_lazy_ref 回调 + _scrub_sensitive_data() 敏感数据过滤 - Session 压缩安全加固 + 消除重复 SHA256 计算 - BaseAgent consecutive_errors 正确追踪并传递到 CompressionContext
|
@chao-peng @trae-agent |
This PR introduces a comprehensive architectural overhaul of the Trae Agent. It addresses four critical pain points: brittle I/O blocking (Bash timeouts), strict exact-match editing failures, inefficient full-rebuild Code Knowledge Graph (CKG), and the limitations of the single-threaded ReAct loop. By introducing stall detection, fuzzy matching, incremental updates, and a multi-stage OrchestratorAgent, this overhaul transforms Trae into a highly fault-tolerant, scalable, and efficient multi-agent system.