Skip to content

Commit c4f03db

Browse files
ryaneggzclaude
andauthored
feat: MCP Sandbox backend, dispatch, frontend UX, and native tool overrides (#890) (#897)
* Specs and plans complete, start loop * feat: US-001 - McpSandboxBackend class skeleton and constructor Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-003 through US-009 - MCP session init, execute, file ops, health, close Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-010 through US-014 - Unit tests for MCP sandbox backend Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-015, US-016 - Add MCP to SandboxType enum and settings schema Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-017, US-018, US-019 - MCP sandbox factory, dispatch rules, error helper Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-020, US-021 - Wire mcp_sandbox_url through LLMController and stream_generator - _resolve_user_settings() returns 4-tuple (model, api_key, default_sandbox, mcp_sandbox_url) - mcp_sandbox_url passed to resolve_sandbox_backend() in llm_invoke() and stream_generator() - MCP error handling in both llm_invoke() and stream_generator() mirroring Daytona pattern - stream_generator() emits mcp_sandbox_unreachable SSE event on MCP failure - Auto mode falls back to State backend on MCP errors - Updated controller unit tests for 4-tuple return value Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-022 - Wire mcp_sandbox_url through worker tasks - _execute_agent_stream() reads settings.default_mcp_sandbox_url - Value passed to resolve_sandbox_backend() with mcp_sandbox_url parameter - MCP error handling: sandbox_type='mcp' writes mcp_sandbox_unreachable to Redis stream - Auto mode with effective_type='mcp' falls back to State backend and retries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-036 through US-044 - Frontend MCP sandbox UX - SandboxType extended with 'mcp', normalizeSandboxValue recognizes 'mcp' - mcp_sandbox_url added to DefaultsResponse and patchDefaults types - MCP entry added to SANDBOX_OPTIONS - MCP Sandbox URL input in SandboxSettings (persists on blur/Enter) - MCP option visibility gated on non-empty mcp_sandbox_url - useSandboxHealth hook pings derived /health endpoint - Green/red health dot in SandboxSettings and ThreadSandboxStatus - mcp_sandbox_unreachable SSE event shows persistent error toast Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: US-046 through US-059 - Native MCP tool overrides with graceful degradation - Tool discovery: _ensure_initialized() calls tools/list, caches in _available_tools - _parse_meta() extracts exit_code and sandbox_id from _meta structure - id property returns _meta.sandbox_id when available, fallback to local format - execute() uses 'execute' tool on new servers, 'exec_command' on old - Native overrides: read(), write(), edit(), grep_raw(), glob_info(), ls_info() - Native upload_files()/download_files() via upload_file/download_file MCP tools - All methods fall back to super() (inherited shell delegation) when native tools unavailable - _parse_json_lines() helper for grep/glob/ls JSON lines parsing - 75 unit tests covering native paths, fallback paths, _meta parsing, graceful degradation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * feat: MCP sandbox API key wiring, unified settings UX, file sync, and health proxy Wire MCP_SANDBOX_API_KEY end-to-end using provider keys pattern (encrypted storage). Redesign sandbox settings so MCP config (URL + API key) only appears when MCP is selected. Fix McpSandboxBackend SSE response parsing for Streamable HTTP transport, populate files_update on write/edit for frontend file sync, and add backend health-check proxy to bypass CORS. - Add MCP_SANDBOX_API_KEY to UserTokenKey enum and env fallback - Wire mcp_api_key through agents factory, LLMController, stream_generator, and worker tasks - Redesign SandboxSettings: conditional MCP config panel with URL, API key, save button, and live health status - MCP option always visible in sandbox dropdown (settings + thread selector) - Add _parse_sse_json() to handle SSE-framed MCP responses (fixes tool discovery and all tool calls against new exec_server) - Return files_update from McpSandboxBackend.write() and edit() so files propagate to frontend file editor panel - Add GET /settings/mcp-sandbox-health proxy endpoint; update useSandboxHealth hook to use backend proxy instead of direct browser fetch - Update tests for 5-tuple _resolve_user_settings and SSE mock responses Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> * chore: update wiki and sandboxes submodules - wiki: MCP sandbox docs (10 tools, Default Sandbox UX, health check) - sandboxes: CI pipeline with conformance tests on branch push and tag create Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com> --------- Signed-off-by: ryaneggz <kre8mymedia@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3dffdac commit c4f03db

37 files changed

Lines changed: 7444 additions & 54 deletions

.claude/plans/cryptic-enchanting-wave.md

Lines changed: 1337 additions & 0 deletions
Large diffs are not rendered by default.

.claude/plans/greedy-tumbling-boole-agent-a03dc3ae9503ebd65.md

Lines changed: 395 additions & 0 deletions
Large diffs are not rendered by default.

.claude/plans/greedy-tumbling-boole-agent-a1f62fc892d2b5f0e.md

Lines changed: 199 additions & 0 deletions
Large diffs are not rendered by default.

.claude/plans/greedy-tumbling-boole-agent-a6e07ed02be61f7cd.md

Lines changed: 356 additions & 0 deletions
Large diffs are not rendered by default.

.claude/plans/greedy-tumbling-boole-agent-aec0915e73f3d3de4.md

Lines changed: 256 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 193 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,193 @@
1+
# MCP Sandbox Artifact Alignment Audit
2+
3+
## Context
4+
5+
Three artifact layers describe Feature #890 (MCP Sandbox):
6+
- **Specs** (5 files): `.claude/specs/feat-890-mcp-sandbox/001-005`
7+
- **PRD Tasks** (5 files): `tasks/prd-mcp-sandbox-{backend,dispatch,exec-server,frontend,v2-native}.md`
8+
- **Ralph PRD**: `.ralph/prd.json` (59 user stories, US-001 through US-059)
9+
10+
This audit identifies alignment gaps that must be resolved **before Ralph execution begins**.
11+
12+
---
13+
14+
## Council Scores
15+
16+
| Auditor | Score | Summary |
17+
|---------|-------|---------|
18+
| Story Coverage | **9/10** | All 59 stories covered across all layers. Minor ID numbering drift. |
19+
| Acceptance Criteria | **5/10** | 8 critical contradictions, 10 major inconsistencies found. |
20+
| Dependency & Sequencing | **5/10** | 3 critical sequencing issues, 4 moderate gaps. |
21+
| Technical Consistency | **6.5/10** | 5 high-severity codebase mismatches, 5 medium. |
22+
23+
**Composite Score: 6.4/10** — Story coverage is excellent but implementation-level specs have dangerous contradictions.
24+
25+
---
26+
27+
## CRITICAL Issues (Must Fix Before Implementation)
28+
29+
### C-1: Constructor parameter name — `base_url` vs `url`
30+
- **Ralph/PRD**: `__init__(self, *, base_url: str, api_key: str | None = None)`
31+
- **Specs 001/002/005**: Use `url` as the parameter name; factory calls `McpSandboxBackend(url=mcp_sandbox_url)`
32+
- **Impact**: Factory call crashes with `TypeError` at runtime
33+
- **Resolution**: Standardize on `base_url` (matches PRD and Ralph). Update specs 001, 002, 005.
34+
35+
### C-2: URL semantics — base URL vs full MCP endpoint
36+
- **PRD**: Stores `http://localhost:3005`, appends `/mcp` internally
37+
- **Spec 001**: Stores `http://localhost:3005/mcp`, strips `/mcp` to derive base
38+
- **Impact**: Health check, DELETE, and id property generate wrong URLs
39+
- **Resolution**: Standardize on base URL convention (PRD approach). Update spec 001.
40+
41+
### C-3: Sync vs Async HTTP client
42+
- **PRD/Specs**: `httpx.AsyncClient` with `async def` methods
43+
- **Task-Backend**: Recommends `httpx.Client` (sync) for DaytonaSandbox consistency
44+
- **Impact**: Architecture-level decision affecting every method signature
45+
- **Resolution**: Verify whether `BaseSandbox.execute()` is sync or async in `deepagents`. If sync, use `httpx.Client`. If async, keep `AsyncClient`. Update the losing artifact.
46+
47+
### C-4: MCP protocol version
48+
- **Spec 001**: `"2024-11-05"`
49+
- **Task-Backend**: `"2025-03-26"`
50+
- **Impact**: Wrong version may cause handshake rejection
51+
- **Resolution**: Check actual exec_server's expected version. Standardize across all artifacts.
52+
53+
### C-5: SSE error event format — 4 different formats
54+
- **Spec 002**: `("mcp_sandbox_unreachable", "error string")`
55+
- **Task-Dispatch**: `("error", {"type": "mcp_sandbox_unreachable", "message": "..."})`
56+
- **Task-Frontend**: `("mcp_sandbox_unreachable", {"message": "..."})`
57+
- **Spec 003**: `("mcp_sandbox_unreachable", "error message")`
58+
- **Impact**: Frontend toast handler will never fire if format mismatches
59+
- **Resolution**: Check existing `daytona_unreachable` event format for precedent. Standardize all 4.
60+
61+
### C-6: Tool name rename without backward compat creates breakage window
62+
- **Phase 1** (P1): Backend hardcodes `exec_command`
63+
- **Phase 4** (P4): Exec server renames to `execute` with NO alias
64+
- **Phase 5** (P5): Backend adds dual-name support
65+
- **Impact**: Between P4 deploy and P5 deploy, sandbox is completely broken
66+
- **Resolution**: Either (a) Phase 4 keeps `exec_command` as alias until Phase 5 ships, or (b) Phase 1 uses dual-name from the start.
67+
68+
### C-7: Private method name — `_ensure_initialized()` vs `_ensure_session()`
69+
- **PRD/Task-Backend**: `_ensure_initialized()`
70+
- **Specs**: `_ensure_session()`
71+
- **Impact**: Tests reference wrong method name
72+
- **Resolution**: Standardize on `_ensure_initialized()` (matches PRD). Update specs.
73+
74+
### C-8: Default client timeout — 120s vs 130s
75+
- **PRD**: 120 seconds
76+
- **Spec 001**: 130 seconds (`httpx.Timeout(130.0)`)
77+
- **Resolution**: Pick one (120s aligns with exec_server's max timeout). Update spec 001.
78+
79+
---
80+
81+
## MAJOR Issues (Should Fix)
82+
83+
### M-1: Health check timeout disagreement
84+
- Backend: 5s (PRD/Task-Backend) vs Frontend: 3s (Task-Frontend)
85+
- **Resolution**: Backend health() uses 5s. Frontend useSandboxHealth uses its own timeout (3s is fine for UX). Document both.
86+
87+
### M-2: `_resolve_user_settings` breaking change (3-tuple to 4-tuple)
88+
- File: `backend/src/controllers/llm.py:70`
89+
- Must update BOTH `llm_invoke` (line ~121) and `llm_stream` (line ~211)
90+
- Worker at `backend/src/workers/tasks.py:421-431` duplicates this logic inline and also needs updating
91+
- **Resolution**: Update all 3 callsites in the same commit (US-020/US-021/US-022).
92+
93+
### M-3: MCP factory signature mismatch with `_SANDBOX_FACTORIES` pattern
94+
- Existing factories: `factory(runtime)` — single param
95+
- MCP factory: `_create_mcp_backend_checked(runtime, mcp_sandbox_url)` — two params
96+
- Can't register in generic `_SANDBOX_FACTORIES` dict and call through same dispatch
97+
- **Resolution**: Either special-case MCP in `resolve_sandbox_backend()` or add `mcp_sandbox_url` to all factory signatures (adapter pattern).
98+
99+
### M-4: `ThreadSandboxStatus` wrong file path in spec 003
100+
- Spec 003: `frontend/src/components/tools/ThreadSandboxStatus.tsx`
101+
- Actual: `frontend/src/components/status/ThreadSandboxStatus.tsx`
102+
- **Resolution**: Fix spec 003.
103+
104+
### M-5: `upload_files()`/`download_files()` path validation
105+
- Task-Backend adds `invalid_path` error for paths not starting with `/`
106+
- PRD has no such validation
107+
- **Resolution**: Add path validation (defense in depth). Update Ralph ACs.
108+
109+
### M-6: `execute()` timeout parameter type
110+
- `int | None` (PRD/Ralph) vs `float | None` (Spec 001)
111+
- **Resolution**: Use `int | None` (matches BaseSandbox convention). Update spec 001.
112+
113+
### M-7: `clientInfo.name` in initialize payload
114+
- `"orchestra"` (Spec 001) vs `"orchestra-backend"` (Task-Backend)
115+
- **Resolution**: Use `"orchestra"` for brevity. Update task.
116+
117+
### M-8: `_parse_meta` return type
118+
- `tuple[int | None, str | None]` (PRD) vs `tuple[int, str]` without Optional (Spec 005)
119+
- **Resolution**: Must be Optional since `_meta` may be absent. Use PRD version.
120+
121+
### M-9: Exit code regex — first match vs last match
122+
- Spec 001 says "last match wins", PRD doesn't specify
123+
- **Resolution**: Document "first match" (simpler, `re.search` default). Update spec.
124+
125+
### M-10: `is_mcp_sandbox_error()` — include McpSandboxError or not
126+
- PRD: includes it
127+
- Task-Dispatch: leaves as open question
128+
- **Resolution**: Include it (the purpose of the helper). Update task.
129+
130+
---
131+
132+
## STRUCTURAL Issues (Sequencing & Dependencies)
133+
134+
### S-1: Phase 4 is in a different git repository
135+
Ralph processes stories sequentially in the orchestra repo. Phase 4 (US-024-035) operates in the `sandboxes/` submodule (separate `ruska-ai/sandboxes` repo). Ralph cannot:
136+
- Commit to the submodule
137+
- Build Docker images
138+
- Start containers
139+
- Run `conformance.sh`
140+
141+
**Resolution**: Extract Phase 4 stories into a separate Ralph run targeting the sandboxes repo, OR execute Phase 4 manually before Ralph processes Phase 5.
142+
143+
### S-2: Phase 3/4 parallelism wasted by priority ordering
144+
Ralph serializes P4 (priorities 24-35) entirely before P3 (priorities 36-45). Both are independent.
145+
146+
**Resolution**: If sequential execution is required, this is acceptable (just slower). If parallelism is desired, Phase 3 priorities should be interleaved with Phase 4.
147+
148+
### S-3: Integration gates lack service prerequisites
149+
- US-023 (P2 integration): Needs running API
150+
- US-045 (P3 integration): Needs running frontend + exec_server for health dot
151+
- US-059 (P5 integration): Needs running exec_server
152+
153+
**Resolution**: Add notes to these stories clarifying what services must be running. Ralph can't start services, so these gates may need manual validation.
154+
155+
### S-4: PRD task files use local US-XXX numbering
156+
Each PRD task resets to US-001. Cross-referencing with Ralph's global US-001-059 requires knowing the phase mapping.
157+
158+
**Resolution**: Add a mapping header to each PRD task file: "This file covers Ralph US-015 through US-023."
159+
160+
---
161+
162+
## MINOR Issues
163+
164+
1. Spec 003 health dot `bg-green-500, h-2 w-2` matches PRD exactly — no issue
165+
2. PRD notes field is empty on all 59 stories — Phase 4 submodule constraint only noted in last few
166+
3. Open design questions in PRD tasks (CORS proxying, file size limits) not tracked in Ralph
167+
4. Spec files use numbered requirements (1-16) instead of US-XXX IDs
168+
169+
---
170+
171+
## Recommended Fix Order
172+
173+
1. **Resolve C-3 first** (sync vs async) — this is architectural and affects everything
174+
2. **Fix C-1 + C-2** (constructor + URL convention) — affects factory, health, id property
175+
3. **Fix C-5** (SSE event format) — check existing Daytona pattern for precedent
176+
4. **Fix C-6** (tool rename strategy) — decide alias vs dual-name-from-start
177+
5. **Fix C-4** (protocol version) — verify against actual exec_server
178+
6. **Fix C-7 + C-8** (method name + timeout) — straightforward standardization
179+
7. **Fix S-1** (Phase 4 execution strategy) — before Ralph starts
180+
8. **Apply all MAJOR fixes** — batch update to specs, PRD tasks, and prd.json
181+
9. **Re-run alignment audit** after fixes
182+
183+
---
184+
185+
## Verification
186+
187+
After applying fixes:
188+
- `grep -r "base_url\|url=" .claude/specs/ tasks/` — confirm constructor param standardized
189+
- `grep -r "AsyncClient\|Client" .claude/specs/ tasks/` — confirm sync/async standardized
190+
- `grep -r "mcp_sandbox_unreachable" .claude/specs/ tasks/` — confirm SSE format standardized
191+
- `grep -r "2024-11-05\|2025-03-26" .claude/specs/ tasks/` — confirm protocol version standardized
192+
- `grep -r "_ensure_initialized\|_ensure_session" .claude/specs/ tasks/` — confirm method name standardized
193+
- Cross-check Ralph prd.json ACs against updated specs for consistency

0 commit comments

Comments
 (0)