Commit a36dac5

LeftTwixW and claude committed
docs: add deploy gap fix plan — full stop→build→start in /deploy endpoint
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent d8e3d02 commit a36dac5

1 file changed

Lines changed: 241 additions & 0 deletions

# Deploy Gap Fix: Full Stop→Build→Start in /deploy Endpoint

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Close the deployment gap so IAW's self-improvement loop can write code, build, and deploy autonomously, with the `/deploy` endpoint handling the full stop→build→start sequence from the MCP process.

**Architecture:** The `/deploy` endpoint (in the MCP server, a separate process) becomes the deployment orchestrator. It stops and starts the assistant through Aspire resource commands (via the `aspire` CLI, with the Aspire resource service gRPC API as a fallback). The Aspire agent's `DeployAsync` fires an HTTP POST to `/deploy` and accepts that it will die when the assistant stops, since the agent runs inside the assistant. If the build succeeds, `/deploy` starts the assistant on the fresh binary. If the build fails, `/deploy` reverts the working tree via git and starts the assistant with the old code.

**Tech Stack:** ASP.NET Core, Aspire resource service API, `System.Diagnostics.Process`

---
## Root Cause Analysis

The current flow is broken:

```
AspireAgent.DeployAsync():
1. RestartResourceAsync("assistant")  ← WRONG: does stop+START (assistant restarts before build)
2. POST /deploy (build)               ← TOO LATE: assistant already restarted with old code
3. Start assistant                    ← REDUNDANT: already started in step 1
```

The correct flow:

```
AspireAgent.DeployAsync():
1. POST http://localhost:5300/deploy (fire-and-forget, accept death)

/deploy endpoint (in MCP process):
1. Stop assistant via Aspire resource API
2. Wait 5s for DLLs to unlock
3. dotnet build src/IAW.Assistant/IAW.Assistant.csproj
4. If build OK → Start assistant (fresh binary)
5. If build FAIL → git checkout -- . → Start assistant (old code)
```
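Step 2 of the `/deploy` flow above waits a fixed 5 seconds, which is a guess. A sturdier alternative is to poll until one of the assistant's output DLLs can be opened exclusively, with a timeout. The following is a minimal sketch that assumes the deploy code knows that DLL's path. `WaitForUnlockAsync` and its 250 ms polling interval are hypothetical and not part of the existing code.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the existing code): polls until `path` can be opened
// exclusively instead of sleeping a fixed 5 s. Returns false if `timeout` elapses first.
static async Task<bool> WaitForUnlockAsync(string path, TimeSpan timeout, CancellationToken ct = default)
{
    var deadline = DateTime.UtcNow + timeout;
    while (true)
    {
        // A file that doesn't exist can't be locked.
        if (!File.Exists(path))
            return true;

        try
        {
            // FileShare.None fails with an IOException while another handle or mapping exists;
            // on Windows that includes a DLL still loaded by a running (or exiting) process.
            using var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
            return true;
        }
        catch (IOException) when (DateTime.UtcNow < deadline)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(250), ct); // still locked: retry shortly
        }
        catch (IOException)
        {
            return false; // timed out while the file was still locked
        }
    }
}
```

For example, `await WaitForUnlockAsync(assistantDllPath, TimeSpan.FromSeconds(30), ct)` could replace `Task.Delay(5000, ct)`. Here `assistantDllPath` is a hypothetical variable pointing at the assistant's main assembly under `src/IAW.Assistant/bin/`. The check relies on Windows locking semantics. On other platforms, .NET file locks are advisory, so the check is weaker there.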
## DLL Lock Analysis

When building `src/IAW.Assistant/IAW.Assistant.csproj`:
- The build also rebuilds the referenced Agents project, which writes `Agents.dll` to `src/Agents/bin/Debug/net11.0/`
- MCP locks `src/IAW.MCP/bin/Debug/net11.0/Agents.dll` (its own copy, not the source output)
- Telegram locks `src/Clients.Telegram/bin/Debug/net11.0/win-x64/Agents.dll` (its own copy)
- The assistant is STOPPED → `src/IAW.Assistant/bin/` is unlocked

The build should succeed because none of its output locations are locked. Other processes do not use `src/Agents/bin/` because they load their own copies, and the assistant's bin directory is unlocked while the assistant is stopped.

If the build still fails due to transitive locks, fall back to `dotnet build --no-dependencies src/IAW.Assistant/IAW.Assistant.csproj` or add `--artifacts-path E:\IAW\.deploy-artifacts`. Note that `--no-dependencies` skips building referenced projects. New or changed code under `src/Agents/`, where new agents such as EmojiAgent are created, would therefore not be compiled.

---

## File Map

| File | Action | Change |
|------|--------|--------|
| `src/IAW.MCP/Deploy/DeployEndpoint.cs` | Modify | Stop/start the assistant via Aspire resource commands around the build |
| `src/Agents/Infrastructure/AspireAgent.cs` | Modify | DeployAsync becomes fire-and-forget HTTP POST |
| `test/Core.Tests/Scheduling/DeployVerifyJobTests.cs` | Modify | Add test for revert-on-failure path |

---
### Task 1: Make /deploy endpoint handle full stop→build→start

**Files:**
- Modify: `src/IAW.MCP/Deploy/DeployEndpoint.cs`

The endpoint currently only builds. Change it to:
1. Stop assistant via Aspire resource API
2. Wait for DLLs to unlock
3. Build assistant project
4. Start assistant (or revert + start on failure)

The Aspire resource service is a gRPC API hosted by the AppHost and consumed by the dashboard. It is separate from the dashboard's OTLP endpoint. The simplest approach, though, is to use the `aspire` CLI tool to execute resource commands, since it's already available.

- [ ] **Step 1: Update deploy endpoint**

Replace the handler to include stop/start:
```csharp
app.MapPost("/deploy", async (ILogger<Program> logger, IHostApplicationLifetime lifetime) =>
{
    logger.LogInformation("Deploy: starting full stop→build→start sequence");
    var iawRoot = FindIawRoot();
    if (iawRoot is null)
        return Results.Problem("Could not find IAW root directory");

    var appHostPath = Path.Combine(iawRoot, "src", "IAW.AppHost");
    const string stopArgs = "mcp run execute_resource_command -- --resourceName assistant --commandName resource-stop";
    const string startArgs = "mcp run execute_resource_command -- --resourceName assistant --commandName resource-start";

    // Don't use the request's CancellationToken: the caller (AspireAgent, running inside the
    // assistant) disconnects when the assistant stops, which would cancel this sequence halfway
    // and leave the assistant stopped. Only abort if the MCP host itself is shutting down.
    var ct = lifetime.ApplicationStopping;

    try
    {
        // Step 1: Stop assistant to release DLL locks
        logger.LogInformation("Deploy: stopping assistant resource");
        var (stopExit, _, stopError) = await RunProcessAsync("aspire", stopArgs, appHostPath, ct);
        if (stopExit != 0)
        {
            // Building while the assistant still runs would hit locked DLLs and trigger a revert
            // that throws away the new code, so bail out instead.
            logger.LogError("Deploy: could not stop assistant: {Error}", stopError);
            return Results.Problem($"Deploy aborted: could not stop assistant ({stopError})");
        }
        await Task.Delay(5000, ct); // Wait for process to die and DLLs to unlock

        // Step 2: Build
        logger.LogInformation("Deploy: building assistant project");
        var (exitCode, output, error) = await RunProcessAsync(
            "dotnet", "build src/IAW.Assistant/IAW.Assistant.csproj", iawRoot, ct);

        var fullOutput = output + "\n" + error;
        var verification = DeployVerifier.VerifyBuildOutput(fullOutput);
        var buildOk = exitCode == 0 && verification.Success;

        if (!buildOk)
        {
            logger.LogError("Deploy: build FAILED (exit code {ExitCode}). Reverting and starting old code.", exitCode);
            // Restores tracked files only; new untracked files stay on disk.
            await RunProcessAsync("git", "checkout -- .", iawRoot, ct);
        }

        // Step 3: Start assistant (fresh binary if build succeeded, previous binary if reverted)
        logger.LogInformation("Deploy: starting assistant resource");
        await RunProcessAsync("aspire", startArgs, appHostPath, ct);

        if (!buildOk)
        {
            return Results.Json(new { success = false, action = "reverted", errors = verification.Errors,
                output = fullOutput.Length > 2000 ? fullOutput[..2000] : fullOutput });
        }

        return Results.Json(new { success = true, action = "deployed", errors = 0 });
    }
    catch (Exception ex)
    {
        logger.LogError(ex, "Deploy: sequence failed");
        // Recovery: try to start the assistant even on error so it is never left stopped
        try
        {
            await RunProcessAsync("aspire", startArgs, appHostPath, CancellationToken.None);
        }
        catch { /* best effort */ }
        return Results.Problem($"Deploy failed: {ex.Message}");
    }
});
```
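The handler relies on a `RunProcessAsync` helper that this plan does not show. It is assumed to return `(exitCode, stdout, stderr)` and to take a file name, an argument string, a working directory, and a token. If the existing helper differs, the following is a minimal sketch with that assumed signature, built on `System.Diagnostics.Process`. It reads both output streams concurrently to avoid a pipe-buffer deadlock, and it kills the whole process tree on cancellation.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the helper the /deploy handler assumes (signature is an assumption): runs a
// process to completion and returns its exit code plus captured stdout and stderr.
static async Task<(int ExitCode, string Output, string Error)> RunProcessAsync(
    string fileName, string arguments, string workingDirectory, CancellationToken ct)
{
    var psi = new ProcessStartInfo(fileName, arguments)
    {
        WorkingDirectory = workingDirectory,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false,
    };

    using var process = Process.Start(psi)
        ?? throw new InvalidOperationException($"Failed to start '{fileName}'");

    // Drain both pipes concurrently; reading them one after the other can deadlock
    // once the unread pipe's buffer fills up.
    var stdoutTask = process.StandardOutput.ReadToEndAsync();
    var stderrTask = process.StandardError.ReadToEndAsync();

    try
    {
        await process.WaitForExitAsync(ct);
    }
    catch (OperationCanceledException)
    {
        // Don't leave an orphaned build or CLI process running after cancellation.
        try { process.Kill(entireProcessTree: true); } catch (InvalidOperationException) { /* already exited */ }
        throw;
    }

    return (process.ExitCode, await stdoutTask, await stderrTask);
}
```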
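NOTE: The `aspire mcp run` syntax above is unverified and may differ. `execute_resource_command` looks like the name of a tool exposed by Aspire's MCP server rather than a CLI subcommand, so check `aspire --help` before relying on it. If it doesn't work, fall back to calling the Aspire resource service gRPC API directly. For example, use a gRPC client such as `grpcurl` pointed at the endpoint in the `ASPIRE_RESOURCE_SERVICE_ENDPOINT_URL` environment variable. Plain `curl` is impractical here because the service speaks gRPC.

Alternative if `aspire mcp run` doesn't work for resource commands: the MCP server already has access to the Aspire dashboard URL, but the stop/start calls go to the resource service gRPC endpoint, which is a different address. The AppHost may not pass that variable to the MCP process, so verify it is set:

```csharp
// Alternative: call the Aspire resource service (gRPC) directly.
// Assumption: ASPIRE_RESOURCE_SERVICE_ENDPOINT_URL is visible to the MCP process; verify before relying on it.
var aspireEndpoint = Environment.GetEnvironmentVariable("ASPIRE_RESOURCE_SERVICE_ENDPOINT_URL")
    ?? throw new InvalidOperationException("ASPIRE_RESOURCE_SERVICE_ENDPOINT_URL is not set in this process");
// Next (not shown): open a gRPC channel to aspireEndpoint and send the stop/start resource
// commands with a client generated from Aspire's resource service .proto.
```

Or, as the simplest fallback, have `/deploy` only build, and have the caller (the Aspire agent) stop the assistant itself before dying. Aspire is not expected to restart an explicitly stopped resource on its own, so this fallback still needs a way to start the assistant again (for example, from the dashboard).

Revert caveat: `git checkout -- .` discards every uncommitted change to tracked files in the repository, not just the agent's edits. It also leaves new untracked files (such as a freshly created agent class) on disk. SDK-style projects compile every `.cs` file under the project directory, so those leftovers will break the next build unless they are removed. One option is a `git clean -fd` scoped to the affected directory; because it deletes files, preview it with `git clean -nd` first.

- [ ] **Step 2: Build and verify**

Run: `dotnet build src/IAW.MCP`
Expected: 0 errors.

- [ ] **Step 3: Commit**

```bash
git add src/IAW.MCP/Deploy/DeployEndpoint.cs
git commit -m "fix: /deploy endpoint handles full stop→build→start sequence"
```

---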
### Task 2: Make AspireAgent.DeployAsync fire and forget

**Files:**
- Modify: `src/Agents/Infrastructure/AspireAgent.cs`

The current `DeployAsync` calls `RestartResourceAsync` (stop+start) and THEN `/deploy`. This is wrong because the assistant restarts before the build. Fix: just POST to `/deploy`, which now handles stop→build→start, and accept that the agent will die.

- [ ] **Step 1: Simplify DeployAsync**

Replace the current implementation:
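```csharp
public async Task<string> DeployAsync(CancellationToken ct = default)
{
    logger.LogInformation("Deploy: firing deploy request to MCP endpoint");

    try
    {
        // Fire the deploy request; the MCP endpoint handles stop→build→start.
        // This agent dies when the assistant stops, so we never await the final response.
        // Disposing the client when this method returns cancels the in-flight request,
        // so /deploy must not abort when its caller disconnects.
        using var httpClient = httpClientFactory.CreateClient();
        httpClient.Timeout = TimeSpan.FromSeconds(10); // Short timeout: we'll die before it completes

        var postTask = httpClient.PostAsync("http://localhost:5300/deploy", content: null, CancellationToken.None);

        // Give the request up to 2 seconds to reach MCP. A successful deploy stops the assistant
        // (and this agent) long before it responds, so an answer inside this window means
        // /deploy failed early (e.g. IAW root not found, assistant could not be stopped).
        var first = await Task.WhenAny(postTask, Task.Delay(TimeSpan.FromSeconds(2), ct));
        if (first == postTask)
        {
            using var response = await postTask; // rethrows if MCP is unreachable (caught below)
            var body = await response.Content.ReadAsStringAsync(ct);
            return $"Deploy endpoint returned early ({(int)response.StatusCode}): {body}";
        }

        return "Deploy initiated. Assistant will restart shortly (fresh binary if the build succeeds).";
    }
    catch (Exception ex)
    {
        logger.LogError(ex, "Deploy: failed to initiate");
        return $"Deploy initiation failed: {ex.Message}";
    }
}
```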
Key change: we fire the POST and DON'T await the full response. The MCP endpoint runs independently. We give it 2 seconds to receive the request, then return. The assistant will be stopped by the MCP endpoint shortly after.
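The "wait about 2 seconds, then return" step can be isolated into a small helper so it can be unit-tested without HTTP. The helper reports whether the POST task finished inside the window. A response that arrives that early is almost certainly an early failure, since a successful deploy stops the assistant (and this agent) before answering. The following is a minimal sketch; the name `CompletedWithinAsync` is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: waits up to `window` for `task` without cancelling it.
// Returns true if the task finished (successfully, faulted, or cancelled) inside the window,
// false if the window elapsed (or `ct` was cancelled) first.
static async Task<bool> CompletedWithinAsync(Task task, TimeSpan window, CancellationToken ct = default)
{
    var timer = Task.Delay(window, ct);
    var first = await Task.WhenAny(task, timer);
    return first == task;
}
```

Suppose the POST task is kept in a variable such as `postTask` instead of being discarded with `_ =`. The agent could then use `if (await CompletedWithinAsync(postTask, TimeSpan.FromSeconds(2), ct)) { ... }`, where awaiting `postTask` inside the branch surfaces either the early HTTP response or the connection error.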
- [ ] **Step 2: Build and test**

Run: `dotnet build src/Agents && dotnet test test/Core.Tests -v minimal`
Expected: 0 errors, all tests pass.

- [ ] **Step 3: Commit**

```bash
git add src/Agents/Infrastructure/AspireAgent.cs
git commit -m "fix: DeployAsync fires POST to /deploy and accepts death — no more stop+start before build"
```

---
### Task 3: End-to-end test of the full closed loop

**Files:** None (testing via MCP)

- [ ] **Step 1: Kill all processes, clean EmojiAgent, rebuild, start Aspire**

- [ ] **Step 2: Ask IAW to create EmojiAgent**

Send: "Create EmojiAgent at E:\IAW\src\Agents\Fun/ then deploy via Aspire Deploy"

- [ ] **Step 3: Verify via traces**

Check:
- FileSystem wrote files
- DotNet built successfully
- Aspire Deploy was called
- /deploy endpoint stopped assistant, built, started
- Assistant came back with EmojiAgent registered

- [ ] **Step 4: Test emoji agent**

Send: "Call SendToAgent Emoji: I love coffee"
Expected: emoji response from IAW-created agent
