|
| 1 | +# Deploy Gap Fix — Full Stop→Build→Start in /deploy Endpoint |
| 2 | + |
| 3 | +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. |
| 4 | + |
| 5 | +**Goal:** Close the deployment gap so IAW's self-improvement loop can write code, build, and deploy autonomously — the `/deploy` endpoint handles the full stop→build→start sequence from the MCP process. |
| 6 | + |
| 7 | +**Architecture:** The `/deploy` endpoint (in the MCP server, a separate process) becomes the deployment orchestrator. It stops and starts the assistant through Aspire resource commands (Task 1 uses the `aspire` CLI, with the resource service gRPC API as a fallback). The Aspire agent's `DeployAsync` fires an HTTP POST to `/deploy` and accepts that it will die when the assistant stops. If the build succeeds, `/deploy` starts the assistant on the fresh binary. If the build fails, it reverts via git and starts the assistant on the old code. |
| 8 | + |
| 9 | +**Tech Stack:** ASP.NET Core, Aspire resource service API, `System.Diagnostics.Process` |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## Root Cause Analysis |
| 14 | + |
| 15 | +The current flow is broken: |
| 16 | + |
| 17 | +``` |
| 18 | +AspireAgent.DeployAsync(): |
| 19 | + 1. RestartResourceAsync("assistant") ← WRONG: does stop+START (assistant restarts before build) |
| 20 | + 2. POST /deploy (build) ← TOO LATE: assistant already restarted with old code |
| 21 | + 3. Start assistant ← REDUNDANT: already started in step 1 |
| 22 | +``` |
| 23 | + |
| 24 | +The correct flow: |
| 25 | + |
| 26 | +``` |
| 27 | +AspireAgent.DeployAsync(): |
| 28 | + 1. POST http://localhost:5300/deploy (fire-and-forget, accept death) |
| 29 | + |
| 30 | +/deploy endpoint (in MCP process): |
| 31 | + 1. Stop assistant via Aspire resource API |
| 32 | + 2. Wait 5s for DLLs to unlock |
| 33 | + 3. dotnet build src/IAW.Assistant/IAW.Assistant.csproj |
| 34 | + 4. If build OK → Start assistant (fresh binary) |
| 35 | + 5. If build FAIL → git checkout -- . → Start assistant (old code) |
| 36 | +``` |
| 37 | + |
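The ordering of the corrected flow can be sketched independently of Aspire, `dotnet`, and git. In this minimal sketch the function name `RunDeploySequenceAsync` and its delegate parameters are hypothetical (they do not appear in the plan); injecting the operations as delegates is only a way to exercise the stop→build→start-or-revert order in a unit test, such as the revert-on-failure path.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: each operation is passed in as a delegate so the
// ordering can be tested without Aspire, dotnet, or git being present.
static async Task<string> RunDeploySequenceAsync(
    Func<Task> stopAssistant,
    Func<Task<bool>> buildAssistant,
    Func<Task> revertChanges,
    Func<Task> startAssistant,
    TimeSpan unlockDelay)
{
    await stopAssistant();            // 1. stop the assistant to release DLL locks
    await Task.Delay(unlockDelay);    // 2. give the OS time to release file handles

    var action = "deployed";
    try
    {
        if (!await buildAssistant())  // 3. build while the assistant is down
        {
            await revertChanges();    // 5. build failed: restore the old code
            action = "reverted";
        }
    }
    finally
    {
        await startAssistant();       // 4/5. always bring the assistant back up
    }
    return action;
}
```

The `finally` block is a design choice of this sketch: the assistant is started even if the build or revert step throws, so a failed deploy does not leave it stopped.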
| 38 | +## DLL Lock Analysis |
| 39 | + |
| 40 | +When building `src/IAW.Assistant/IAW.Assistant.csproj`: |
| 41 | +- The build outputs `Agents.dll` to `src/Agents/bin/Debug/net11.0/` |
| 42 | +- MCP locks `src/IAW.MCP/bin/Debug/net11.0/Agents.dll` (ITS copy, not the source) |
| 43 | +- Telegram locks `src/Clients.Telegram/bin/Debug/net11.0/win-x64/Agents.dll` (ITS copy) |
| 44 | +- Assistant is STOPPED → `src/IAW.Assistant/bin/` is unlocked |
| 45 | + |
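| 46 | +The build should succeed because it writes only to the output directories of the assistant and of the projects it references (such as `src/Agents/bin/`). None of these is locked: MCP and Telegram hold their own copies of `Agents.dll`, and the assistant is stopped. |
| 47 | + |
| 48 | +If the build still fails due to transitive locks, fall back to `dotnet build --no-dependencies src/IAW.Assistant/IAW.Assistant.csproj` or add `--artifacts-path E:\IAW\.deploy-artifacts`. Note that `--no-dependencies` skips project references, so changes under `src/Agents` are not recompiled; use it only when the change is confined to the assistant project. |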
| 49 | + |
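One way to express the fallback order is to list the `dotnet build` argument strings and try them in turn. This is a sketch under assumptions: `BuildAttempts`, `BuildWithFallbackAsync`, and the `runDotnet` delegate (which is assumed to return the process exit code) are hypothetical names, not part of the plan.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: yields the argument strings in the order described above.
static IEnumerable<string> BuildAttempts(string projectPath, string? artifactsPath = null)
{
    yield return $"build {projectPath}";
    // Skips project references, so referenced projects are NOT recompiled.
    yield return $"build --no-dependencies {projectPath}";
    if (artifactsPath is not null)
        yield return $"build {projectPath} --artifacts-path \"{artifactsPath}\"";
}

// Tries each attempt until one exits with code 0; reports which arguments were used last.
static async Task<(bool Success, string ArgsUsed)> BuildWithFallbackAsync(
    IEnumerable<string> attempts, Func<string, Task<int>> runDotnet)
{
    var last = "";
    foreach (var args in attempts)
    {
        last = args;
        if (await runDotnet(args) == 0)
            return (true, args);
    }
    return (false, last);
}
```

Logging `ArgsUsed` makes it visible when a deploy only succeeded through a fallback, which matters because the `--no-dependencies` attempt can produce a binary that lacks changes made in referenced projects.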
| 50 | +--- |
| 51 | + |
| 52 | +## File Map |
| 53 | + |
| 54 | +| File | Action | Change | |
| 55 | +|------|--------|--------| |
| 56 | +| `src/IAW.MCP/Deploy/DeployEndpoint.cs` | Modify | Add Aspire resource stop/start (via the `aspire` CLI; resource service API as fallback) | |
| 57 | +| `src/Agents/Infrastructure/AspireAgent.cs` | Modify | DeployAsync becomes fire-and-forget HTTP POST | |
| 58 | +| `test/Core.Tests/Scheduling/DeployVerifyJobTests.cs` | Modify | Add test for revert-on-failure path | |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +### Task 1: Make /deploy endpoint handle full stop→build→start |
| 63 | + |
| 64 | +**Files:** |
| 65 | +- Modify: `src/IAW.MCP/Deploy/DeployEndpoint.cs` |
| 66 | + |
| 67 | +The endpoint currently only builds. Change it to: |
| 68 | +1. Stop assistant via Aspire resource API |
| 69 | +2. Wait for DLLs to unlock |
| 70 | +3. Build assistant project |
| 71 | +4. Start assistant (or revert + start on failure) |
| 72 | + |
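As an alternative to a fixed delay for step 2, the endpoint could poll until a known assistant DLL can be opened exclusively. The following is a sketch, not what the plan implements: `WaitForUnlockAsync` is a hypothetical helper, and the 250 ms poll interval is an arbitrary choice.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: returns true once `path` can be opened with exclusive
// access (or does not exist), false if it is still in use when `timeout` expires.
static async Task<bool> WaitForUnlockAsync(string path, TimeSpan timeout, CancellationToken ct = default)
{
    var deadline = DateTime.UtcNow + timeout;
    while (true)
    {
        try
        {
            // FileShare.None fails with IOException while another handle is open.
            using var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
            return true;
        }
        catch (FileNotFoundException) { return true; }      // nothing to unlock
        catch (DirectoryNotFoundException) { return true; } // nothing to unlock
        catch (IOException) when (DateTime.UtcNow < deadline)
        {
            await Task.Delay(250, ct); // still locked: wait and retry
        }
        catch (IOException) { return false; } // still locked after the deadline
    }
}
```

A caller might pass the assistant's main DLL under `src/IAW.Assistant/bin/` and proceed to the build only when the result is `true`. Other failures, such as `UnauthorizedAccessException` for a read-only file, are deliberately left to propagate.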
| 73 | +Aspire exposes a gRPC resource service (the one the dashboard reads from; it is separate from the OTLP endpoint). The simplest approach, though, is to run resource commands through the `aspire` CLI tool, since it is already available. |
| 74 | + |
| 75 | +- [ ] **Step 1: Update deploy endpoint** |
| 76 | + |
| 77 | +Replace the handler to include stop/start: |
| 78 | + |
| 79 | +```csharp |
| 80 | +app.MapPost("/deploy", async (ILogger<Program> logger, CancellationToken ct) => |
| 81 | +{ |
| 82 | + logger.LogInformation("Deploy: starting full stop→build→start sequence"); |
| 83 | + var iawRoot = FindIawRoot(); |
| 84 | + if (iawRoot is null) |
| 85 | + return Results.Problem("Could not find IAW root directory"); |
| 86 | + |
| 87 | + try |
| 88 | + { |
| 89 | + // Step 1: Stop assistant to release DLL locks. `ct` is the request-aborted token and fires when the dying caller disconnects, so the sequence runs with CancellationToken.None. |
| 90 | + logger.LogInformation("Deploy: stopping assistant resource"); |
| 91 | + var appHostPath = Path.Combine(iawRoot, "src", "IAW.AppHost"); |
| 92 | + await RunProcessAsync("aspire", "mcp run execute_resource_command -- --resourceName assistant --commandName resource-stop", |
| 93 | + appHostPath, CancellationToken.None); |
| 94 | + await Task.Delay(5000, CancellationToken.None); // Wait for process to die and DLLs to unlock |
| 95 | + |
| 96 | + // Step 2: Build |
| 97 | + logger.LogInformation("Deploy: building assistant project"); |
| 98 | + var (exitCode, output, error) = await RunProcessAsync( |
| 99 | + "dotnet", "build src/IAW.Assistant/IAW.Assistant.csproj", iawRoot, CancellationToken.None); |
| 100 | + |
| 101 | + var fullOutput = output + "\n" + error; |
| 102 | + var verification = DeployVerifier.VerifyBuildOutput(fullOutput); |
| 103 | + |
| 104 | + if (!verification.Success) |
| 105 | + { |
| 106 | + logger.LogError("Deploy: build FAILED. Reverting and starting old code."); |
| 107 | + await RunProcessAsync("git", "checkout -- .", iawRoot, CancellationToken.None); // restores tracked files only; new untracked files stay on disk |
| 108 | + } |
| 109 | + |
| 110 | + // Step 3: Start assistant (fresh binary if build succeeded, old code if reverted) |
| 111 | + logger.LogInformation("Deploy: starting assistant resource"); |
| 112 | + await RunProcessAsync("aspire", "mcp run execute_resource_command -- --resourceName assistant --commandName resource-start", |
| 113 | + appHostPath, CancellationToken.None); |
| 114 | + |
| 115 | + if (!verification.Success) |
| 116 | + { |
| 117 | + return Results.Json(new { success = false, action = "reverted", errors = verification.Errors, |
| 118 | + output = fullOutput.Length > 2000 ? fullOutput[..2000] : fullOutput }); |
| 119 | + } |
| 120 | + |
| 121 | + return Results.Json(new { success = true, action = "deployed", errors = 0 }); |
| 122 | + } |
| 123 | + catch (Exception ex) |
| 124 | + { |
| 125 | + logger.LogError(ex, "Deploy: sequence failed"); |
| 126 | + // Try to start assistant even on error (recovery) |
| 127 | + try |
| 128 | + { |
| 129 | + var appHostPath = Path.Combine(iawRoot, "src", "IAW.AppHost"); |
| 130 | + await RunProcessAsync("aspire", "mcp run execute_resource_command -- --resourceName assistant --commandName resource-start", |
| 131 | + appHostPath, CancellationToken.None); |
| 132 | + } |
| 133 | + catch { /* best effort */ } |
| 134 | + return Results.Problem($"Deploy failed: {ex.Message}"); |
| 135 | + } |
| 136 | +}); |
| 137 | +``` |
| 138 | + |
| 139 | +NOTE: The `aspire mcp run` command syntax may differ. If it doesn't work, fall back to calling the Aspire dashboard gRPC API directly, or use `curl` to the Aspire resource service endpoint (available via `ASPIRE_RESOURCE_SERVICE_ENDPOINT_URL` env var). |
| 140 | + |
| 141 | +Alternative if `aspire mcp run` doesn't work for resource commands: the MCP server already has access to the Aspire dashboard URL. Use the Aspire resource service gRPC endpoint: |
| 142 | + |
| 143 | +```csharp |
| 144 | +// Alternative: use Aspire resource service directly |
| 145 | +var aspireEndpoint = Environment.GetEnvironmentVariable("ASPIRE_RESOURCE_SERVICE_ENDPOINT_URL"); |
| 146 | +// Call gRPC to stop/start |
| 147 | +``` |
| 148 | + |
| 149 | +Simplest fallback: have `/deploy` only build and let Aspire handle the start. This only works if the caller (Aspire agent) stops the assistant itself before dying, because `/deploy` would no longer issue the stop. |
| 150 | + |
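The handler in Step 1 relies on a `RunProcessAsync` helper whose body this plan does not show. A minimal sketch of such a helper, built on `System.Diagnostics.Process` and matching the call shape used above (file name, arguments, working directory, cancellation token, returning exit code plus both output streams), might look like this. The implementation details are assumptions, not the project's actual code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical implementation of the helper used by the /deploy handler.
static async Task<(int ExitCode, string Output, string Error)> RunProcessAsync(
    string fileName, string arguments, string workingDirectory, CancellationToken ct)
{
    var startInfo = new ProcessStartInfo(fileName, arguments)
    {
        WorkingDirectory = workingDirectory,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        UseShellExecute = false, // required for stream redirection
        CreateNoWindow = true,
    };

    using var process = Process.Start(startInfo)
        ?? throw new InvalidOperationException($"Could not start '{fileName}'.");

    // Read both streams concurrently: draining one to the end before touching
    // the other can deadlock once the child fills the unread pipe's buffer.
    var stdout = process.StandardOutput.ReadToEndAsync();
    var stderr = process.StandardError.ReadToEndAsync();

    await process.WaitForExitAsync(ct);
    return (process.ExitCode, await stdout, await stderr);
}
```

Cancelling `ct` only stops the wait; this sketch does not kill the child process, so a cancelled `dotnet build` would keep running in the background.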
| 151 | +- [ ] **Step 2: Build and verify** |
| 152 | + |
| 153 | +Run: `dotnet build src/IAW.MCP` |
| 154 | +Expected: 0 errors. |
| 155 | + |
| 156 | +- [ ] **Step 3: Commit** |
| 157 | + |
| 158 | +```bash |
| 159 | +git add src/IAW.MCP/Deploy/DeployEndpoint.cs |
| 160 | +git commit -m "fix: /deploy endpoint handles full stop→build→start sequence" |
| 161 | +``` |
| 162 | + |
| 163 | +--- |
| 164 | + |
| 165 | +### Task 2: Fix AspireAgent.DeployAsync — fire and forget |
| 166 | + |
| 167 | +**Files:** |
| 168 | +- Modify: `src/Agents/Infrastructure/AspireAgent.cs` |
| 169 | + |
| 170 | +`DeployAsync` currently calls `RestartResourceAsync` (stop+start) and only then `/deploy`. This is wrong: the assistant restarts before the build. Fix: just POST to `/deploy` (which now handles stop+build+start) and accept that the agent will die. |
| 171 | + |
| 172 | +- [ ] **Step 1: Simplify DeployAsync** |
| 173 | + |
| 174 | +Replace the current implementation: |
| 175 | + |
| 176 | +```csharp |
| 177 | +public async Task<string> DeployAsync(CancellationToken ct = default) |
| 178 | +{ |
| 179 | + logger.LogInformation("Deploy: firing deploy request to MCP endpoint"); |
| 180 | + |
| 181 | + try |
| 182 | + { |
| 183 | + // Fire the deploy request; the MCP endpoint handles stop→build→start. |
| 184 | + // This agent will die when the assistant stops, so we don't await the full response. |
| 185 | + var httpClient = httpClientFactory.CreateClient(); // not disposed: disposing would cancel the in-flight POST |
| 186 | + httpClient.Timeout = TimeSpan.FromSeconds(10); // Short timeout: we'll die before it completes |
| 187 | + |
| 188 | + _ = httpClient.PostAsync("http://localhost:5300/deploy", null, CancellationToken.None); |
| 189 | + |
| 190 | + // Give the HTTP request time to reach MCP before we die |
| 191 | + await Task.Delay(2000, ct); |
| 192 | + |
| 193 | + return "Deploy initiated. Assistant will restart with fresh binary."; |
| 194 | + } |
| 195 | + catch (Exception ex) |
| 196 | + { |
| 197 | + logger.LogError(ex, "Deploy: failed to initiate"); |
| 198 | + return $"Deploy initiation failed: {ex.Message}"; |
| 199 | + } |
| 200 | +} |
| 201 | +``` |
| 202 | + |
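| 203 | +Key change: we fire the POST and DON'T await the full response. The MCP endpoint runs independently. We give it 2 seconds to receive the request, then return. The assistant will be stopped by the MCP endpoint shortly after. Because the caller disconnects mid-request, the endpoint must not tie its work to the request's cancellation token, or the sequence is cancelled as soon as the assistant dies. The `catch` here only covers failures to initiate; errors from the un-awaited POST are not observed. |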
| 204 | + |
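If failures of the un-awaited POST should at least be logged (for example when MCP is not listening on port 5300), a fault-only continuation can observe them without blocking the caller. This is an optional sketch; `FireAndForget` is a hypothetical helper and not part of the plan.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: lets a task run unawaited, but reports its exception
// through `onError` if it ends up faulted.
static void FireAndForget(Task task, Action<Exception> onError)
{
    _ = task.ContinueWith(
        t => onError(t.Exception!.GetBaseException()),
        CancellationToken.None,
        TaskContinuationOptions.OnlyOnFaulted,
        TaskScheduler.Default);
}
```

In `DeployAsync` this could wrap the POST, for instance `FireAndForget(httpClient.PostAsync("http://localhost:5300/deploy", null, CancellationToken.None), ex => logger.LogWarning(ex, "Deploy: POST failed"))`. The callback only runs if the process is still alive when the request fails, so it helps with connection errors rather than with the expected shutdown.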
| 205 | +- [ ] **Step 2: Build and test** |
| 206 | + |
| 207 | +Run: `dotnet build src/Agents && dotnet test test/Core.Tests -v minimal` |
| 208 | +Expected: 0 errors, all tests pass. |
| 209 | + |
| 210 | +- [ ] **Step 3: Commit** |
| 211 | + |
| 212 | +```bash |
| 213 | +git add src/Agents/Infrastructure/AspireAgent.cs |
| 214 | +git commit -m "fix: DeployAsync fires POST to /deploy and accepts death — no more stop+start before build" |
| 215 | +``` |
| 216 | + |
| 217 | +--- |
| 218 | + |
| 219 | +### Task 3: End-to-end test — full closed loop |
| 220 | + |
| 221 | +**Files:** None (testing via MCP) |
| 222 | + |
| 223 | +- [ ] **Step 1: Kill all processes, clean EmojiAgent, rebuild, start Aspire** |
| 224 | + |
| 225 | +- [ ] **Step 2: Ask IAW to create EmojiAgent** |
| 226 | + |
| 227 | +Send: "Create EmojiAgent at E:\IAW\src\Agents\Fun/ then deploy via Aspire Deploy" |
| 228 | + |
| 229 | +- [ ] **Step 3: Verify via traces** |
| 230 | + |
| 231 | +Check: |
| 232 | +- FileSystem wrote files |
| 233 | +- DotNet built successfully |
| 234 | +- Aspire Deploy was called |
| 235 | +- /deploy endpoint stopped assistant, built, started |
| 236 | +- Assistant came back with EmojiAgent registered |
| 237 | + |
| 238 | +- [ ] **Step 4: Test emoji agent** |
| 239 | + |
| 240 | +Send: "Call SendToAgent Emoji: I love coffee" |
| 241 | +Expected: emoji response from IAW-created agent |