|
| 1 | +# Requirements: live run streaming for `agent run get --watch` |
| 2 | + |
| 3 | +**Status:** proposed — not yet implemented. |
| 4 | +**Audience:** the engineer/agent implementing WebSocket streaming, server + client. |
| 5 | + |
| 6 | +## 1. Background |
| 7 | + |
| 8 | +The CLI can start runs and read their state over the public `/v1` REST API, but |
| 9 | +it cannot stream a run's output. `agent run get --watch` exists today and gives a |
| 10 | +**status-level** live view by polling `GET /v1/agents/runs/{id}` until the run |
| 11 | +reaches a terminal status (`completed`/`error`/`cancelled`/`stopped`). It shows |
| 12 | +status transitions and the final summary — not the step-by-step stdout/stderr or |
| 13 | +token stream. |
| 14 | + |
| 15 | +Token-level streaming over `/v1` is explicitly deferred in the backend design |
| 16 | +doc (`documents/eng/ELLIPSIS_API_AND_CLI.md` §7, "Deferred → Streaming run steps |
| 17 | +through `/v1` for CLI clients"). This spec defines the work to close that gap. |
| 18 | + |
| 19 | +Scaffolding already in this repo (currently unused, kept as a starting point): |
| 20 | +- `src/lib/ws.ts` — a `streamRun()` WebSocket client and a `StreamFrame` type |
| 21 | + (`stdout`/`stderr`/`status`/`done`/`error`). It connects to |
| 22 | + `${wsBase}/v1/runs/{id}/stream` with a bearer token. No reconnect/resume/heartbeat. |
| 23 | +- `src/ui/RunView.tsx` — an Ink component that renders frames from `streamRun()`. |
| 24 | +- `DEFAULT_WS_BASE` in `src/lib/constants.ts` (`wss://api.ellipsis.dev`). |
| 25 | + |
| 26 | +The reported `error: [object ErrorEvent]` came from this scaffolding connecting to |
| 27 | +a non-existent server endpoint; `ws.ts` stringifies the raw `ErrorEvent`. Fix the |
| 28 | +error rendering as part of this work (surface `err.message`). |
| 29 | + |
| 30 | +## 2. Goal |
| 31 | + |
| 32 | +`agent run get <id> --watch` streams a run's output in real time and |
| 33 | +falls back to REST polling when streaming is unavailable. The same flag covers |
| 34 | +both modes — no new top-level command. |
| 35 | + |
| 36 | +## 3. Server-side requirements (`/v1`) |
| 37 | + |
| 38 | +1. **Endpoint:** `GET /v1/runs/{run_id}/stream`, upgraded to WebSocket. |
| 39 | +2. **Auth:** `Authorization: Bearer <token>`, resolved by the same `V1Auth` |
| 40 | + path as the REST API (user/API/sandbox tokens), authorizing the run's |
| 41 | + customer. Reject with a close code on auth failure (see §5). |
| 42 | +3. **Frame protocol (server → client), one JSON object per WS message:** |
| 43 | + - `{ "type": "status", "status": "<AgentRunStatus>", "ts": "<iso8601>" }` |
| 44 | + - `{ "type": "stdout", "data": "<chunk>", "seq": <int>, "ts": "<iso8601>" }` |
| 45 | + - `{ "type": "stderr", "data": "<chunk>", "seq": <int>, "ts": "<iso8601>" }` |
| 46 | + - `{ "type": "done", "status": "<terminal status>", "exit_status": "<...>" }` |
| 47 | + - `{ "type": "error", "message": "<human-readable>" }` |
| 48 | + - `seq` is a monotonically increasing per-run cursor on `stdout`/`stderr` frames, used for resume. |
| 49 | +4. **Backfill + resume:** accept `?after_seq=<int>` (query or first client |
| 50 | + message). On connect, replay buffered frames with `seq > after_seq`, then |
| 51 | + stream live. This makes reconnects lossless. |
| 52 | +5. **Heartbeat:** server sends WS ping (or a `status` keepalive) at a fixed |
| 53 | + interval (e.g. 20s) so dead connections are detectable. |
| 54 | +6. **Termination:** send a final `done` frame, then close with a normal code. |
| 55 | + For an already-terminal run, replay buffered output then `done` immediately. |
| 56 | +7. **Retention:** define how long run output is buffered for backfill (at least |
| 57 | + the run's lifetime + a grace window). Document the limit. |
| 58 | + |
| 59 | +## 4. Client-side requirements (this repo) |
| 60 | + |
| 61 | +1. `agent run get <id> --watch` connects to the stream and renders frames: |
| 62 | + `stdout`/`stderr` as output, `status` as transition lines, `done`/`error` |
| 63 | + to finish. Exit 0 on `done` with a successful terminal status, non-zero on |
| 64 | + `error` or a failed terminal status. |
| 65 | +2. **Reconnect with backoff** and resume from the last seen `seq` via |
| 66 | + `after_seq`, so a dropped socket doesn't lose or duplicate output. |
| 67 | +3. **Fallback:** if the WebSocket can't connect (e.g. server without streaming, |
| 68 | + or a `1003`/unsupported close), fall back to the existing REST polling |
| 69 | + `watchRun()` automatically, with a one-line notice. `--watch` must keep |
| 70 | + working against a backend that lacks the endpoint. |
| 71 | +4. **Heartbeat:** respond to server pings; treat a missed heartbeat as a dropped |
| 72 | + connection and reconnect. |
| 73 | +5. `--json` with `--watch`: emit one JSON object per frame (NDJSON) for piping. |
| 74 | +6. Fix `ws.ts` error handling to surface a readable message, not |
| 75 | + `[object ErrorEvent]`. |
| 76 | + |
| 77 | +## 5. WebSocket close codes (suggested) |
| 78 | + |
| 79 | +| Code | Meaning | |
| 80 | +|------|---------| |
| 81 | +| 1000 | normal — run reached a terminal state | |
| 82 | +| 1008 | auth failed / not authorized for this run | |
| 83 | +| 1003 | streaming unsupported (client should fall back to polling) | |
| 84 | +| 1011 | server error | |
| 85 | + |
| 86 | +## 6. Acceptance criteria |
| 87 | + |
| 88 | +- Streaming a live run shows stdout/stderr in near real time end to end. |
| 89 | +- Killing the socket mid-run and reconnecting resumes with no lost or duplicated |
| 90 | + frames (verified via `seq`/`after_seq`). |
| 91 | +- `--watch` against a backend without the endpoint transparently falls back to |
| 92 | + REST polling and still completes. |
| 93 | +- `--json --watch` emits valid NDJSON, one frame per line. |
| 94 | +- Unit tests for the client frame handler, reconnect/resume cursor, and fallback |
| 95 | + trigger (mirror the fake-timer style in `test/auth.test.ts` / `test/run.test.ts`). |
| 96 | +- No `[object ErrorEvent]`; connection errors print a real message. |
| 97 | + |
| 98 | +## 7. Out of scope |
| 99 | + |
| 100 | +- Bidirectional control over the stream (stop/input). `run stop` is tracked |
| 101 | + separately and also has no `/v1` endpoint yet. |
| 102 | +- Multiplexing multiple runs over one socket. |
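
The frame protocol in §3 and the resume rule in §4 could be sketched roughly as below. This is a minimal illustration, not the contents of `src/lib/ws.ts`: the names `parseFrame` and `ResumeCursor` are hypothetical, the frame shapes are copied from the §3 list, and passing `after_seq` as a query parameter is only one of the two options the spec allows.

```typescript
// Frame shapes as listed in §3.3. Only stdout/stderr frames carry `seq`.
type StreamFrame =
  | { type: "status"; status: string; ts: string }
  | { type: "stdout"; data: string; seq: number; ts: string }
  | { type: "stderr"; data: string; seq: number; ts: string }
  | { type: "done"; status: string; exit_status: string }
  | { type: "error"; message: string };

const FRAME_TYPES = new Set(["status", "stdout", "stderr", "done", "error"]);

// Parse one WS text message. Returns null for anything that is not a known frame.
function parseFrame(raw: string): StreamFrame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const frame = value as Record<string, unknown>;
  if (typeof frame.type !== "string" || !FRAME_TYPES.has(frame.type)) return null;
  if (frame.type === "stdout" || frame.type === "stderr") {
    // Output frames must carry an integer cursor and a string chunk.
    if (typeof frame.seq !== "number" || !Number.isInteger(frame.seq)) return null;
    if (typeof frame.data !== "string") return null;
  }
  return frame as unknown as StreamFrame;
}

// Hypothetical helper: remembers the highest `seq` rendered so far.
class ResumeCursor {
  private lastSeq: number | null = null;

  // True if the frame is new and should be rendered; false for a replayed chunk.
  accept(frame: StreamFrame): boolean {
    if (frame.type !== "stdout" && frame.type !== "stderr") return true;
    if (this.lastSeq !== null && frame.seq <= this.lastSeq) return false;
    this.lastSeq = frame.seq;
    return true;
  }

  // Stream URL for the next (re)connect; adds after_seq once output has been seen.
  streamUrl(wsBase: string, runId: string): string {
    const url = `${wsBase}/v1/runs/${encodeURIComponent(runId)}/stream`;
    return this.lastSeq === null ? url : `${url}?after_seq=${this.lastSeq}`;
  }
}
```

Dropping frames whose `seq` is not greater than the last one rendered keeps the client free of duplicates even if a server replays more than was asked for. Because `status` frames have no `seq` in §3, this sketch always passes them through.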
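
One possible way to turn the §5 close codes and the §4 reconnect/fallback rules into a single client decision is sketched below. Everything here is an assumption for illustration: the names `onClose`, `backoffDelayMs`, and `NextAction` are hypothetical, the spec does not fix the backoff base or cap (500 ms and 15 s are placeholders), and treating `1011` or an abnormal close as retryable is a guess.

```typescript
// What the client should do after the socket closes.
type NextAction =
  | { kind: "exit" }
  | { kind: "fail"; reason: string }
  | { kind: "fallback-to-polling" }
  | { kind: "reconnect"; delayMs: number };

// Capped exponential backoff. `attempt` is 0 for the first retry.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 15000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Map a close event to the next step, following the table in §5.
function onClose(
  code: number,
  sawDone: boolean,        // a `done` frame arrived before the close
  everConnected: boolean,  // the socket opened at least once during this watch
  attempt: number          // consecutive reconnects already tried
): NextAction {
  if (code === 1000 && sawDone) return { kind: "exit" };
  if (code === 1008) return { kind: "fail", reason: "not authorized for this run" };
  // 1003, or never having connected at all, means streaming is unavailable.
  if (code === 1003 || !everConnected) return { kind: "fallback-to-polling" };
  // Anything else (1011, abnormal close, missed heartbeat) is treated as a dropped socket.
  return { kind: "reconnect", delayMs: backoffDelayMs(attempt) };
}
```

For example, `onClose(1006, false, true, 2)` yields a reconnect after 2000 ms, while `onClose(1003, false, false, 0)` selects the REST polling path. Keeping the decision in a pure function makes it straightforward to cover with the fake-timer style tests named in §6.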