Commit 60cdf88

fix: harden local dev stack flow
1 parent 56db72a commit 60cdf88

12 files changed: +202 -54 lines changed

AGENTS.md

Lines changed: 13 additions & 4 deletions
@@ -24,7 +24,11 @@ All commands use pnpm. Target a single app with `pnpm --filter <package>`.
 
 ```bash
 pnpm install # Install
+pnpm --filter @nexu/shared build # Build shared dist required by cold-start dev flows
+pnpm dev start # Start the lightweight local stack: openclaw -> controller -> web -> desktop
 pnpm dev start <service> # Start one local-dev service: desktop|openclaw|controller|web
+pnpm dev restart # Restart the lightweight local stack
+pnpm dev stop # Stop the lightweight local stack in reverse order
 pnpm dev stop <service> # Stop one local-dev service
 pnpm dev restart <service> # Restart one local-dev service
 pnpm dev status <service> # Show status for one local-dev service
@@ -67,9 +71,13 @@ This repo is desktop-first. Prefer the controller-first path and remove or ignor
 
 ## Desktop local development
 
-- For script-managed local development, use explicit per-service commands only: `pnpm dev start <desktop|openclaw|controller|web>`, `pnpm dev stop <service>`, `pnpm dev restart <service>`, `pnpm dev status <service>`, and `pnpm dev logs <service>`.
-- `pnpm dev` has no implicit aggregate default and intentionally does not support `all`; start each service deliberately in dependency order when you want the full local stack: `openclaw` -> `controller` -> `web` -> `desktop`.
+- Minimal cold-start setup on a fresh machine is: `pnpm install` -> `pnpm --filter @nexu/shared build` -> copy `scripts/dev/.env.example` to `scripts/dev/.env` only if you need dev-only overrides.
+- Default daily flow is: `pnpm dev start` -> `pnpm dev status <service>` / `pnpm dev logs <service>` as needed -> `pnpm dev stop`.
+- Use `pnpm dev restart` for a clean full-stack recycle; use `pnpm dev restart <service>` only when you are intentionally touching one service.
+- Explicit single-service control remains available through `pnpm dev start <desktop|openclaw|controller|web>`, `pnpm dev stop <service>`, `pnpm dev restart <service>`, `pnpm dev status <service>`, and `pnpm dev logs <service>`.
+- `pnpm dev` intentionally does not support `all`; the full local stack order remains `openclaw` -> `controller` -> `web` -> `desktop`.
 - `pnpm dev logs <service>` is session-scoped, prints a fixed header, and tails at most the last 200 lines from the active service session.
+- `scripts/dev/.env.example` is the source-of-truth template for dev-only overrides. Copy it to `scripts/dev/.env` only when you need to override ports, URLs, state paths, or the shared OpenClaw gateway token for local development.
 - Keep the detailed startup optimization rules, cache invalidation behavior, and troubleshooting notes in `specs/guides/desktop-runtime-guide.md`; keep only the core workflow expectations here.
 - The repo also includes a local Slack reply smoke probe at `scripts/probe/slack-reply-probe.mjs` (`pnpm probe:slack prepare` / `pnpm probe:slack run`) for verifying the end-to-end Slack DM reply path after local runtime or OpenClaw changes.
 - The Slack smoke probe is not zero-setup: install Chrome Canary first, then manually log into Slack in the opened Canary window before running `pnpm probe:slack run`.
@@ -252,11 +260,12 @@ This note should track:
 ## Local quick reference
 
 - Controller env path: `apps/controller/.env`
+- Fresh local-dev cold start: `pnpm install` -> `pnpm --filter @nexu/shared build` -> optional `copy scripts/dev/.env.example scripts/dev/.env` (Windows) or `cp scripts/dev/.env.example scripts/dev/.env` (POSIX) -> `pnpm dev start`
+- Daily local-dev flow: `pnpm dev start` -> `pnpm dev logs <service>` / `pnpm dev status <service>` when needed -> `pnpm dev restart` for a clean recycle -> `pnpm dev stop`
 - OpenClaw managed skills dir (expected default): `~/.openclaw/skills/`
 - Slack smoke probe setup: install Chrome Canary, set `PROBE_SLACK_URL`, run `pnpm probe:slack prepare`, then manually log into Slack in Canary before `pnpm probe:slack run`
 - `openclaw-runtime` is installed implicitly by `pnpm install`; local development should normally not use a global `openclaw` CLI
-- Local service startup order for the script-managed dev path: `pnpm dev start openclaw` -> `pnpm dev start controller` -> `pnpm dev start web` -> `pnpm dev start desktop`
-- Local service shutdown order for the script-managed dev path: `pnpm dev stop desktop` -> `pnpm dev stop web` -> `pnpm dev stop controller` -> `pnpm dev stop openclaw`
+- Full-stack startup order is `openclaw` -> `controller` -> `web` -> `desktop`; shutdown order is the reverse
 - Prefer `./openclaw-wrapper` over global `openclaw` in local development; it executes `openclaw-runtime/node_modules/openclaw/openclaw.mjs`
 - When OpenClaw is started manually, set `RUNTIME_MANAGE_OPENCLAW_PROCESS=false` for `@nexu/controller` to avoid launching a second OpenClaw process
 - If behavior differs, verify effective `OPENCLAW_STATE_DIR` / `OPENCLAW_CONFIG_PATH` used by the running controller process.

packages/dev-utils/src/index.ts

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ export { waitFor } from "./conditions.js";
 export {
   createNodeOptions,
   getListeningPortPid,
+  isProcessRunning,
   terminateProcess,
   waitForChildExit,
   waitForListeningPortPid,

packages/dev-utils/src/process.ts

Lines changed: 48 additions & 2 deletions
@@ -13,6 +13,10 @@ export function createNodeOptions(): string {
 }
 
 export async function terminateProcess(pid: number): Promise<void> {
+  if (!isProcessRunning(pid)) {
+    return;
+  }
+
   if (process.platform === "win32") {
     await new Promise<void>((resolve, reject) => {
       const child = spawn("taskkill", ["/PID", String(pid), "/T", "/F"], {
@@ -22,7 +26,7 @@ export async function terminateProcess(pid: number): Promise<void> {
 
       child.once("error", reject);
       child.once("exit", (code) => {
-        if (code === 0) {
+        if (code === 0 || !isProcessRunning(pid)) {
           resolve();
           return;
         }
@@ -42,6 +46,15 @@ export async function terminateProcess(pid: number): Promise<void> {
   }
 }
 
+export function isProcessRunning(pid: number): boolean {
+  try {
+    process.kill(pid, 0);
+    return true;
+  } catch {
+    return false;
+  }
+}
+
 export async function waitForProcessStart(
   child: ChildProcess,
   processName: string,
@@ -140,8 +153,41 @@ export async function getListeningPortPid(
 export async function waitForListeningPortPid(
   port: number,
   serviceName: string,
-  options: { attempts: number; delayMs?: number },
+  options: {
+    attempts: number;
+    delayMs?: number;
+    supervisorPid?: number;
+    supervisorName?: string;
+  },
 ): Promise<number> {
+  const supervisorLabel = options.supervisorName ?? `${serviceName} supervisor`;
+
+  if (options.supervisorPid) {
+    for (let index = 0; index < options.attempts; index += 1) {
+      try {
+        return await getListeningPortPid(port, serviceName);
+      } catch {}
+
+      if (!isProcessRunning(options.supervisorPid)) {
+        throw new Error(
+          `${supervisorLabel} exited before opening port ${port}`,
+        );
+      }
+
+      if (index < options.attempts - 1) {
+        await new Promise((resolve) =>
+          setTimeout(resolve, options.delayMs ?? 250),
+        );
+      }
+    }
+
+    if (!isProcessRunning(options.supervisorPid)) {
+      throw new Error(`${supervisorLabel} exited before opening port ${port}`);
+    }
+
+    throw new Error(`${serviceName} did not open port ${port}`);
+  }
+
   return waitFor(
     () => getListeningPortPid(port, serviceName),
     () => new Error(`${serviceName} did not open port ${port}`),
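
The `process.ts` changes above combine two ideas: a signal-0 liveness probe and a port poll that gives up early once the supervisor has exited. The sketch below restates that pattern in a standalone form so it can be run without the repo. The names `isProcessAlive`, `pollWhileSupervised`, and the injectable `isAlive` option are hypothetical and not part of the commit. One deliberate difference: this variant treats an `EPERM` error as "still alive", since Node reports `EPERM` when the process exists but the caller may not signal it, whereas the commit's `isProcessRunning` returns `false` for every error.

```typescript
// Liveness probe: signal 0 runs the existence and permission checks
// without delivering a signal to the target process.
export function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (error) {
    // EPERM: the process exists but we may not signal it, so it is alive.
    // ESRCH (and anything else) is treated as "not running".
    return (error as { code?: string }).code === "EPERM";
  }
}

type PollOptions = {
  attempts: number;
  delayMs?: number;
  // Hypothetical injection point so the sketch can be exercised without
  // real processes; defaults to the signal-0 probe above.
  isAlive?: (pid: number) => boolean;
};

// Poll `probe` until it resolves, but stop early once the supervising
// process is gone, because nothing can become ready after that.
export async function pollWhileSupervised<T>(
  probe: () => Promise<T>,
  supervisorPid: number,
  options: PollOptions,
): Promise<T> {
  const isAlive = options.isAlive ?? isProcessAlive;

  for (let attempt = 0; attempt < options.attempts; attempt += 1) {
    try {
      return await probe();
    } catch {
      // Not ready yet; fall through to the liveness check.
    }

    if (!isAlive(supervisorPid)) {
      throw new Error(
        `supervisor ${supervisorPid} exited before becoming ready`,
      );
    }

    // Skip the delay after the final attempt.
    if (attempt < options.attempts - 1) {
      await new Promise((resolve) =>
        setTimeout(resolve, options.delayMs ?? 250),
      );
    }
  }

  throw new Error(`not ready after ${options.attempts} attempts`);
}
```

Failing fast on a dead supervisor turns a full timeout into an immediate, more specific error, which appears to be the motivation for the `supervisorPid` option in the diff.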

scripts/dev/AGENTS.md

Lines changed: 15 additions & 7 deletions
@@ -25,17 +25,24 @@ This file captures local guidance for the `scripts/dev` CLI surface.
 ## Command surface
 
 - Keep the command surface small and intentional.
-- Preferred commands are explicit single-service commands: `pnpm dev start <desktop|openclaw|controller|web>`, `pnpm dev restart <service>`, `pnpm dev stop <service>`, `pnpm dev status <service>`, and `pnpm dev logs <service>`.
-- Do not reintroduce implicit aggregate defaults such as bare `pnpm dev start` or an `all` target.
+- Fresh-machine cold start is: `pnpm install` -> `pnpm --filter @nexu/shared build` -> optional `copy scripts/dev/.env.example scripts/dev/.env` (Windows) or `cp scripts/dev/.env.example scripts/dev/.env` (POSIX).
+- Daily full-stack flow is: `pnpm dev start` -> work -> `pnpm dev restart` when you need a clean full restart -> `pnpm dev stop` when done.
+- Bare `pnpm dev start` runs the lightweight full local stack in dependency order: `openclaw` -> `controller` -> `web` -> `desktop`.
+- Bare `pnpm dev restart` restarts that stack by stopping in reverse order and starting again in dependency order.
+- Bare `pnpm dev stop` stops that stack in reverse order: `desktop` -> `web` -> `controller` -> `openclaw`.
+- Explicit single-service control remains available: `pnpm dev start <desktop|openclaw|controller|web>`, `pnpm dev restart <service>`, `pnpm dev stop <service>`, `pnpm dev status <service>`, and `pnpm dev logs <service>`.
+- Do not reintroduce an `all` target or any other alias target name.
 - Validate behavior through the real command surface instead of temporary harness scripts.
 - Acceptance must be run from the repo root through `pnpm dev ...`, not by invoking `scripts/dev` internals directly.
-- The default end-to-end acceptance chain is: `pnpm dev start openclaw` -> `pnpm dev logs openclaw` -> `pnpm dev start controller` -> `pnpm dev logs controller` -> `pnpm dev start web` -> `pnpm dev logs web` -> `pnpm dev start desktop` -> `pnpm dev logs desktop` -> stop each service explicitly.
+- The focused acceptance chain is: `pnpm dev start` -> `pnpm dev status <service>` / `pnpm dev logs <service>` as needed -> `pnpm dev stop`.
 
 ## Runtime model
 
 - Root entrypoint stays `pnpm dev ...`.
 - The CLI executes through `pnpm --dir ./scripts/dev exec tsx ./src/index.ts`.
 - `scripts/dev` may use its own `tsconfig.json` features such as `paths`.
+- `scripts/dev/.env.example` is the source-of-truth template for dev-only overrides. Only create `scripts/dev/.env` when you need local overrides for ports, URLs, state paths, config path, log dir, or the shared OpenClaw gateway token.
+- Keep the repo-level pnpm build-script allowlist tight. Do not add Windows-only packaging tools such as `electron-winstaller` unless the team explicitly wants that behavior on every machine.
 - Logs should live under `.tmp/dev/logs/<run_id>/...`.
 - `pnpm dev logs <service>` should resolve the active session only, prepend a fixed metadata header, and tail at most 200 lines by default.
 - Lightweight state should use per-service pid locks under `.tmp/dev/*.pid`.
@@ -53,10 +60,11 @@ This file captures local guidance for the `scripts/dev` CLI surface.
 
 ## FAQ
 
-- Q: `pnpm dev stop <service>` fails because that side is already down. A: This is currently acceptable. Read the matching `.tmp/dev/*.pid`, kill the remaining supervisor manually if needed, remove stale pid locks, then rerun `pnpm dev start <service>` or `pnpm dev restart <service>`.
-- Q: `pnpm dev status <service>` shows `stale`. A: The pid lock still exists but the supervisor pid is no longer alive. Remove the stale `.tmp/dev/*.pid` file and start that service again.
+- Q: A service will not start. A: Start with `pnpm dev status <service>` and `pnpm dev logs <service>`. If the error says a dependency is missing, start that dependency first; if it says a port is busy, kill the listener and retry.
+- Q: `pnpm dev status <service>` shows `stale`. A: The supervisor pid is gone but the lock survived. Prefer `pnpm dev stop <service>` first; if the lock still remains, remove the matching `.tmp/dev/*.pid` file and start again.
 - Q: `pnpm dev logs web` shows `Port 5173 is already in use`. A: A stale Vite process from an earlier experiment is still listening. Kill the listener on `5173`, remove `web.pid` if present, and restart the dev flow.
 - Q: Which pid is stored in each `.tmp/dev/*.pid` file? A: The pid lock stores the supervisor pid, not the transient worker/listener pid. Worker/listener pids are resolved at runtime via snapshots.
-- Q: Where should logs be inspected first? A: Start with `pnpm dev logs <service>` for the active session. If that is not enough, inspect the backing file under `.tmp/dev/logs/<run_id>/...` or `.tmp/logs/desktop-dev.log` for desktop.
+- Q: Where should logs be inspected first? A: Start with `pnpm dev logs <service>` for the active session. If that is not enough, inspect the backing file under `.tmp/dev/logs/<run_id>/...`.
 - Q: How do I correlate a leaked or suspicious process to a specific dev run? A: Start with `sessionId` from `pnpm dev status <service>` or `.tmp/dev/*.pid`, then search process command lines for `--nexu-dev-session=<sessionId>` and `--nexu-dev-service=<service>`.
-- Q: What is the expected worst-case recovery path? A: Kill the known listener/supervisor pid for the affected service, remove the stale `.tmp/dev/*.pid` file, rerun `pnpm dev start <service>`, and if the local environment is still inconsistent, reboot the machine to clear any orphaned OS-level process state.
+- Q: `pnpm install` warns that `electron-winstaller` build scripts were ignored. A: Keep it out of the shared repo allowlist unless Windows packaging support is intentionally being enabled for the whole team. Use per-machine approval when only one Windows environment needs it.
+- Q: What is the expected worst-case recovery path? A: Run `pnpm dev stop`, kill any leftover listener/supervisor pid for the affected service, remove stale `.tmp/dev/*.pid` files, then run `pnpm dev start` again.
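
The FAQ above distinguishes a live supervisor from a `stale` lock (lock file present, supervisor pid gone) and prescribes different recovery steps for each. A minimal sketch of that decision follows. The function names `classifyLock` and `recoverySteps` are hypothetical, the pid-lock file format is not shown in this diff (so the sketch takes an already-parsed pid), and the step strings simply restate the FAQ answers.

```typescript
type LockStatus = "running" | "stale" | "stopped";

// Hypothetical classifier: `lockPid` is the supervisor pid read from a pid
// lock, or undefined when no lock file exists. `isAlive` is injected so the
// decision logic can be checked without real processes.
export function classifyLock(
  lockPid: number | undefined,
  isAlive: (pid: number) => boolean,
): LockStatus {
  if (lockPid === undefined) {
    return "stopped";
  }
  return isAlive(lockPid) ? "running" : "stale";
}

// Recovery steps per status, restating the FAQ guidance above.
export function recoverySteps(status: LockStatus, service: string): string[] {
  switch (status) {
    case "running":
      return [];
    case "stopped":
      return [`pnpm dev start ${service}`];
    case "stale":
      return [
        `pnpm dev stop ${service}`,
        "remove the matching .tmp/dev/*.pid file if it still exists",
        `pnpm dev start ${service}`,
      ];
  }
}
```

Keeping the classification separate from the liveness probe makes the three-way status easy to test and keeps the "stop first, delete the lock only if it survives" ordering explicit.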

scripts/dev/src/index.ts

Lines changed: 34 additions & 0 deletions
@@ -52,6 +52,25 @@ function readTargetOrThrow(target: string | undefined): DevTarget {
   return target as DevTarget;
 }
 
+async function startDefaultStack(): Promise<void> {
+  await startTarget("openclaw", createDevSessionId());
+  await startTarget("controller", createDevSessionId());
+  await startTarget("web", createDevSessionId());
+  await startTarget("desktop", createDevSessionId());
+}
+
+async function stopDefaultStack(): Promise<void> {
+  await stopTarget("desktop");
+  await stopTarget("web");
+  await stopTarget("controller");
+  await stopTarget("openclaw");
+}
+
+async function restartDefaultStack(): Promise<void> {
+  await stopDefaultStack();
+  await startDefaultStack();
+}
+
 async function startTarget(
   target: DevTarget,
   sessionId: string,
@@ -307,6 +326,11 @@ function printLogHeader(logFilePath: string, totalLineCount: number): void {
 cli
   .command("start [target]", "Start one local dev service")
   .action(async (target?: string) => {
+    if (!target) {
+      await startDefaultStack();
+      return;
+    }
+
     const resolvedTarget = readTargetOrThrow(target);
     const sessionId = createDevSessionId();
     await startTarget(resolvedTarget, sessionId);
@@ -315,6 +339,11 @@ cli
 cli
   .command("restart [target]", "Restart one local dev service")
   .action(async (target?: string) => {
+    if (!target) {
+      await restartDefaultStack();
+      return;
+    }
+
     const resolvedTarget = readTargetOrThrow(target);
     const sessionId = createDevSessionId();
     await restartTarget(resolvedTarget, sessionId);
@@ -323,6 +352,11 @@ cli
 cli
   .command("stop [target]", "Stop one local dev service")
   .action(async (target?: string) => {
+    if (!target) {
+      await stopDefaultStack();
+      return;
+    }
+
     const resolvedTarget = readTargetOrThrow(target);
     await stopTarget(resolvedTarget);
   });
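
The stack helpers above hard-code the start order and its reverse in two separate functions. As an illustration only, the sketch below derives both from a single ordered list and lets the stop pass continue past a failing service. That second behaviour is an alternative, not what the commit does: `stopDefaultStack` awaits each `stopTarget` in turn, so a rejection (for example, `stopControllerDevProcess` throws when the controller is already stopped) would presumably abort the remaining stops unless `stopTarget` handles it internally, which this diff does not show. `STACK_ORDER`, `StackOps`, `startStack`, and `stopStack` are hypothetical names.

```typescript
// Single source of truth for dependency order (taken from the docs above).
const STACK_ORDER = ["openclaw", "controller", "web", "desktop"] as const;
type Service = (typeof STACK_ORDER)[number];

// Hypothetical injection of the per-service operations, standing in for the
// repo's startTarget/stopTarget.
type StackOps = {
  start: (service: Service) => Promise<void>;
  stop: (service: Service) => Promise<void>;
};

// Start services sequentially in dependency order; the first failure aborts.
export async function startStack(ops: StackOps): Promise<void> {
  for (const service of STACK_ORDER) {
    await ops.start(service);
  }
}

// Stop services in reverse order. Unlike a plain await chain, a failure for
// one service is recorded and the remaining services are still stopped.
export async function stopStack(ops: StackOps): Promise<Service[]> {
  const failed: Service[] = [];
  for (const service of [...STACK_ORDER].reverse()) {
    try {
      await ops.stop(service);
    } catch {
      failed.push(service);
    }
  }
  return failed;
}
```

Deriving the stop order by reversing one list means adding a service later cannot leave the two orders out of sync.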

scripts/dev/src/services/controller.ts

Lines changed: 18 additions & 3 deletions
@@ -25,6 +25,7 @@ import {
   getControllerDevLogPath,
 } from "../shared/paths.js";
 import { createDevMarkerArgs } from "../shared/trace.js";
+import { getCurrentOpenclawDevSnapshot } from "./openclaw.js";
 
 export type ControllerDevSnapshot = {
   service: "controller";
@@ -74,9 +75,22 @@ async function waitForControllerPortPid(): Promise<number> {
   );
 }
 
+async function ensureOpenclawReadyForController(): Promise<void> {
+  const openclawSnapshot = await getCurrentOpenclawDevSnapshot();
+
+  ensure(openclawSnapshot.status === "running").orThrow(
+    () =>
+      new Error(
+        "openclaw is not running; start it with `pnpm dev start openclaw` before starting controller",
+      ),
+  );
+}
+
 export async function startControllerDevProcess(options: {
   sessionId: string;
 }): Promise<ControllerDevSnapshot> {
+  await ensureOpenclawReadyForController();
+
   const existingSnapshot = await getCurrentControllerDevSnapshot();
 
   ensure(existingSnapshot.status !== "running").orThrow(
@@ -144,12 +158,13 @@ export async function startControllerDevProcess(options: {
 export async function stopControllerDevProcess(): Promise<ControllerDevSnapshot> {
   const snapshot = await getCurrentControllerDevSnapshot();
 
-  ensure(snapshot.status === "running" && Boolean(snapshot.pid)).orThrow(
+  ensure(snapshot.status !== "stopped").orThrow(
     () => new Error("controller dev process is not running"),
   );
-  const supervisorPid = snapshot.pid as number;
 
-  await terminateProcess(supervisorPid);
+  if (snapshot.status === "running" && snapshot.pid) {
+    await terminateProcess(snapshot.pid);
+  }
 
   try {
     const workerPid = await getControllerPortPid();
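
The controller changes above add a dependency precondition on start and relax the stop precondition so that a non-`running`, non-`stopped` snapshot still reaches cleanup without a kill. The sketch below isolates those two rules. It is an assumption-laden illustration: the `ensure` function here is a hypothetical stand-in that only mirrors the `ensure(cond).orThrow(factory)` call shape seen in the diff (the repo's real helper is not shown), the `"stale"` status value is inferred from the FAQ wording, and `assertDependencyRunning` and `planStop` are invented names.

```typescript
// Hypothetical stand-in for the repo's `ensure` helper; only the call shape
// is taken from the diff above.
function ensure(condition: boolean) {
  return {
    orThrow(createError: () => Error): void {
      if (!condition) {
        throw createError();
      }
    },
  };
}

// Assumed status values; "stale" is inferred from the FAQ, not from code.
type Status = "running" | "stale" | "stopped";
type Snapshot = { status: Status; pid?: number };

// Start guard: refuse to start a dependent service unless its dependency
// reports "running".
export function assertDependencyRunning(
  dependency: string,
  dependent: string,
  snapshot: Snapshot,
): void {
  ensure(snapshot.status === "running").orThrow(
    () =>
      new Error(
        `${dependency} is not running; start it before starting ${dependent}`,
      ),
  );
}

// Stop planning: only "stopped" is an error. A stale snapshot proceeds to
// cleanup but yields no pid to kill.
export function planStop(snapshot: Snapshot): { killPid?: number } {
  ensure(snapshot.status !== "stopped").orThrow(
    () => new Error("process is not running"),
  );

  return snapshot.status === "running" && snapshot.pid
    ? { killPid: snapshot.pid }
    : {};
}
```

Separating "may we proceed" from "is there a pid to kill" is what lets a stale lock be cleaned up through the normal stop path instead of by hand.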
