nanocoai · chiptoe-svg · May 9, 2026
diff --git a/.claude/skills/add-codex/SKILL.md b/.claude/skills/add-codex/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: add-codex
-description: Use Codex (CLI + AppServer) as the full agent provider — planning, tool orchestration, native compaction, MCP tools, session resume — in place of the Claude Agent SDK. ChatGPT subscription or OPENAI_API_KEY. Per-group via agent_provider. Distinct from using OpenAI as an MCP tool (where Claude remains the planner).
+description: Use Codex (CLI + AppServer) as the full agent provider — planning, tool orchestration, Codex-owned context management, MCP tools, session resume — in place of the Claude Agent SDK. ChatGPT subscription or OPENAI_API_KEY. Per-group via agent_provider. Distinct from using OpenAI as an MCP tool (where Claude remains the planner).
 ---
 
 # Codex agent provider
@@ -9,7 +9,7 @@ NanoClaw runs agents in a long-lived **poll loop** inside the container. The bac
 
 Trunk ships with only the `claude` provider baked in. This skill copies the Codex provider files in from the `providers` branch, wires them into the host and container barrels, updates the Dockerfile to install the Codex CLI, and rebuilds the image.
 
-The Codex provider runs `codex app-server` as a child process and speaks JSON-RPC over stdio. That gives it native session resume, streaming events, MCP tool access, and `thread/compact/start` compaction — same feature bar as the Claude Agent SDK, without the Anthropic-only lock-in.
+The Codex provider runs `codex app-server` as a child process and speaks JSON-RPC over stdio. That gives it native session resume, streaming events, MCP tool access, approvals, and Codex-owned transcript/context management — same feature bar as the Claude Agent SDK, without the Anthropic-only lock-in.
 
 ## Install
 
@@ -79,6 +79,8 @@ RUN --mount=type=cache,target=/root/.cache/pnpm \
 
 Note: **no agent-runner package dependency** — Codex is a CLI binary, not a library. Unlike OpenCode, there's nothing to add to `container/agent-runner/package.json`.
 
+Keep `CODEX_VERSION` pinned to a concrete semver. `codex app-server` is the protocol surface this provider depends on, so upgrades should be deliberate: bump the pin, run the focused provider tests, and smoke-test a real `initialize` -> `thread/start` or `thread/resume` -> `turn/start` cycle before shipping.
+
 ### 5. Build
 
 ```bash
@@ -107,10 +109,11 @@ No `.env` variables required for this mode.
 
 ```env
 OPENAI_API_KEY=sk-...
-CODEX_MODEL=gpt-5.4-mini
+# Optional. If omitted, Codex CLI/app-server uses its configured default.
+CODEX_MODEL=gpt-5.2-codex
 ```
 
-The host forwards both variables into the container. If both subscription (`auth.json`) and `OPENAI_API_KEY` are present, Codex prefers the subscription.
+The host forwards both variables into the container. If both subscription (`auth.json`) and `OPENAI_API_KEY` are present, Codex prefers the subscription. Leave `CODEX_MODEL` unset unless you intentionally want to override the Codex CLI/app-server default for this NanoClaw install.
 
 ### Option C — BYO OpenAI-compatible endpoint (experimental)
 
@@ -138,8 +141,8 @@ Extra MCP servers still come from **`NANOCLAW_MCP_SERVERS`** / `container_config
 
 - **Spawn-per-query:** Codex's app-server is spawned fresh per query invocation, matching the OpenCode pattern. No long-lived daemon to keep healthy across sessions.
 - **Per-session `~/.codex` isolation:** each group gets its own copy of the host's `auth.json`. The container can rewrite `config.toml` freely on every wake without touching the host's Codex config.
-- **Native compaction:** kicks in automatically at 40K cumulative input tokens between turns, via `thread/compact/start`. If compaction fails, the provider logs and continues uncompacted — no fatal error.
-- **Approvals:** auto-accepted inside the container (the container is the sandbox; same posture as Claude/OpenCode).
+- **Codex context management:** NanoClaw does not maintain a client-side token threshold or manually call `thread/compact/start`. The app-server owns transcript/context management for Codex threads. If context-limit failures appear in real use, add a notification-driven trigger from app-server token-usage events rather than a hard-coded threshold.
+- **Approvals:** auto-accepted inside the container because the container, its user, and the explicit mount list together form the sandbox. Treat any expansion of mounts, env passthrough, or host credential access as a security-sensitive change.
 - **Mid-turn input:** Codex turns don't accept mid-turn messages. Follow-up `push()` calls queue and drain between turns, matching the OpenCode pattern. The poll-loop only pushes between turns anyway, so no messages are dropped.
 - **Stale thread recovery:** `isSessionInvalid` matches on stale-thread-ID errors (`thread not found`, `unknown thread`, etc.) so a cold-started app-server can recover cleanly when it sees a stored continuation it no longer has.
 
@@ -149,7 +152,7 @@ Extra MCP servers still come from **`NANOCLAW_MCP_SERVERS`** / `container_config
 grep -q "./codex.js" container/agent-runner/src/providers/index.ts && echo "container barrel: OK"
 grep -q "./codex.js" src/providers/index.ts && echo "host barrel: OK"
 grep -q "@openai/codex@" container/Dockerfile && echo "Dockerfile install: OK"
-cd container/agent-runner && bun test src/providers/codex.factory.test.ts && cd -
+cd container/agent-runner && bun test src/providers/codex.factory.test.ts src/providers/codex-app-server.test.ts && cd -
 ```
 
 After the image rebuild, set `agent_provider = 'codex'` on a test group and send a message. Successful round-trip looks like:

@@ -1,6 +1,16 @@
+import fs from 'fs';
+import path from 'path';
+import { fileURLToPath } from 'url';
+
 import { describe, it, expect } from 'bun:test';
 
-import { STALE_THREAD_RE, tomlBasicString } from './codex-app-server.js';
+import {
+  type AppServer,
+  type JsonRpcServerRequest,
+  STALE_THREAD_RE,
+  attachCodexAutoApproval,
+  tomlBasicString,
+} from './codex-app-server.js';
 
 describe('tomlBasicString', () => {
   it('leaves safe strings unchanged inside quotes', () => {
@@ -45,3 +55,65 @@ describe('STALE_THREAD_RE', () => {
     expect(STALE_THREAD_RE.test('internal server error')).toBe(false);
   });
 });
+
+describe('Codex CLI pin contract', () => {
+  it('keeps app-server behind a concrete pinned @openai/codex install', () => {
+    const testDir = path.dirname(fileURLToPath(import.meta.url));
+    const dockerfile = fs.readFileSync(path.resolve(testDir, '../../../Dockerfile'), 'utf-8');
+
+    const versionMatch = dockerfile.match(/^ARG CODEX_VERSION=(.+)$/m);
+    expect(versionMatch).not.toBeNull();
+    expect(versionMatch![1]).not.toBe('latest');
+    expect(versionMatch![1]).toMatch(/^\d+\.\d+\.\d+$/);
+    expect(dockerfile).toContain('pnpm install -g "@openai/codex@${CODEX_VERSION}"');
+  });
+});
+
+describe('attachCodexAutoApproval', () => {
+  function fakeServer(): { server: AppServer; writes: string[] } {
+    const writes: string[] = [];
+    const server = {
+      process: {
+        stdin: {
+          write: (line: string) => {
+            writes.push(line);
+            return true;
+          },
+        },
+      },
+      pending: new Map(),
+      notificationHandlers: [],
+      serverRequestHandlers: [],
+    } as unknown as AppServer;
+
+    return { server, writes };
+  }
+
+  function send(server: AppServer, method: string): void {
+    const request: JsonRpcServerRequest = { id: 7, method, params: {} };
+    server.serverRequestHandlers[0](request);
+  }
+
+  it('auto-accepts command and file approvals inside the container sandbox', () => {
+    const { server, writes } = fakeServer();
+    attachCodexAutoApproval(server);
+
+    send(server, 'item/commandExecution/requestApproval');
+    send(server, 'item/fileChange/requestApproval');
+
+    expect(writes.map((line) => JSON.parse(line).result.decision)).toEqual(['accept', 'accept']);
+  });
+
+  it('grants broad app-server permissions because NanoClaw relies on container mounts as the boundary', () => {
+    const { server, writes } = fakeServer();
+    attachCodexAutoApproval(server);
+
+    send(server, 'item/permissions/requestApproval');
+
+    const result = JSON.parse(writes[0]).result;
+    expect(result).toEqual({
+      permissions: { fileSystem: { read: ['/'], write: ['/'] }, network: { enabled: true } },
+      scope: 'session',
+    });
+  });
+});
@@ -282,7 +282,7 @@ export async function initializeCodexAppServer(server: AppServer): Promise<void>
 }
 
 export interface ThreadParams {
-  model: string;
+  model?: string;
   cwd: string;
   sandbox?: string;
   approvalPolicy?: string;
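
The `attachCodexAutoApproval` tests above pin down what each approval reply should contain. A minimal sketch of that mapping follows, assuming replies are written as one JSON object per line and echo the request `id`. `approvalResultFor` and `encodeApprovalReply` are hypothetical names used for illustration, not the provider's actual functions.

```typescript
// Request shape as used in the test above; params are left opaque on purpose.
interface JsonRpcServerRequest {
  id: number | string;
  method: string;
  params: unknown;
}

// Hypothetical helper: returns the `result` payload for an approval request,
// or undefined when the method is not an approval request.
function approvalResultFor(method: string): Record<string, unknown> | undefined {
  switch (method) {
    case 'item/commandExecution/requestApproval':
    case 'item/fileChange/requestApproval':
      // Command and file approvals are simply accepted.
      return { decision: 'accept' };
    case 'item/permissions/requestApproval':
      // Broad grant: the container mounts are treated as the real boundary.
      return {
        permissions: { fileSystem: { read: ['/'], write: ['/'] }, network: { enabled: true } },
        scope: 'session',
      };
    default:
      return undefined;
  }
}

// Hypothetical helper: serializes one reply line (newline framing is an assumption).
function encodeApprovalReply(request: JsonRpcServerRequest): string | undefined {
  const result = approvalResultFor(request.method);
  if (result === undefined) return undefined;
  return JSON.stringify({ id: request.id, result }) + '\n';
}
```

Keeping the method-to-result mapping separate from the stdin write makes it testable without a fake process, which is one possible way to shrink the `fakeServer` scaffolding.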

@@ -5,7 +5,7 @@ import path from 'path';
 import { describe, it, expect } from 'bun:test';
 
 import { createProvider } from './factory.js';
-import { CodexProvider, resolveClaudeImports } from './codex.js';
+import { CodexProvider, resolveClaudeImports, resolveCodexModel } from './codex.js';
 
 describe('createProvider (codex)', () => {
   it('returns CodexProvider for codex', () => {
@@ -32,6 +32,23 @@ describe('createProvider (codex)', () => {
   });
 });
 
+describe('resolveCodexModel', () => {
+  it('leaves the Codex CLI/app-server default model alone when unset', () => {
+    expect(resolveCodexModel(undefined)).toBeUndefined();
+    expect(resolveCodexModel({})).toBeUndefined();
+  });
+
+  it('ignores blank model overrides', () => {
+    expect(resolveCodexModel({ CODEX_MODEL: '' })).toBeUndefined();
+    expect(resolveCodexModel({ CODEX_MODEL: '   ' })).toBeUndefined();
+  });
+
+  it('uses CODEX_MODEL when explicitly configured', () => {
+    expect(resolveCodexModel({ CODEX_MODEL: 'gpt-5.2-codex' })).toBe('gpt-5.2-codex');
+    expect(resolveCodexModel({ CODEX_MODEL: ' gpt-5.2-codex ' })).toBe('gpt-5.2-codex');
+  });
+});
+
 describe('resolveClaudeImports', () => {
   function scratchDir(): string {
     return fs.mkdtempSync(path.join(os.tmpdir(), 'codex-imports-'));

@@ -2,10 +2,10 @@
  * OpenAI Codex provider — wraps `codex app-server` via JSON-RPC.
  *
  * Unlike the (deprecated) @openai/codex-sdk approach, the app-server
- * protocol exposes proper session/stream semantics, native compaction, and
- * stable MCP config via ~/.codex/config.toml — which is the same mechanism
- * the standalone codex CLI uses, so the container and host share one
- * provider-integration story.
+ * protocol exposes proper session/stream semantics, Codex-owned context
+ * management, and stable MCP config via ~/.codex/config.toml — which is the
+ * same mechanism the standalone codex CLI uses, so the container and host
+ * share one provider-integration story.
  *
  * Codex turns don't accept mid-turn input. Follow-up `push()` messages are
  * queued and drained after the current turn completes (same pattern as the
@@ -102,11 +102,11 @@ export class CodexProvider implements AgentProvider {
   readonly supportsNativeSlashCommands = false;
 
   private readonly mcpServers: Record<string, { command: string; args: string[]; env: Record<string, string> }>;
-  private readonly model: string;
+  private readonly model: string | undefined;
 
   constructor(options: ProviderOptions = {}) {
     this.mcpServers = options.mcpServers ?? {};
-    this.model = (options.env?.CODEX_MODEL as string | undefined) ?? 'gpt-5.4-mini';
+    this.model = resolveCodexModel(options.env);
   }
 
   isSessionInvalid(err: unknown): boolean {
@@ -203,6 +203,11 @@ export class CodexProvider implements AgentProvider {
   }
 }
 
+export function resolveCodexModel(env: Record<string, string | undefined> | undefined): string | undefined {
+  const model = env?.CODEX_MODEL?.trim();
+  return model || undefined;
+}
+
 // ── Per-turn event pump ─────────────────────────────────────────────────────
 // Pulled out because the gen() loop above reads cleaner with it extracted,
 // and because it's a natural seam for future unit tests that drive it with
@@ -212,7 +217,7 @@ async function* runOneTurn(
   server: AppServer,
   threadId: string,
   inputText: string,
-  model: string,
+  model: string | undefined,
   cwd: string,
   hasInit: () => boolean,
   markInit: () => void,
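
With `model` now optional, the request params should leave the key out entirely when no override is configured. A minimal sketch of that idea follows. `resolveCodexModel` is copied from the diff above, while `buildThreadParams` and the trimmed-down `ThreadParams` are hypothetical and only illustrate the omission.

```typescript
// Reduced version of ThreadParams, keeping only the fields this sketch needs.
interface ThreadParams {
  model?: string;
  cwd: string;
}

// Copied from the diff above: blank or missing CODEX_MODEL resolves to undefined.
function resolveCodexModel(env: Record<string, string | undefined> | undefined): string | undefined {
  const model = env?.CODEX_MODEL?.trim();
  return model || undefined;
}

// Hypothetical helper: add `model` only when an override is actually set,
// so the app-server is free to apply its own configured default.
function buildThreadParams(env: Record<string, string | undefined> | undefined, cwd: string): ThreadParams {
  const model = resolveCodexModel(env);
  return model === undefined ? { cwd } : { cwd, model };
}
```

`JSON.stringify` would drop an `undefined` property anyway. Omitting the key explicitly also keeps `'model' in params` false for any code that inspects the object before serialization.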

diff --git a/docs/agent-runner-details.md b/docs/agent-runner-details.md
@@ -153,89 +153,63 @@ class ClaudeProvider implements AgentProvider {
 
 ### Codex Provider
 
-Wraps `@openai/codex-sdk`.
+Wraps `codex app-server` over stdio JSON-RPC.
+
+The earlier `@openai/codex-sdk` sketch was replaced because NanoClaw needs the
+same top-level provider role that Claude Code fills: persistent sessions,
+streaming turn/item events, MCP tool wiring, server-driven approvals, and
+Codex-owned transcript/context management. The SDK is still useful for small
+embedded workflows, but the app-server protocol exposes the thread/turn surface
+NanoClaw needs without moving provider-specific state into the poll loop.
 
 ```typescript
 class CodexProvider implements AgentProvider {
   query(input: QueryInput): AgentQuery {
-    const codex = new Codex(this.buildOptions(input));
-    const thread = input.sessionId
-      ? codex.resumeThread(input.sessionId, this.threadOptions(input))
-      : codex.startThread(this.threadOptions(input));
-
-    const abortController = new AbortController();
-    let pendingFollowUp: string | null = null;
+    const pending = [input.prompt];
 
     return {
-      push: (msg) => {
-        // Codex doesn't support streaming input.
-        // Store the follow-up and abort the current turn.
-        pendingFollowUp = msg;
-        abortController.abort();
-      },
-      end: () => { /* no-op — Codex turns end naturally */ },
-      abort: () => abortController.abort(),
-      events: this.run(thread, input.prompt, abortController, () => pendingFollowUp),
+      push: (msg) => pending.push(msg),
+      end: () => markEnded(),
+      abort: () => markAborted(),
+      events: this.runAppServer(input, pending),
     };
   }
 
-  private async *run(thread, prompt, abortController, getPendingFollowUp): AsyncIterable<ProviderEvent> {
-    let currentPrompt = prompt;
-
-    while (true) {
-      try {
-        const streamed = await thread.runStreamed(currentPrompt, {
-          signal: abortController.signal,
-        });
-
-        let sessionId: string | undefined;
-        let resultText = '';
-
-        for await (const event of streamed.events) {
-          if (event.type === 'thread.started') {
-            sessionId = event.thread_id;
-            yield { type: 'init', sessionId };
-          }
-          if (event.type === 'item.completed' && event.item.type === 'agent_message') {
-            resultText = event.item.text || resultText;
-          }
-          if (event.type === 'turn.failed') {
-            yield { type: 'error', message: event.error.message, retryable: false };
-            return;
-          }
-        }
+  private async *runAppServer(input, pending): AsyncIterable<ProviderEvent> {
+    writeCodexMcpConfigToml(input.mcpServers);
+    const server = spawnCodexAppServer(createCodexConfigOverrides());
+    attachCodexAutoApproval(server);
 
-        yield { type: 'result', text: resultText || null };
+    await initializeCodexAppServer(server);
 
-        // Check if a follow-up was queued during this turn
-        const followUp = getPendingFollowUp();
-        if (followUp) {
-          currentPrompt = followUp;
-          // Reset for next iteration
-          continue;
-        }
+    const threadId = await startOrResumeCodexThread(server, input.continuation, {
+      cwd: input.cwd,
+      model: resolveCodexModel(input.env), // optional; undefined lets Codex use its own default
+      sandbox: 'danger-full-access',
+      approvalPolicy: 'never',
+      baseInstructions: composeBaseInstructions(input.systemContext?.instructions),
+    });
 
-        return;
-      } catch (err) {
-        if (abortController.signal.aborted && getPendingFollowUp()) {
-          // Aborted because of follow-up — restart with new prompt
-          currentPrompt = getPendingFollowUp();
-          abortController = new AbortController();
-          continue;
-        }
-        throw err;
-      }
+    yield { type: 'init', continuation: threadId };
+
+    while (pending.length > 0) {
+      const text = pending.shift();
+      await startCodexTurn(server, { threadId, inputText: text, cwd: input.cwd });
+      yield* translateAppServerNotifications(server);
     }
   }
 }
 ```
 
 **Codex-specific behavior inside the provider:**
-- `developer_instructions` for system prompt (loaded from CLAUDE.md)
-- `git init` in workspace (Codex requires a git repo)
-- Abort+restart pattern for follow-up messages
-- `sandboxMode`, `approvalPolicy`, `networkAccessEnabled` from env vars
-- Conversation archiving (Codex doesn't have PreCompact)
+- `baseInstructions` is composed from `CLAUDE.md`, `CLAUDE.local.md`, and the poll-loop's addendum because Codex does not expand Claude Code `@` imports for us.
+- `~/.codex/config.toml` is rewritten per spawn from the normalized MCP server map. The host mounts a per-session `~/.codex` copy so this never clobbers the user's host Codex config.
+- `CODEX_MODEL` is an optional override. When unset, NanoClaw omits `model` from `thread/start` / `turn/start` and lets the pinned Codex CLI/app-server choose its own configured default.
+- Follow-up messages are queued and drained between turns. Codex turns do not accept mid-turn input, and the poll loop only pushes when new pending messages arrive.
+- `codex app-server` may ask for command/file/permission approvals through server-initiated JSON-RPC requests. NanoClaw auto-accepts those requests because the container and its explicit mount list are the security boundary.
+- The Dockerfile pins `@openai/codex` with `CODEX_VERSION`. Treat app-server protocol changes as intentional upgrades: bump the pin, run the provider tests, and smoke-test `initialize` -> `thread/start` or `thread/resume` -> `turn/start` -> streaming `item/*` and `turn/*` notifications.
+- Stale thread IDs are recognized narrowly. Unknown-thread errors can start a fresh thread; auth, version, and transport errors fail loudly instead of silently discarding session state.
+- Context/transcript management is owned by Codex app-server. NanoClaw does not maintain a client-side compaction threshold for Codex.
 
 ### OpenCode Provider
 

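The stale-thread rule in the Codex provider notes above (recover from unknown-thread errors, fail loudly on everything else) can be sketched as a small classifier. This is an illustration under assumptions: the pattern covers only the two phrases the skill doc names, and `STALE_THREAD_SKETCH_RE` and `classifyResumeError` are hypothetical names, not the exported `STALE_THREAD_RE` or `isSessionInvalid`.

```typescript
// Assumed pattern: only the two phrases quoted in the docs; the real regex may match more.
const STALE_THREAD_SKETCH_RE = /thread not found|unknown thread/i;

type ResumeDecision = 'start-fresh-thread' | 'fail-loudly';

// Hypothetical helper: decide whether a resume failure may discard the stored thread ID.
function classifyResumeError(err: unknown): ResumeDecision {
  const message = err instanceof Error ? err.message : String(err);
  // Only a recognized stale-thread message may drop session state;
  // auth, version, and transport errors fall through to a loud failure.
  return STALE_THREAD_SKETCH_RE.test(message) ? 'start-fresh-thread' : 'fail-loudly';
}
```

Matching narrowly means an unfamiliar error never silently resets a conversation. The cost is that a reworded stale-thread message in a newer Codex release would surface as a hard failure until the pattern is updated, which is one more reason to treat version bumps as deliberate.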
@@ -11,9 +11,9 @@
  *   wake with container-appropriate MCP server paths, without racing
  *   other sessions or leaking per-session paths back to the host.
  *
- * Env passthrough covers the two knobs that are read at runtime:
+ * Env passthrough covers the runtime knobs Codex reads:
  *   OPENAI_API_KEY  — fallback auth when auth.json isn't a subscription token
- *   CODEX_MODEL     — model override if the user wants something other than the default
+ *   CODEX_MODEL     — optional model override; unset lets Codex use its default
  *   OPENAI_BASE_URL — rare, but supports API-compatible alternates
  */
 import fs from 'fs';