Draft
17 changes: 10 additions & 7 deletions .claude/skills/add-codex/SKILL.md
@@ -1,6 +1,6 @@
---
name: add-codex
description: Use Codex (CLI + AppServer) as the full agent provider — planning, tool orchestration, native compaction, MCP tools, session resume — in place of the Claude Agent SDK. ChatGPT subscription or OPENAI_API_KEY. Per-group via agent_provider. Distinct from using OpenAI as an MCP tool (where Claude remains the planner).
description: Use Codex (CLI + AppServer) as the full agent provider — planning, tool orchestration, Codex-owned context management, MCP tools, session resume — in place of the Claude Agent SDK. ChatGPT subscription or OPENAI_API_KEY. Per-group via agent_provider. Distinct from using OpenAI as an MCP tool (where Claude remains the planner).
---

# Codex agent provider
@@ -9,7 +9,7 @@ NanoClaw runs agents in a long-lived **poll loop** inside the container. The bac

Trunk ships with only the `claude` provider baked in. This skill copies the Codex provider files in from the `providers` branch, wires them into the host and container barrels, updates the Dockerfile to install the Codex CLI, and rebuilds the image.

The Codex provider runs `codex app-server` as a child process and speaks JSON-RPC over stdio. That gives it native session resume, streaming events, MCP tool access, and `thread/compact/start` compaction — same feature bar as the Claude Agent SDK, without the Anthropic-only lock-in.
The Codex provider runs `codex app-server` as a child process and speaks JSON-RPC over stdio. That gives it native session resume, streaming events, MCP tool access, approvals, and Codex-owned transcript/context management — same feature bar as the Claude Agent SDK, without the Anthropic-only lock-in.
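
The stdio transport can be pictured with a minimal sketch. It assumes the app-server writes one JSON object per line and that server-initiated requests (such as approval prompts) carry both an `id` and a `method`; `JsonLineReader` and `classify` are hypothetical names, not the provider's actual helpers.

```typescript
// Hypothetical sketch: one JSON object per stdout line, classified by shape.
type RpcMessage =
  | { kind: 'response'; id: number | string; result?: unknown; error?: unknown }
  | { kind: 'serverRequest'; id: number | string; method: string; params?: unknown }
  | { kind: 'notification'; method: string; params?: unknown };

function classify(msg: Record<string, unknown>): RpcMessage {
  const hasId = msg.id !== undefined && msg.id !== null;
  const method = typeof msg.method === 'string' ? msg.method : undefined;
  if (hasId && method) {
    // Server-initiated request: the client must reply with the same id.
    return { kind: 'serverRequest', id: msg.id as number | string, method, params: msg.params };
  }
  if (hasId) {
    // Reply to a request the client sent earlier (e.g. initialize, thread/start).
    return { kind: 'response', id: msg.id as number | string, result: msg.result, error: msg.error };
  }
  if (method) {
    // Streaming event with no reply expected.
    return { kind: 'notification', method, params: msg.params };
  }
  throw new Error(`unrecognized JSON-RPC message: ${JSON.stringify(msg)}`);
}

class JsonLineReader {
  private buffer = '';

  // Feed a raw stdout chunk; returns every complete message it finished.
  push(chunk: string): RpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is a partial line, or '' when the chunk ended with '\n'.
    this.buffer = lines.pop() ?? '';
    const out: RpcMessage[] = [];
    for (const line of lines) {
      if (line.trim() === '') continue;
      out.push(classify(JSON.parse(line)));
    }
    return out;
  }
}
```

Buffering the trailing partial line matters because a pipe can split one JSON object across two reads.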

## Install

@@ -79,6 +79,8 @@ RUN --mount=type=cache,target=/root/.cache/pnpm \

Note: **no agent-runner package dependency** — Codex is a CLI binary, not a library. Unlike OpenCode, there's nothing to add to `container/agent-runner/package.json`.

Keep `CODEX_VERSION` pinned to a concrete semver. `codex app-server` is the protocol surface this provider depends on, so upgrades should be deliberate: bump the pin, run the focused provider tests, and smoke-test a real `initialize` -> `thread/start` or `thread/resume` -> `turn/start` cycle before shipping.

### 5. Build

```bash
@@ -107,10 +109,11 @@ No `.env` variables required for this mode.

```env
OPENAI_API_KEY=sk-...
CODEX_MODEL=gpt-5.4-mini
# Optional. If omitted, Codex CLI/app-server uses its configured default.
CODEX_MODEL=gpt-5.2-codex
```

The host forwards both variables into the container. If both subscription (`auth.json`) and `OPENAI_API_KEY` are present, Codex prefers the subscription.
The host forwards both variables into the container. If both subscription (`auth.json`) and `OPENAI_API_KEY` are present, Codex prefers the subscription. Leave `CODEX_MODEL` unset unless you intentionally want to override the Codex CLI/app-server default for this NanoClaw install.

### Option C — BYO OpenAI-compatible endpoint (experimental)

@@ -138,8 +141,8 @@ Extra MCP servers still come from **`NANOCLAW_MCP_SERVERS`** / `container_config

- **Spawn-per-query:** Codex's app-server is spawned fresh per query invocation, matching the OpenCode pattern. No long-lived daemon to keep healthy across sessions.
- **Per-session `~/.codex` isolation:** each group gets its own copy of the host's `auth.json`. The container can rewrite `config.toml` freely on every wake without touching the host's Codex config.
- **Native compaction:** kicks in automatically at 40K cumulative input tokens between turns, via `thread/compact/start`. If compaction fails, the provider logs and continues uncompacted — no fatal error.
- **Approvals:** auto-accepted inside the container (the container is the sandbox; same posture as Claude/OpenCode).
- **Codex context management:** NanoClaw does not maintain a client-side token threshold or manually call `thread/compact/start`. The app-server owns transcript/context management for Codex threads. If context-limit failures appear in real use, add a notification-driven trigger from app-server token-usage events rather than a hard-coded threshold.
- **Approvals:** auto-accepted inside the container because the container, user, and explicit mount list are the sandbox. Do not expand mounts, env passthrough, or host credential access without treating it as a security-sensitive change.
- **Mid-turn input:** Codex turns don't accept mid-turn messages. Follow-up `push()` calls queue and drain between turns, matching the OpenCode pattern. The poll-loop only pushes between turns anyway, so no messages are dropped.
- **Stale thread recovery:** `isSessionInvalid` matches on stale-thread-ID errors (`thread not found`, `unknown thread`, etc.) so a cold-started app-server can recover cleanly when it sees a stored continuation it no longer has.
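
A narrow stale-thread check like the one described can be sketched as below. `STALE_THREAD_PATTERN` and `recoveryFor` are hypothetical; the pattern matches only the two phrases quoted above, while the real `STALE_THREAD_RE` may cover more variants.

```typescript
// Hypothetical stand-in for STALE_THREAD_RE: only the quoted phrases are matched.
const STALE_THREAD_PATTERN = /thread not found|unknown thread/i;

type RecoveryAction = 'start-fresh-thread' | 'rethrow';

// Decide what to do when resuming a stored continuation fails.
function recoveryFor(err: unknown): RecoveryAction {
  const message = err instanceof Error ? err.message : String(err);
  // Only a stale thread ID is safe to discard; auth, version, and transport
  // failures should surface instead of silently dropping session state.
  return STALE_THREAD_PATTERN.test(message) ? 'start-fresh-thread' : 'rethrow';
}
```

Keeping the pattern narrow is the point: widening it to generic failures would quietly replace a valid session with an empty one.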

@@ -149,7 +152,7 @@ Extra MCP servers still come from **`NANOCLAW_MCP_SERVERS`** / `container_config
grep -q "./codex.js" container/agent-runner/src/providers/index.ts && echo "container barrel: OK"
grep -q "./codex.js" src/providers/index.ts && echo "host barrel: OK"
grep -q "@openai/codex@" container/Dockerfile && echo "Dockerfile install: OK"
cd container/agent-runner && bun test src/providers/codex.factory.test.ts && cd -
cd container/agent-runner && bun test src/providers/codex.factory.test.ts src/providers/codex-app-server.test.ts && cd -
```

After the image rebuild, set `agent_provider = 'codex'` on a test group and send a message. Successful round-trip looks like:
74 changes: 73 additions & 1 deletion container/agent-runner/src/providers/codex-app-server.test.ts
@@ -1,6 +1,16 @@
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

import { describe, it, expect } from 'bun:test';

import { STALE_THREAD_RE, tomlBasicString } from './codex-app-server.js';
import {
  type AppServer,
  type JsonRpcServerRequest,
  STALE_THREAD_RE,
  attachCodexAutoApproval,
  tomlBasicString,
} from './codex-app-server.js';

describe('tomlBasicString', () => {
  it('leaves safe strings unchanged inside quotes', () => {
@@ -45,3 +55,65 @@ describe('STALE_THREAD_RE', () => {
    expect(STALE_THREAD_RE.test('internal server error')).toBe(false);
  });
});

describe('Codex CLI pin contract', () => {
  it('keeps app-server behind a concrete pinned @openai/codex install', () => {
    const testDir = path.dirname(fileURLToPath(import.meta.url));
    const dockerfile = fs.readFileSync(path.resolve(testDir, '../../../Dockerfile'), 'utf-8');

    const versionMatch = dockerfile.match(/^ARG CODEX_VERSION=(.+)$/m);
    expect(versionMatch).not.toBeNull();
    expect(versionMatch![1]).not.toBe('latest');
    expect(versionMatch![1]).toMatch(/^\d+\.\d+\.\d+$/);
    expect(dockerfile).toContain('pnpm install -g "@openai/codex@${CODEX_VERSION}"');
  });
});

describe('attachCodexAutoApproval', () => {
  function fakeServer(): { server: AppServer; writes: string[] } {
    const writes: string[] = [];
    const server = {
      process: {
        stdin: {
          write: (line: string) => {
            writes.push(line);
            return true;
          },
        },
      },
      pending: new Map(),
      notificationHandlers: [],
      serverRequestHandlers: [],
    } as unknown as AppServer;

    return { server, writes };
  }

  function send(server: AppServer, method: string): void {
    const request: JsonRpcServerRequest = { id: 7, method, params: {} };
    server.serverRequestHandlers[0](request);
  }

  it('auto-accepts command and file approvals inside the container sandbox', () => {
    const { server, writes } = fakeServer();
    attachCodexAutoApproval(server);

    send(server, 'item/commandExecution/requestApproval');
    send(server, 'item/fileChange/requestApproval');

    expect(writes.map((line) => JSON.parse(line).result.decision)).toEqual(['accept', 'accept']);
  });

  it('grants broad app-server permissions because NanoClaw relies on container mounts as the boundary', () => {
    const { server, writes } = fakeServer();
    attachCodexAutoApproval(server);

    send(server, 'item/permissions/requestApproval');

    const result = JSON.parse(writes[0]).result;
    expect(result).toEqual({
      permissions: { fileSystem: { read: ['/'], write: ['/'] }, network: { enabled: true } },
      scope: 'session',
    });
  });
});
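
The two approval tests pin down the reply shapes. A minimal sketch of a reply builder consistent with them follows; `approvalResult` and `approvalReplyLine` are hypothetical names, and the `{ id, result }` envelope and trailing newline are assumptions (the tests only read `.result`).

```typescript
// Hypothetical sketch; method names and result shapes are taken from the tests.
interface ServerRequest {
  id: number | string;
  method: string;
  params?: unknown;
}

function approvalResult(method: string): unknown {
  switch (method) {
    case 'item/commandExecution/requestApproval':
    case 'item/fileChange/requestApproval':
      return { decision: 'accept' };
    case 'item/permissions/requestApproval':
      // Broad grant: the container and its explicit mount list are the boundary.
      return {
        permissions: { fileSystem: { read: ['/'], write: ['/'] }, network: { enabled: true } },
        scope: 'session',
      };
    default:
      return undefined;
  }
}

// Build the line to write to the app-server's stdin, echoing the request id.
function approvalReplyLine(req: ServerRequest): string | undefined {
  const result = approvalResult(req.method);
  if (result === undefined) return undefined; // not an approval request: leave it to other handlers
  return JSON.stringify({ id: req.id, result }) + '\n';
}
```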
2 changes: 1 addition & 1 deletion container/agent-runner/src/providers/codex-app-server.ts
@@ -282,7 +282,7 @@ export async function initializeCodexAppServer(server: AppServer): Promise<void>
}

export interface ThreadParams {
  model: string;
  model?: string;
  cwd: string;
  sandbox?: string;
  approvalPolicy?: string;
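
The per-spawn `config.toml` rewrite depends on escaping values correctly, which is what `tomlBasicString` is tested for. A self-contained sketch, assuming Codex reads MCP servers from `[mcp_servers.<name>]` tables with `command`, `args`, and `env` keys; `tomlString` and `renderMcpConfig` are hypothetical stand-ins, not the exported helpers.

```typescript
// Same shape as CodexProvider's mcpServers field.
type McpServer = { command: string; args: string[]; env: Record<string, string> };

// Hypothetical TOML basic-string encoder: escapes backslash, quote, and control characters.
function tomlString(value: string): string {
  let out = '"';
  for (const ch of value) {
    const code = ch.codePointAt(0)!;
    if (ch === '\\') out += '\\\\';
    else if (ch === '"') out += '\\"';
    else if (ch === '\n') out += '\\n';
    else if (ch === '\r') out += '\\r';
    else if (ch === '\t') out += '\\t';
    else if (code < 0x20 || code === 0x7f) out += '\\u' + code.toString(16).padStart(4, '0');
    else out += ch;
  }
  return out + '"';
}

// Assumed layout: one [mcp_servers.<name>] table per server.
function renderMcpConfig(servers: Record<string, McpServer>): string {
  const sections: string[] = [];
  for (const [name, server] of Object.entries(servers)) {
    const envPairs = Object.entries(server.env).map(([k, v]) => `${tomlString(k)} = ${tomlString(v)}`);
    sections.push(
      [
        // Quoting the table key keeps names with dots from becoming nested tables.
        `[mcp_servers.${tomlString(name)}]`,
        `command = ${tomlString(server.command)}`,
        `args = [${server.args.map(tomlString).join(', ')}]`,
        `env = { ${envPairs.join(', ')} }`,
      ].join('\n'),
    );
  }
  return sections.join('\n\n') + '\n';
}
```

Rendering from the normalized map on every spawn means the file never accumulates stale entries from a previous wake.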
19 changes: 18 additions & 1 deletion container/agent-runner/src/providers/codex.factory.test.ts
@@ -5,7 +5,7 @@ import path from 'path';
import { describe, it, expect } from 'bun:test';

import { createProvider } from './factory.js';
import { CodexProvider, resolveClaudeImports } from './codex.js';
import { CodexProvider, resolveClaudeImports, resolveCodexModel } from './codex.js';

describe('createProvider (codex)', () => {
  it('returns CodexProvider for codex', () => {
@@ -32,6 +32,23 @@ describe('createProvider (codex)', () => {
  });
});

describe('resolveCodexModel', () => {
  it('leaves the Codex CLI/app-server default model alone when unset', () => {
    expect(resolveCodexModel(undefined)).toBeUndefined();
    expect(resolveCodexModel({})).toBeUndefined();
  });

  it('ignores blank model overrides', () => {
    expect(resolveCodexModel({ CODEX_MODEL: '' })).toBeUndefined();
    expect(resolveCodexModel({ CODEX_MODEL: ' ' })).toBeUndefined();
  });

  it('uses CODEX_MODEL when explicitly configured', () => {
    expect(resolveCodexModel({ CODEX_MODEL: 'gpt-5.2-codex' })).toBe('gpt-5.2-codex');
    expect(resolveCodexModel({ CODEX_MODEL: ' gpt-5.2-codex ' })).toBe('gpt-5.2-codex');
  });
});

describe('resolveClaudeImports', () => {
  function scratchDir(): string {
    return fs.mkdtempSync(path.join(os.tmpdir(), 'codex-imports-'));
19 changes: 12 additions & 7 deletions container/agent-runner/src/providers/codex.ts
@@ -2,10 +2,10 @@
 * OpenAI Codex provider — wraps `codex app-server` via JSON-RPC.
 *
 * Unlike the (deprecated) @openai/codex-sdk approach, the app-server
 * protocol exposes proper session/stream semantics, native compaction, and
 * stable MCP config via ~/.codex/config.toml — which is the same mechanism
 * the standalone codex CLI uses, so the container and host share one
 * provider-integration story.
 * protocol exposes proper session/stream semantics, Codex-owned context
 * management, and stable MCP config via ~/.codex/config.toml — which is the
 * same mechanism the standalone codex CLI uses, so the container and host
 * share one provider-integration story.
 *
 * Codex turns don't accept mid-turn input. Follow-up `push()` messages are
 * queued and drained after the current turn completes (same pattern as the
@@ -102,11 +102,11 @@ export class CodexProvider implements AgentProvider {
  readonly supportsNativeSlashCommands = false;

  private readonly mcpServers: Record<string, { command: string; args: string[]; env: Record<string, string> }>;
  private readonly model: string;
  private readonly model: string | undefined;

  constructor(options: ProviderOptions = {}) {
    this.mcpServers = options.mcpServers ?? {};
    this.model = (options.env?.CODEX_MODEL as string | undefined) ?? 'gpt-5.4-mini';
    this.model = resolveCodexModel(options.env);
  }

  isSessionInvalid(err: unknown): boolean {
@@ -203,6 +203,11 @@ export class CodexProvider implements AgentProvider {
  }
}

export function resolveCodexModel(env: Record<string, string | undefined> | undefined): string | undefined {
  const model = env?.CODEX_MODEL?.trim();
  return model || undefined;
}

// ── Per-turn event pump ─────────────────────────────────────────────────────
// Pulled out because the gen() loop above reads cleaner with it extracted,
// and because it's a natural seam for future unit tests that drive it with
@@ -212,7 +217,7 @@ async function* runOneTurn(
  server: AppServer,
  threadId: string,
  inputText: string,
  model: string,
  model: string | undefined,
  cwd: string,
  hasInit: () => boolean,
  markInit: () => void,
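
`resolveCodexModel` only normalizes the override; the caller still has to leave `model` out of the request when it is `undefined`. A sketch under that assumption, reusing `resolveCodexModel` as shown in the diff, with a hypothetical `buildThreadStartParams` and a trimmed-down params type (the real `ThreadParams` also carries `sandbox`, `approvalPolicy`, and more):

```typescript
// Reproduced from the diff: blank or missing CODEX_MODEL means "no override".
function resolveCodexModel(env: Record<string, string | undefined> | undefined): string | undefined {
  const model = env?.CODEX_MODEL?.trim();
  return model || undefined;
}

// Hypothetical, trimmed-down request params for illustration only.
interface ThreadStartParams {
  cwd: string;
  model?: string;
}

function buildThreadStartParams(cwd: string, env: Record<string, string | undefined>): ThreadStartParams {
  const params: ThreadStartParams = { cwd };
  const model = resolveCodexModel(env);
  // Only send `model` when explicitly configured, so the pinned Codex
  // CLI/app-server falls back to its own default otherwise.
  if (model !== undefined) params.model = model;
  return params;
}
```

`JSON.stringify` would drop a `model: undefined` field anyway; the explicit check just keeps the "omit unless configured" rule visible at the call site.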
104 changes: 39 additions & 65 deletions docs/agent-runner-details.md
@@ -153,89 +153,63 @@ class ClaudeProvider implements AgentProvider {

### Codex Provider

Wraps `@openai/codex-sdk`.
Wraps `codex app-server` over stdio JSON-RPC.

The earlier `@openai/codex-sdk` sketch was replaced because NanoClaw needs the
same top-level provider role that Claude Code fills: persistent sessions,
streaming turn/item events, MCP tool wiring, server-driven approvals, and
Codex-owned transcript/context management. The SDK is still useful for small
embedded workflows, but the app-server protocol exposes the thread/turn surface
NanoClaw needs without moving provider-specific state into the poll loop.

```typescript
class CodexProvider implements AgentProvider {
  query(input: QueryInput): AgentQuery {
    const codex = new Codex(this.buildOptions(input));
    const thread = input.sessionId
      ? codex.resumeThread(input.sessionId, this.threadOptions(input))
      : codex.startThread(this.threadOptions(input));

    const abortController = new AbortController();
    let pendingFollowUp: string | null = null;
    const pending = [input.prompt];

    return {
      push: (msg) => {
        // Codex doesn't support streaming input.
        // Store the follow-up and abort the current turn.
        pendingFollowUp = msg;
        abortController.abort();
      },
      end: () => { /* no-op — Codex turns end naturally */ },
      abort: () => abortController.abort(),
      events: this.run(thread, input.prompt, abortController, () => pendingFollowUp),
      push: (msg) => pending.push(msg),
      end: () => markEnded(),
      abort: () => markAborted(),
      events: this.runAppServer(input, pending),
    };
  }

  private async *run(thread, prompt, abortController, getPendingFollowUp): AsyncIterable<ProviderEvent> {
    let currentPrompt = prompt;

    while (true) {
      try {
        const streamed = await thread.runStreamed(currentPrompt, {
          signal: abortController.signal,
        });

        let sessionId: string | undefined;
        let resultText = '';

        for await (const event of streamed.events) {
          if (event.type === 'thread.started') {
            sessionId = event.thread_id;
            yield { type: 'init', sessionId };
          }
          if (event.type === 'item.completed' && event.item.type === 'agent_message') {
            resultText = event.item.text || resultText;
          }
          if (event.type === 'turn.failed') {
            yield { type: 'error', message: event.error.message, retryable: false };
            return;
          }
        }
  private async *runAppServer(input, pending): AsyncIterable<ProviderEvent> {
    writeCodexMcpConfigToml(input.mcpServers);
    const server = spawnCodexAppServer(createCodexConfigOverrides());
    attachCodexAutoApproval(server);

        yield { type: 'result', text: resultText || null };
    await initializeCodexAppServer(server);

        // Check if a follow-up was queued during this turn
        const followUp = getPendingFollowUp();
        if (followUp) {
          currentPrompt = followUp;
          // Reset for next iteration
          continue;
        }
    const threadId = await startOrResumeCodexThread(server, input.continuation, {
      cwd: input.cwd,
      model: input.env.CODEX_MODEL, // optional; omit to use Codex's own default
      sandbox: 'danger-full-access',
      approvalPolicy: 'never',
      baseInstructions: composeBaseInstructions(input.systemContext?.instructions),
    });

        return;
      } catch (err) {
        if (abortController.signal.aborted && getPendingFollowUp()) {
          // Aborted because of follow-up — restart with new prompt
          currentPrompt = getPendingFollowUp();
          abortController = new AbortController();
          continue;
        }
        throw err;
      }
    yield { type: 'init', continuation: threadId };

    while (pending.length > 0) {
      const text = pending.shift();
      await startCodexTurn(server, { threadId, inputText: text, cwd: input.cwd });
      yield* translateAppServerNotifications(server);
    }
  }
}
```

**Codex-specific behavior inside the provider:**
- `developer_instructions` for system prompt (loaded from CLAUDE.md)
- `git init` in workspace (Codex requires a git repo)
- Abort+restart pattern for follow-up messages
- `sandboxMode`, `approvalPolicy`, `networkAccessEnabled` from env vars
- Conversation archiving (Codex doesn't have PreCompact)
- `baseInstructions` is composed from `CLAUDE.md`, `CLAUDE.local.md`, and the poll-loop's addendum because Codex does not expand Claude Code `@` imports for us.
- `~/.codex/config.toml` is rewritten per spawn from the normalized MCP server map. The host mounts a per-session `~/.codex` copy so this never clobbers the user's host Codex config.
- `CODEX_MODEL` is an optional override. When unset, NanoClaw omits `model` from `thread/start` / `turn/start` and lets the pinned Codex CLI/app-server choose its own configured default.
- Follow-up messages are queued and drained between turns. Codex turns do not accept mid-turn input, and the poll loop only pushes when new pending messages arrive.
- `codex app-server` may ask for command/file/permission approvals through server-initiated JSON-RPC requests. NanoClaw auto-accepts those requests because the container and its explicit mount list are the security boundary.
- The Dockerfile pins `@openai/codex` with `CODEX_VERSION`. Treat app-server protocol changes as intentional upgrades: bump the pin, run the provider tests, and smoke-test `initialize` -> `thread/start` or `thread/resume` -> `turn/start` -> streaming `item/*` and `turn/*` notifications.
- Stale thread IDs are recognized narrowly. Unknown-thread errors can start a fresh thread; auth, version, and transport errors fail loudly instead of silently discarding session state.
- Context/transcript management is owned by Codex app-server. NanoClaw does not maintain a client-side compaction threshold for Codex.
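
The queue-and-drain rule for follow-ups can be reduced to a synchronous sketch. `drainTurns` is a hypothetical simplification of the `while (pending.length > 0)` loop in `runAppServer`: a real turn is asynchronous and streams `item/*` and `turn/*` notifications, whereas here `runTurn` just returns a string.

```typescript
type TurnEvent = { type: 'turn'; input: string } | { type: 'result'; text: string };

// Hypothetical sketch: messages pushed while a turn runs are never injected
// into it; they wait in `pending` and start their own turn afterwards.
function* drainTurns(
  pending: string[],
  runTurn: (input: string) => string,
): Generator<TurnEvent, void, unknown> {
  while (pending.length > 0) {
    const input = pending.shift()!;
    yield { type: 'turn', input }; // turn has started; callers may push() now
    yield { type: 'result', text: runTurn(input) };
  }
}
```

Because the loop re-checks `pending` after every turn, a follow-up that arrives mid-turn is picked up in order rather than dropped or used to abort the running turn.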

### OpenCode Provider

4 changes: 2 additions & 2 deletions src/providers/codex.ts
@@ -11,9 +11,9 @@
 * wake with container-appropriate MCP server paths, without racing
 * other sessions or leaking per-session paths back to the host.
 *
 * Env passthrough covers the two knobs that are read at runtime:
 * Env passthrough covers the runtime knobs Codex reads:
 *   OPENAI_API_KEY — fallback auth when auth.json isn't a subscription token
 *   CODEX_MODEL — model override if the user wants something other than the default
 *   CODEX_MODEL — optional model override; unset lets Codex use its default
 *   OPENAI_BASE_URL — rare, but supports API-compatible alternates
 */
import fs from 'fs';
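
A minimal sketch of the host-side filter that header comment describes, assuming the host builds the container env from a plain key/value map. `codexContainerEnv` is a hypothetical name, and skipping blank values is an assumption (the container side already ignores a blank `CODEX_MODEL` via `resolveCodexModel`).

```typescript
// The three knobs listed in the header comment.
const CODEX_PASSTHROUGH_KEYS = ['OPENAI_API_KEY', 'CODEX_MODEL', 'OPENAI_BASE_URL'] as const;

// Hypothetical sketch: copy only the listed keys, and only when they are set.
function codexContainerEnv(hostEnv: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of CODEX_PASSTHROUGH_KEYS) {
    const value = hostEnv[key];
    // Skip unset/blank values so nothing empty reaches the container.
    if (value !== undefined && value.trim() !== '') out[key] = value;
  }
  return out;
}
```

An allowlist keeps unrelated host variables (and credentials) out of the container, in line with treating env passthrough as a security-sensitive surface.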