
Concurrent Claude Code sessions race on shared ~/.codex — app-server spawned without an isolated CODEX_HOME #382

Description

@Octo-o-o-o

Summary

The plugin runs one long-lived codex app-server per workspace, but spawns it with the inherited environment and never sets CODEX_HOME. When two Claude Code sessions are active in different workspaces (separate repos, or separate git worktrees of one repo), two app-servers run concurrently against the same ~/.codex and race on codex's process-global mutable state. The result is that the two codex "sessions" interfere and the earlier review fails.

Environment

  • Plugin codex@openai-codex v1.0.4
  • Codex CLI 0.139.0
  • macOS (darwin), Node v26

Root cause

  • The app-server is spawned inheriting ambient env, with no CODEX_HOME:
    plugins/codex/scripts/lib/app-server.mjs → spawn("codex", ["app-server"], { env: this.options.env ?? process.env }) (~L188-189).
  • It is launched via the broker, also with env = process.env:
    plugins/codex/scripts/lib/broker-lifecycle.mjs → spawnBrokerProcess({ … env = process.env }) (L59), called from ensureBrokerSession (L113).
  • The broker — and therefore the app-server — is keyed per workspace, not per session:
    plugins/codex/scripts/lib/state.mjs → resolveStateDir(cwd) = <CLAUDE_PLUGIN_DATA>/state/<slug>-<sha256(workspaceRoot)> (L29-43).
  • ⇒ N workspaces ⇒ N concurrent app-servers, all sharing one ~/.codex. That directory is not safe for concurrent multi-process writers: .codex-global-state.json is rewritten via a temp-file + rename pattern, and codex continuously writes a set of SQLite stores and index files there.
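The temp-file + rename pattern protects readers from torn writes, but it does not stop two processes from overwriting each other's updates. A minimal sketch of that failure mode follows. It is not codex's actual code: the function names and JSON shape are hypothetical, and the two "processes" are simulated by interleaving two read-modify-write cycles in one script.

```javascript
// Illustrative only: not codex's implementation. CommonJS so it runs standalone.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const crypto = require("node:crypto");

// Write JSON via temp file + rename: readers never see a half-written file.
function atomicWriteJson(file, value) {
  const tmp = `${file}.tmp-${Date.now()}-${crypto.randomBytes(4).toString("hex")}`;
  fs.writeFileSync(tmp, JSON.stringify(value));
  fs.renameSync(tmp, file); // atomic replace on POSIX, but the last writer wins
}

// Simulate two writers that both read the state before either one writes.
function demoLostUpdate() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "race-demo-"));
  const file = path.join(dir, "global-state.json");
  atomicWriteJson(file, { base: true });

  const seenByA = JSON.parse(fs.readFileSync(file, "utf8"));
  const seenByB = JSON.parse(fs.readFileSync(file, "utf8"));
  atomicWriteJson(file, { ...seenByA, fromA: true });
  atomicWriteJson(file, { ...seenByB, fromB: true }); // silently drops A's update

  const result = JSON.parse(fs.readFileSync(file, "utf8"));
  fs.rmSync(dir, { recursive: true, force: true });
  return result;
}
```

Calling demoLostUpdate() returns a state that contains B's field but not A's, even though each individual write was atomic.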

Evidence

On a heavily-used machine, ~/.codex accumulates orphaned, half-written atomic-rename temp files — the fingerprint of concurrent writers clobbering each other:

..codex-global-state.json.tmp-1777371657260-8c5f96aa-…
..codex-global-state.json.tmp-1779423636291-a0b11b60-…
..codex-global-state.json.tmp-1780365543861-35c22d9f-…
..codex-global-state.json.tmp-1781679281003-5ef1efb4-…

And the set of files codex mutates at runtime (mtimes captured during a single active session, within a ~2-minute window) is large and shared:

.codex-global-state.json, .codex-global-state.json.bak
state_5.sqlite(-wal/-shm), logs_2.sqlite(-wal/-shm),
goals_1.sqlite(-wal/-shm), memories_1.sqlite(-wal/-shm),
session_index.jsonl, models_cache.json, skills/, shell_snapshots/, …

Two app-servers writing all of the above against the same ~/.codex with no cross-process coordination is the race.

Reproduction

  1. Two Claude Code sessions in two different repos (or two git worktrees), both with the plugin.
  2. Trigger /codex:review (or the stop-time review gate) in both near-simultaneously.
  3. One review fails / the codex sessions cross; ~/.codex accumulates ..codex-global-state.json.tmp-* orphans.

Proposed fix — isolate CODEX_HOME per app-server, keyed by the existing workspace identity

Set CODEX_HOME when spawning the broker/app-server, reusing the same per-workspace key already used for broker.json:

env.CODEX_HOME = path.join(resolveStateDir(cwd), "codex-home")
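A minimal sketch of how that assignment could be applied without mutating process.env. buildAppServerEnv is a hypothetical helper name, and stateDir stands in for the value returned by resolveStateDir(cwd):

```javascript
// Hypothetical helper; CommonJS so it runs standalone.
const path = require("node:path");

// Return a copy of the base environment with CODEX_HOME pointed at a
// per-workspace directory. The base environment object is left untouched.
function buildAppServerEnv(stateDir, baseEnv = process.env) {
  return { ...baseEnv, CODEX_HOME: path.join(stateDir, "codex-home") };
}
```

The spawn call would then pass env: buildAppServerEnv(resolveStateDir(cwd)) in place of process.env.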

Seed it with a strict allowlist so login/config are preserved while all runtime state is isolated (default = isolate; only bring across vetted, read-mostly assets, so future additions can never be accidentally shared):

  • auth.json → symlink to the user's real $CODEX_HOME/auth.json (single source of truth; safe across OAuth refresh-token rotation).
  • config.toml, hooks.json → copy (codex writes back trust hashes / feature toggles / onboarding; copying keeps those writes local), preserving 0600.
  • Static user assets (AGENTS.md, rules/, policy/, *.config.toml profiles) → symlink.
  • Everything else (sessions/, .codex-global-state.json*, *.sqlite*, session_index.jsonl, skills/, models_cache.json, logs, caches, …) → let codex recreate inside the isolated home. This is the state that must not be shared.
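The allowlist above could be seeded roughly as follows. This is a sketch under assumptions: seedCodexHome and the two name lists are hypothetical, the 0700 directory mode is a choice made here rather than something the plugin specifies, and error handling is omitted.

```javascript
// Hypothetical seeding helper; CommonJS so it runs standalone.
const fs = require("node:fs");
const path = require("node:path");

const SYMLINK_NAMES = ["auth.json", "AGENTS.md", "rules", "policy"];
const COPY_NAMES = ["config.toml", "hooks.json"];

// True if anything (including a dangling symlink) already sits at p.
const entryExists = (p) => fs.lstatSync(p, { throwIfNoEntry: false }) !== undefined;

function seedCodexHome(realHome, isolatedHome) {
  fs.mkdirSync(isolatedHome, { recursive: true, mode: 0o700 }); // mode is an assumption

  // Static, read-mostly assets (plus *.config.toml profiles) are symlinked.
  const profiles = fs.readdirSync(realHome).filter((n) => n.endsWith(".config.toml"));
  for (const name of [...SYMLINK_NAMES, ...profiles]) {
    const src = path.join(realHome, name);
    const dst = path.join(isolatedHome, name);
    if (!fs.existsSync(src) || entryExists(dst)) continue;
    const type = fs.statSync(src).isDirectory() ? "dir" : "file"; // only used on Windows
    fs.symlinkSync(src, dst, type);
  }

  // Files codex writes back to are copied, keeping the source permissions.
  for (const name of COPY_NAMES) {
    const src = path.join(realHome, name);
    const dst = path.join(isolatedHome, name);
    if (!fs.existsSync(src) || entryExists(dst)) continue;
    fs.copyFileSync(src, dst);
    fs.chmodSync(dst, fs.statSync(src).mode & 0o777);
  }
  // Anything not listed is deliberately left out so codex recreates it locally.
  return isolatedHome;
}
```

Because the helper skips entries that already exist, it can run on every broker start without clobbering local edits to the copied config.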

Why per-workspace (not per-session): the broker is already 1-per-workspace and serializes concurrent requests (it returns a BUSY code; cf. #342). A per-workspace home yields exactly one writer per home, matching the broker. Same-workspace sessions still share correctly; different workspaces become fully isolated. The plugin already injects CODEX_COMPANION_SESSION_ID at SessionStart (session-lifecycle-hook.mjs L77), so the lifecycle plumbing is largely in place; tie the home's lifecycle to the broker and remove it in teardownBrokerSession.
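For the teardown half, a minimal sketch of removing the isolated home, assuming it lives at <stateDir>/codex-home as proposed. removeCodexHome is a hypothetical name, and where exactly it would be called inside teardownBrokerSession is not shown here:

```javascript
// Hypothetical teardown helper; CommonJS so it runs standalone.
const fs = require("node:fs");
const path = require("node:path");

// Delete the per-workspace home. Recursive removal unlinks symlinks instead
// of following them, so the user's real auth.json and assets are untouched.
function removeCodexHome(stateDir) {
  const home = path.join(stateDir, "codex-home");
  fs.rmSync(home, { recursive: true, force: true }); // force: no error if absent
  return home;
}
```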

Verified workaround (until fixed)

A user-level Claude Code SessionStart hook that allocates a per-session CODEX_HOME (same allowlist seeding as above) and exports it via $CLAUDE_ENV_FILE; the spawned app-server inherits it. Confirmed end-to-end: CODEX_HOME=<seeded home> codex login status → Logged in using ChatGPT, while all *.sqlite / session_index.jsonl / skills/ / global-state stay isolated. Happy to share the script or open a PR.
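The export step of such a hook might look like the sketch below. This is not the script referred to above: persistCodexHome and shellQuote are hypothetical names, and it assumes the file named by $CLAUDE_ENV_FILE accepts shell export lines appended by a SessionStart hook.

```javascript
// Hypothetical hook helper; CommonJS so it runs standalone.
const fs = require("node:fs");

// Quote a value for POSIX shells: wrap in single quotes, escape embedded ones.
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`;
}

// Append an export line so later commands in the session see CODEX_HOME.
function persistCodexHome(envFile, codexHome) {
  fs.appendFileSync(envFile, `export CODEX_HOME=${shellQuote(codexHome)}\n`);
}
```

A hook script would seed the per-session home first and then call persistCodexHome(process.env.CLAUDE_ENV_FILE, home).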

Alternatives considered

  • CLI-level concurrency safety (advisory locks / atomic single-writer / SQLite busy-timeout, or a CODEX_STATE_HOME separate from the config home): the deeper fix, but it belongs in openai/codex, not here; per-workspace isolation side-steps it and is shippable in this repo. Related upstream: codex#11435 ("Multiple parallel codex exec instances interfere via shared session restore"), #14233, #10887.
  • codex exec --ephemeral exists but only for exec; the plugin uses app-server, which has no such flag.

Open questions

Related

#380, #377, #342, #367
