Written 2026-04-23. These are the load-bearing decisions that were debated, settled, and locked in during planning. If you are about to contradict one of these, re-read the section first — the alternative was considered and rejected for a reason.
What we chose. A single PreToolUse hook matched to the Bash tool. Each Bash call Claude makes is rewritten to try -D <overlay> -n -- bash -c <orig>.
What we rejected. An outer shell wrapper (trai) that launches claude itself inside a try session.
Why.
- The outer wrapper captures everything — Bash and Claude's direct Edit/Write/MultiEdit file writes. That sounds better until you realize the overlay then contains Claude's own state (
~/.claude/, logs, history) which must be filtered out of the commit. - The outer wrapper is an entirely separate install path from Claude Code's extension model. Users have to
curl | shit or clone a repo, set up$PATH, and remember to invoketraiinstead ofclaude. Plugin install is one slash command. - The user was explicit in the planning conversation that they'd rather have a narrower-but-cleaner scope (Bash side-effects reversible; code edits reviewable via git). That's the scope we ship.
Trade-off we accept. Claude's Edit / Write / MultiEdit tools write through the Node process directly — they never become subprocess calls that a hook can intercept. So file edits to source code are not sandboxed. We document this loudly and rely on git for that half.
What we chose. Everything ships inside the .claude-plugin/ manifest + hooks/ + commands/ + scripts/ tree. Users install with /plugin marketplace add <git-url> and /plugin install trai.
What we rejected. A separate shell tool distributed via install script or package manager.
Why. A plugin is installable, updatable, and uninstallable with two slash commands. A shell tool needs shellrc edits, uninstall scripts, version managers. The plugin format also means Claude Code knows about it — /plugin list shows it; disabling it cleanly removes all hooks. Trust surface is identical either way (both are shell scripts running with user privileges), but operational surface is smaller.
What we chose. /trai:commit applies the full accumulated overlay. /trai:discard --yes nukes it. No per-file picker.
What we rejected. Per-file / per-hunk interactive commit.
Why. Per-hunk commit means a TUI, path-conflict resolution, synthesizing partial commits outside of try commit's atomicity guarantees, and handling whiteout/rename tombstones. It's a week's work on its own, and git itself already solves partial-commit well once the overlay is applied. If a user wants selective: /trai:commit, then git add -p.
Revisit when. Users frequently hit the "I want only half of this" case and a git-less workflow is common enough to justify.
What we chose. scripts/doctor.sh refuses to start on non-Linux kernels and prints a pointer to Claude Code's native Seatbelt/bubblewrap sandbox. The SessionStart hook self-disables the plugin if doctor fails.
What we rejected. APFS-snapshot or sandbox-exec fallback on macOS; Docker-for-Desktop bootstrapping on any OS.
Why. try is Linux-only because overlayfs is Linux-only. APFS snapshots have completely different semantics (snapshot-of-volume, not overlay-per-process). sandbox-exec blocks but doesn't produce an inspectable diff. A Docker-in-Docker story works but multiplies the failure surface by the number of host configurations. v1 is Linux.
What we chose. One overlay dir per Claude session, stored at $XDG_STATE_HOME/trai/sessions/<id>/overlay/. Every try -D $overlay -n -- <cmd> invocation reuses the same upperdir.
What we rejected. One overlay per Bash call (isolated upperdirs per-invocation).
Why. Per-call isolation breaks multi-call sequences: if call 1 mkdir foo and call 2 cd foo && touch bar, call 2's overlay doesn't see foo — it fails. Users and Claude both expect sequential commands to share state. The shared-overlay-per-session model is identical in spirit to a single shell: state accumulates, and the "review" at the end shows the net diff, not a log of intermediate states.
Per-call isolation is a hypothetical future "git-log-like" mode. Not v1.
What we chose. config/defaults.json lists command patterns that the hook does not wrap. git *, gh *, sudo *, docker *, podman *, plus read-only tools like ls, cat, pwd, grep, rg, find, which, env.
Why per-pattern:
git *— users want git commits/diffs to hit the real repo. Wrapping git means Claude's "checkgit status" reports on the overlay's git state, which is the same repo's worktree via overlayfs, and branches/HEAD writes land in the overlay. Commits inside the overlay requiretry committo propagate, which is surprising behavior. Let git talk to the real repo; users can always un-do by git.sudo *—sudois a setuid binary and interacts poorly with unprivileged user namespaces. Wrapping opaquely fails withsudo: effective uid is not 0. Passthrough plus a user-visible warning is the least-surprising behavior.docker *,podman *— already container-based; nesting undertryis a known-broken combination upstream (issue #19).- Read-only commands (
ls,cat,grep,rg,find,which,env,pwd,echo,true,false,type,command) — no side effects to review. Wrapping them adds 50–200ms per call for no benefit.
Users can override the list via ${CLAUDE_PLUGIN_DATA}/config.json. The default list is deliberately conservative — we wrap too much rather than too little.
What we chose. try is invoked without -x. Network remains accessible inside the sandbox.
Why. Claude needs the Anthropic API. try -x creates a new net namespace that severs all networking, including the API. A proxy-allowlist story is complex enough to be v2.
Trade-off we accept. A malicious agent with network access can exfiltrate data before the user reviews the overlay. This is the fundamental "post-hoc review is not a security boundary" limit. We document it and don't apologize for it — users who need exfil defense should use Claude Code's built-in network sandbox in addition to this plugin.
What we chose. hooks/pre-bash.sh early-returns passthrough for any command whose first path is inside $CLAUDE_PLUGIN_ROOT.
Why. The seven /trai:* slash commands each expand to a Bash(${CLAUDE_PLUGIN_ROOT}/scripts/cmd-*.sh) tool call. Without exemption, PreToolUse(Bash) wraps those in try, which is absurd:
/trai:doctorruns inside the overlay → reports on the overlay's view of the kernel /try/jq, not the real host./trai:passthroughwrites the bypass token into the overlay's upperdir, so the next real Bash call checks$TRAI_BYPASSon the host (which never got written) and the bypass silently fails./trai:commit//trai:discardoperate on the overlay's view of the session pointer, leaving real state inconsistent.
Discovered live in production. The fix is a single early-return case in pre-bash.sh:
root="${CLAUDE_PLUGIN_ROOT%/}"
case "$cmd" in
"$root"/*) trai::hook_passthrough "trai: plugin-internal script; not sandboxed."; exit 0 ;;
esacWhat we chose. Right before dispatching try -D <overlay> <cmd>, delete every non-directory entry under <overlay>/temproot/.
Why. Upstream try's sandbox_valid_or_empty() (at line 558 of vendor/try/try) rejects any sandbox whose temproot/ contains a file or symlink. But try's own chroot setup creates symlinks under temproot/ for every host mountpoint that is itself a symlink (e.g. temproot/bin -> usr/bin on systemd distros). Upstream's cleanup (around line 328) tries to remove them but isn't reliable — on duvel we reproduced two failure modes with reused -D:
ln: failed to create symbolic link 'temproot//bin/bin': Permission deniedon the second Bash call (try re-creates the symlink, but the leftover causes the newln -sto resolve into an overlay-mounted read-only dir).try: given sandbox 'X' is invalidon the third call (validity check fires because non-directory entries remain).
User data lives in upperdir/, not temproot/, so wiping temproot/'s contents between calls costs nothing:
find "$overlay/temproot" -mindepth 1 -delete 2>/dev/null || trueThis is a coping mechanism for an upstream limitation, not a fix. Upstream try was designed for single-command usage; our shared-overlay-per-session architecture pushes -D further than upstream tests for.
What we chose. hooks/session-start.sh checks current-session first. If it already points at an existing overlay, emit a "resumed" banner and return early. Only create a new overlay when there's nothing to resume.
Why. SessionStart fires on startup, resume, clear, compact, and auto-compact — not just at genuine startup. The original implementation unconditionally called trai::new_session() on every fire, which:
- Overwrote
current-sessionwith a fresh empty overlay, orphaning any in-progress work./trai:statusthen reported on the new empty overlay instead of the one the user's Bash calls had been landing in. - If doctor happened to fail on re-run, the error path cleared
current-sessionentirely — so/trai:statusreported"no active session"mid-session, which is how we found this.
Idempotence is the right default for any lifecycle hook that creates persistent state. If you add other per-session state, check for existing state first.
What we chose. current-session and bypass-next are written under $XDG_STATE_HOME/trai/. Only config.json (user-authored overrides) is read from $CLAUDE_PLUGIN_DATA.
Why. We discovered in live use that $CLAUDE_PLUGIN_DATA is unreliable for plugin-private runtime state:
- NFS. On hosts where
$HOMEis NFS-mounted (common on shared research machines),$CLAUDE_PLUGIN_DATAinherits$XDG_DATA_HOMEwhich defaults to$HOME/.local/share/trai— also NFS. Writes silently fail or succeed with delayed visibility; the hook writes, and microseconds later the same path looks empty. - Context divergence.
$CLAUDE_PLUGIN_DATAhas been observed to resolve differently between the hook-execution context (pre-bash.shreads it one way) and the slash-command-execution context (cmd-status.shreads it another way). The hook writescurrent-sessionin dir A; the status command reads from dir B; state "disappears."
We hit the second failure mode on duvel: npm init / node ran correctly sandboxed (meaning pre-bash.sh found the session pointer), and /trai:status seconds later reported "no active session" (meaning cmd-status.sh looked in a different place). The smoking-gun diagnostic:
CLAUDE_PLUGIN_DATA: /home/<user>/.local/share/trai (NFS)
current-session: ...current-session (exists=no)
$XDG_STATE_HOME is different: the user sets it before launching claude, and it's inherited as an environment variable by every subprocess — hooks, slash commands, the /plugin harness. One value, one place.
Implication for contributors. Anything the plugin writes and later reads (session pointer, bypass token, future things like a gc timestamp) goes under $TRAI_STATE_ROOT = $XDG_STATE_HOME/trai/. $CLAUDE_PLUGIN_DATA is read-only from the plugin's side — only the user writes there (via config.json).
- Move the overlay from
$XDG_STATE_HOMEto$XDG_CACHE_HOMEif users find overlays filling their backup-eligible state dirs. - Switch from
flockto an atomic-rename lock ifflockmisbehaves on remote filesystems (it does on some NFS versions). - Drop
gitfrom the passthrough list if we find Claude is consistently confused about which repo state it's operating on. (No evidence yet.) - Add
PostToolUsediagnostic logging if debugging overlay issues becomes painful.
- The per-Bash-hook model. If we change this, it's a different project.
- Plugin-only distribution. If we add a shell wrapper, we split the install story.
- Linux-only in v1. macOS support is a distinct product.