Advanced and internal details for Deliberation. For install and everyday use, see the README. This document covers the provider bridges, the full environment-variable reference, manual MCP setup, multi-turn and retry behavior, and the Gemini recovery paths.
- Architecture
- Consensus flow details
- Provider bridges
- Implementation mode (core capability)
- Environment variables
- Manual MCP setup
- Multi-turn and retry
- Gemini timeout recovery
- Grok files and cleanup
- OpenRouter bridge
- Orientation auto-attach
- Session persistence
- Customizing expert prompts
- Troubleshooting
- Known limitations
Claude acts as the orchestrator. It reads your request, picks an expert, and delegates to a provider over MCP. Each provider reaches Claude Code differently:
- Codex (GPT) - the Codex CLI ships a native MCP server (`codex mcp-server`).
- Gemini - a bundled zero-dependency Node bridge (`server/gemini/index.js`) wraps the Antigravity CLI (`agy`).
- Grok (xAI) - a bundled zero-dependency Node bridge (`server/grok/index.js`) talks to the xAI Responses API (`/v1/responses`) over HTTP. Advisory-only: it cannot edit files, but it can read attached files.
Responses are synthesized by Claude, never passed through verbatim.
End-to-end flow on a typical request:
You: "Is this authentication flow secure?"
|
v
Claude: detects a security question, selects the Security Analyst
|
v
+-------------------------------------+
| mcp__deliberation-codex__codex / |
| mcp__deliberation-gemini__gemini / |
| mcp__deliberation-grok__grok |
| -> Security Analyst prompt |
| -> expert analyzes your code |
+-------------------------------------+
|
v
Claude: "I found 3 issues..." (synthesizes, applies judgment)
- Each expert has a specialized system prompt (in `prompts/`).
- Claude reads your request, picks the expert, and delegates over MCP.
- Responses are synthesized, not passed through raw.
- Multi-turn conversations preserve context via `threadId` for chained work, and implementation retries before escalating to you.
/consensus is a thin driver over the consensus-step tool; the multi-round loop lives in
the core state machine (core/consensus-loop.js). Each round: Claude commits a blind verdict,
the server fans out to the panel (dispatch_peers) and parses each voice's verdict + critical
issues, Claude adjudicates (accept/dismiss/defer, every dismiss carries a reason), then revises
the plan. The loop converges only when at least one responding peer APPROVES, none REJECT, zero
accepted critical issues remain, and Claude's adjudicated verdict is APPROVE - so Claude cannot
self-approve. The cap is consensus.maxRounds (default 5).
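The convergence rule above can be read as a single predicate. The following is a minimal sketch, not the code in core/consensus-loop.js: the function name `hasConverged` and the field names (`peers`, `responded`, `acceptedCriticalIssues`, `claudeVerdict`) are assumptions chosen for illustration.

```javascript
// Hypothetical sketch of the convergence rule; all names are illustrative,
// not the actual core/consensus-loop.js API.
function hasConverged(round) {
  // Only peers that actually responded count; a null verdict is "not APPROVE".
  const responding = round.peers.filter((p) => p.responded);
  const anyApprove = responding.some((p) => p.verdict === "APPROVE");
  const anyReject = responding.some((p) => p.verdict === "REJECT");
  return (
    anyApprove &&
    !anyReject &&
    round.acceptedCriticalIssues === 0 &&
    round.claudeVerdict === "APPROVE"
  );
}

// Claude approves, but the only peer verdict failed to parse: no convergence.
const selfApproval = {
  peers: [{ responded: true, verdict: null }],
  acceptedCriticalIssues: 0,
  claudeVerdict: "APPROVE",
};
console.log(hasConverged(selfApproval)); // false
```

Because at least one peer APPROVE is required, Claude's own APPROVE is necessary but never sufficient.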
Critical-issue taxonomy (the closed set every critical issue is tagged with, parsed by
parseReview in core/provider.js):
- `security` - auth, secrets, injection, data exposure, privilege boundary
- `correctness` - wrong behaviour, broken invariant, missing case, race condition
- `scope` - undefined boundary, missing acceptance criteria, deliverable unclear
- `ambiguity` - reference too vague to act on, contradictory steps, missing context
- `performance` - latency, throughput, resource use, scaling limit
- `ops` - rollback, observability, deploy, migration, on-call surface
Verdict shape. The shared review prompt (one byte-identical string per provider) instructs every
reviewer to end with a machine-readable VERDICT: APPROVE (or VERDICT: REQUEST_CHANGES /
VERDICT: REJECT) line on its own, then list issues as - [category] description. parseReview
tolerates real-world drift across models: it first strips fenced code blocks (so a quoted/echoed
template cannot hijack the verdict), then resolves the verdict via a 4-tier ladder - the VERDICT:
sentinel, a same-line Verdict: <token>, a Verdict heading with the token on a following line, or a
bare standalone token line - and joins a category heading to a description on the next line when the
bullet itself has none. An unparsed verdict stays null (treated as "not APPROVE"), so a parse miss
never false-approves into convergence.
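To make the 4-tier ladder concrete, here is a rough sketch of a tiered verdict resolver. It is not the real `parseReview`: the function name `parseVerdict` and the exact regular expressions are assumptions, and the category-heading join for issue bullets is left out.

```javascript
// Hypothetical sketch of a tiered verdict resolver; the real parseReview in
// core/provider.js may use different patterns.
const TOKENS = "(APPROVE|REQUEST_CHANGES|REJECT)";
const FENCE = "`".repeat(3); // built indirectly so this example stays fence-safe

function parseVerdict(text) {
  // Strip fenced code blocks so an echoed template cannot hijack the verdict.
  const body = text.replace(new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g"), "");
  const ladder = [
    new RegExp("^\\s*VERDICT:\\s*" + TOKENS + "\\s*$", "m"), // 1. sentinel line
    new RegExp("^.*\\bverdict[ \\t]*:[ \\t]*" + TOKENS + "\\b", "im"), // 2. same-line Verdict: <token>
    new RegExp("^#*\\s*verdict\\s*:?\\s*\\n+\\s*" + TOKENS + "\\b", "im"), // 3. heading, token on a later line
    new RegExp("^\\s*" + TOKENS + "\\s*$", "m"), // 4. bare standalone token line
  ];
  for (const re of ladder) {
    const m = body.match(re);
    if (m) return m[1].toUpperCase();
  }
  return null; // unparsed: the caller treats this as "not APPROVE"
}
```

Returning `null` rather than a default verdict is what keeps a parse miss from counting toward convergence.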
Stage 2 (anonymized peer cross-review) is not part of the current loop. Earlier revisions ran
a command-layer Stage 2 (each reviewer scored the others' anonymized answers, with a shuffle
mapping in the report). The engine-driven rewrite removed it: the core loop has no Stage 2 model,
and keeping it in command prose re-introduced the duplication the rewrite eliminated. If anonymized
cross-review proves valuable it returns as an engine feature (a new consensus-step action), not as
prose.
The bridge wraps the Antigravity CLI (agy) in print mode (agy -p) and adds two
reliability behaviors:
- Soft-timeout drain - on timeout it keeps `agy` alive, keeps buffering its streamed stdout, and returns the answer if `agy` completes cleanly within the grace budget. See Gemini timeout recovery.
- Plain-stdout answer with an `Error:` sentinel - `agy -p` prints the answer as plain UTF-8 text on stdout and exits 0; there is no `-o json` mode. The bridge treats stdout as the answer unless it matches `/^\s*Error:/` (agy reports failures as `Error: <message>` on stdout, still at exit 0), in which case it classifies the failure into an error envelope.
Flag mapping the bridge applies to agy:
| Bridge input | agy flag |
|---|---|
| `sandbox: read-only` (advisory) | `--sandbox` (terminal-only; see below) |
| `sandbox: workspace-write` | `--dangerously-skip-permissions` |
| `include-directories: [...]` | repeated `--add-dir <dir>` |
| `gemini-reply` (multi-turn) | `--conversation <id>` |
| always | `--print-timeout <duration>` and `-p <prompt>` |
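As an illustration of the table, the mapping could be sketched as a small argv builder. This is not the bridge's own `buildAgyArgs`: the function name `buildAgyArgsSketch`, its parameter names, the argument order, and the `"300s"` duration spelling used in the usage line are assumptions.

```javascript
// Hypothetical re-creation of the flag mapping in the table above;
// the bridge's real buildAgyArgs may differ in signature and ordering.
function buildAgyArgsSketch({ prompt, sandbox = "read-only", includeDirectories = [], conversationId, printTimeout }) {
  const args = [];
  if (sandbox === "workspace-write") args.push("--dangerously-skip-permissions");
  else args.push("--sandbox"); // restricts terminal commands only
  for (const dir of includeDirectories) args.push("--add-dir", dir); // repeated flag
  if (conversationId) args.push("--conversation", conversationId); // multi-turn reply
  args.push("--print-timeout", printTimeout, "-p", prompt); // always present
  return args;
}

// Usage: an advisory call with one extra root (duration format is illustrative).
const argv = buildAgyArgsSketch({ prompt: "Review this diff", includeDirectories: ["/repo/shared"], printTimeout: "300s" });
```

Returning an argv array, rather than a shell string, avoids quoting problems when the prompt contains spaces or shell metacharacters.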
There is no -m/--model flag and no -o json. The model is read from
~/.gemini/settings.json (model.name, default auto-gemini-3); the MCP model
parameter is advisory only. The bridge default model is auto-gemini-3; override the
default with GEMINI_DEFAULT_MODEL, or point at a different agy binary with
AGY_BIN. agy print mode does not enforce folder trust.
Read-only is enforced by the bridge, not by agy. agy's --sandbox
restricts only terminal commands - the agent's file-edit tool is unrestricted, and
the model can bypass the terminal restriction per tool call. So a read-only
dispatch does NOT rely on --sandbox alone. The bridge layers its own enforcement
on every read-only run:
- macOS OS sandbox - the `agy` process is wrapped in `sandbox-exec` with a Seatbelt profile that denies all file writes except `~/.gemini` (OAuth + the conversation cache that backs `threadId` continuity) and temp dirs. Workspace writes are blocked at the kernel. Disable with `DELIBERATION_DISABLE_OS_SANDBOX=1`.
- Prompt guard - a hard advisory instruction block is prepended to the prompt.
- Git mutation detection (all platforms) - the bridge snapshots `git rev-parse HEAD` + `git status --porcelain` of the consulted cwd (and any `--add-dir` roots) before and after the run; on a diff it prepends a `WORKSPACE MUTATION DETECTED` warning and sets `workspaceMutated: true` on the result. No auto-revert - the result is surfaced as tainted for the caller to act on.
- Env scrub - the child env drops the kill-switch and push/exfil credentials (`GITHUB_TOKEN`, `GH_TOKEN`, `GIT_ASKPASS`, `SSH_AUTH_SOCK`).
The network is intentionally NOT isolated (agy needs its API + OAuth token
refresh), so a determined delegate could still push pre-existing commits or exfil
over the network - read-only contains local writes. On Linux there is no OS sandbox
in v1; read-only there relies on the prompt guard + mutation detection (bwrap is a
documented future seam). workspace-write (the direct gemini tool only) is an
explicit opt-in that intentionally skips all of the above.
A bundled zero-dependency Node bridge over the xAI Responses API
(/v1/responses). It is advisory-only (it cannot edit files) but it can read
attached files: pass files: [{ path | file_id | file_url | dir }] and the
bridge delivers them per the mode setting - uploaded to the xAI Files API
(default), inlined as input_text (for line-by-line reading of source files),
or expanded via the bundled glob walker for directories. Resolution is against
the top-level roots: string[] (first-root-wins) or cwd when roots is
omitted. Uploaded files are SHA-256 dedup-cached locally and carry an
expires_after (default 7 days); manage with /grok-files
(list / prune / gc). See Grok files and cleanup.
The bridge default model is grok-4.3. It needs XAI_API_KEY in its environment;
a missing key surfaces errorKind: "missing-auth".
The core codex + gemini providers can run workspace-write (edit files) instead
of the default read-only. The capability is gated by two AND-ed locks - a write
happens only when both are true:
- Construction lock - `makeCodexProvider({ allowImplement: true })` / `makeAntigravityProvider({ bridge, allowImplement: true })`. Absent/false, the provider is read-only no matter what the request says. `capabilities.canImplement` reflects this lock, so `panel`/discovery report honestly per process.
- Request lock - `DelegationRequest.mode`, a closed `"advisory"|"implement"` enum (default `"advisory"`). Only the exact string `"implement"` requests a write.
effectiveImplement = (opts.allowImplement === true) && (req.mode === "implement")
Read-only is the structural default at every layer: anything that is not exactly that
pair runs advisory. The request vocabulary is mode-level (advisory/implement); the
OS sandbox string (workspace-write) is computed inside the provider and never taken
from caller input, so a request cannot smuggle argv.
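A compact way to picture the two locks is a factory that fixes the construction lock once and derives the sandbox string per request. The sketch below is illustrative only: `makeProviderSketch` and `sandboxFor` are hypothetical names standing in for the real provider factories.

```javascript
// Sketch of the two AND-ed locks; makeProviderSketch is a hypothetical stand-in
// for makeCodexProvider / makeAntigravityProvider.
function makeProviderSketch(opts = {}) {
  const canImplement = opts.allowImplement === true; // construction lock
  return {
    capabilities: { canImplement },
    // The sandbox string is computed here and never read from the request.
    sandboxFor(req = {}) {
      const effectiveImplement = canImplement && req.mode === "implement"; // request lock
      return effectiveImplement ? "workspace-write" : "read-only";
    },
  };
}

const locked = makeProviderSketch(); // no construction lock
const writable = makeProviderSketch({ allowImplement: true });
console.log(locked.sandboxFor({ mode: "implement" })); // read-only
console.log(writable.sandboxFor({ mode: "implement" })); // workspace-write
```

Strict equality on both sides means a truthy-but-wrong value (for example `allowImplement: "yes"` or `mode: "Implement"`) falls back to advisory.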
Flag mapping (only when effectiveImplement):
| Provider | Advisory (default) | Implement |
|---|---|---|
| Codex | `codex exec --sandbox read-only` | `codex exec --sandbox workspace-write` (never `danger-full-access`/bypass-approvals) |
| Gemini | `buildAgyArgs({sandbox:"read-only"})` + `runGemini(..., {readOnly:true})` | `buildAgyArgs({sandbox:"workspace-write"})` (-> `--dangerously-skip-permissions`) + `runGemini(..., {readOnly:false})` |
Codex enforces the sandbox at the OS level (Seatbelt/Landlock/seccomp) in both modes.
For Gemini, readOnly:false drops the OS write-deny wrapper (intended - writes must
land), but the credential env scrub runs in both modes: advisoryEnv() still strips
*_KEY/*_TOKEN/*_SECRET-shaped vars plus GIT_ASKPASS/SSH_AUTH_SOCK from the
child env, so a write run edits the worktree yet never receives the operator's keys or
SSH agent - the human commits and pushes, not agy. See the
Gemini read-only enforcement detail above for the shared scrub.
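The credential scrub described above can be approximated in a few lines. This is a sketch under assumptions, not the real `advisoryEnv()`: the function name `scrubEnvSketch` and the exact suffix pattern are illustrative, and the real scrub may match more (or different) names.

```javascript
// Hypothetical sketch of a credential scrub for the child environment.
function scrubEnvSketch(env) {
  const exact = new Set(["GIT_ASKPASS", "SSH_AUTH_SOCK"]);
  const shaped = /_(KEY|TOKEN|SECRET)$/; // *_KEY / *_TOKEN / *_SECRET
  const out = {};
  for (const [name, value] of Object.entries(env)) {
    if (exact.has(name) || shaped.test(name)) continue; // drop credential-shaped vars
    out[name] = value;
  }
  return out;
}

// Usage: build the child env from a copy, never by mutating process.env.
const childEnv = scrubEnvSketch({ PATH: "/usr/bin", GITHUB_TOKEN: "x", XAI_API_KEY: "y", SSH_AUTH_SOCK: "/tmp/agent" });
console.log(Object.keys(childEnv)); // [ 'PATH' ]
```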
Scope. This is the section-1 core capability, proven by tests
(test/core-codex.test.js, test/core-antigravity.test.js). The live composition root
(server/mcp/index.js) does not pass allowImplement:true, so the running unified
server stays read-only and no end-user tool sets mode:"implement" yet. The unified
implement tool (no readOnlyHint, so the host prompts a human), the never-cache
guarantee for write runs, a forced audit/session record, and multi-turn for impl land
with the MCP consolidation. Hosted/remote builds (HTTP-only providers, no codex/gemini
CLIs) are inert by construction - they never set the construction lock.
This is the single source of truth for the bridge environment variables.
| Variable | Provider | Default | Purpose |
|---|---|---|---|
| `GEMINI_DEFAULT_MODEL` | Gemini | `auto-gemini-3` | Default model when the call sets none |
| `GEMINI_DISABLE_TIMEOUT_RECOVERY` | Gemini | unset | `1` forces legacy timeout (no drain) |
| `AGY_BIN` | Gemini | `agy` | Override the path to the `agy` binary |
| `AGY_LAST_CONVERSATIONS` | Gemini | `~/.gemini/antigravity-cli/cache/last_conversations.json` | Override the conversation-id map file (mainly for tests) |
| `XAI_API_KEY` | Grok | unset (required) | xAI API key; missing key returns `missing-auth` |
| `GROK_DEFAULT_MODEL` | Grok | `grok-4.3` | Default model when the call sets none |
| `XAI_API_BASE` | Grok | `https://api.x.ai/v1` | API endpoint override |
| `GROK_REASONING_EFFORT` | Grok | `high` | `low`/`medium`/`high`; `none` or `off` omits the field |
| `GROK_FILE_TTL_SECONDS` | Grok | `604800` (7 days) | Upload lifetime, clamped 1h..30d |
| `DELIBERATION_SESSIONS` | sessions | `<XDG cache>/deliberation/sessions` | Override the session store directory (see Session persistence) |
| `DELIBERATION_DEBUG_LOG` | debug | `<XDG cache>/deliberation/debug.jsonl` | Override the debug log path (see Observability); only written when `debug.enabled` |
Codex has no bridge environment variables: it ships its own native MCP server and
reads ~/.codex/config.toml directly. The model comes from the model key in
that file by default (the Codex analog of GEMINI_DEFAULT_MODEL /
GROK_DEFAULT_MODEL). Override it on the server with -c model=<id> on the
claude mcp add ... deliberation-codex registration, or per call with the model parameter of
mcp__deliberation-codex__codex(...). See SETUP.md.
Codex per-call timeout. The core Codex provider caps each codex exec invocation to
CODEX_DEFAULT_TIMEOUT_MS (600 000 ms, 10 min) by default. This prevents a stalled Codex
process from blocking a consensus round indefinitely. A per-call timeout override is not
currently exposed through the MCP tool surface; the constant is the global ceiling.
callProvider retry. callProvider in core/orchestrate.js retries once on a network
error (a pre-response transport failure - connection refused, DNS failure, socket hang-up) and
does NOT retry on a provider timeout or application-level error. This is a single retry, not a
retry loop.
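The single-retry rule can be sketched as follows. This is not the code in core/orchestrate.js: the function name `callWithSingleRetry` is hypothetical, and classifying a "network error" by Node's `ECONNREFUSED` / `ENOTFOUND` / `ECONNRESET` error codes is an assumption made for illustration.

```javascript
// Hypothetical sketch: retry exactly once, and only on a pre-response
// transport failure. The code set below is an assumed classification.
const NETWORK_CODES = new Set(["ECONNREFUSED", "ENOTFOUND", "ECONNRESET"]);

async function callWithSingleRetry(call) {
  try {
    return await call();
  } catch (err) {
    // Provider timeouts and application-level errors are surfaced unchanged.
    if (!NETWORK_CODES.has(err.code)) throw err;
    return call(); // one retry; a second failure propagates to the caller
  }
}
```

Keeping this to one retry bounds the worst-case added latency to a single extra call per provider.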
These are server-side on the unified deliberation server, so every MCP host gets them -
not only Claude Code. Every DelegationResult carries ms (wall time) and the effective
reasoningEffort (a real value for HTTP providers; null for the Codex/Gemini CLIs, which
have no per-call knob); HTTP providers (Grok, OpenRouter) also include token usage.
ask-all is one tool call that fans out to N providers server-side - opaque until all
finish. The alternative, for hosts where that opacity hurts:
- `panel { expert?, cwd? }` returns `{ providers: string[], omitted: string[] }` - the EXACT set `selectForAskAll` would dispatch (enabled built-ins + eligible OpenRouter aliases, fanout cap applied), WITHOUT calling any provider. `omitted` is the fanout-cap drop list. Read-only.
- `ask-one { provider, prompt, expert?, cwd?, reasoningEffort?, files? }` runs ONE provider named by `panel` (resolved from the same selection set, so a pinned `openrouter:<alias>` works and a disabled/over-cap name returns `{ error, panel }`).
- The pattern (`commands/ask-all.md` on Claude Code): call `panel`, then issue one `ask-one` per name in a single turn. The host runs them concurrently (parallel wall-time) and surfaces each result as it settles, so progress is visible per provider. The legacy single-call `ask-all` tool is retained for back-compat / other hosts.
Why two paths: Claude Code does NOT render mid-call MCP notifications/message (verified),
but DOES surface each parallel tool result as it lands - so the per-provider tool path is
the progress lever there. Hosts that DO render server log notifications get live progress
from the single ask-all call too (next section).
The server declares the logging capability in initialize and accepts logging/setLevel.
During a fan-out it emits one notifications/message (level info, logger deliberation)
per provider as it settles, plus a dispatch_start. The payload carries event/provider/ms/
verdict metadata only - never prompt or response text (MCP logging security rule). A client
that raises its min level above info suppresses them.
Set "debug": { "enabled": true } (optional "path") in config.json to append one
compact JSON line per provider call and per consensus round to
<XDG cache>/deliberation/debug.jsonl (override with debug.path or
DELIBERATION_DEBUG_LOG). The logger (core/debug-log.js) is INJECTED through core
(askAll / askOne / consensus / runToConvergence) and emitted at the source, so the
Claude host-arbiter path and the in-core provider-arbiter loop log identically - one code
path, every host. Records: timestamp, tool, provider, model, reasoning effort, ms,
HTTP token usage, and consensus round/verdict/converged/accepted-issue COUNT. A strict
ALLOWED_KEYS whitelist is applied on every write, so prompts, responses, and free-text
issue descriptions can never land in the file. Off by default; nothing is written unless
enabled.
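The whitelist-on-write idea can be sketched like this. The key names in `ALLOWED_KEYS_SKETCH` and the function name `toDebugLine` are assumptions for illustration; the real `ALLOWED_KEYS` set in core/debug-log.js is not reproduced here.

```javascript
// Hypothetical sketch of a strict key whitelist applied on every write.
const ALLOWED_KEYS_SKETCH = new Set([
  "ts", "tool", "provider", "model", "reasoningEffort", "ms",
  "usage", "round", "verdict", "converged", "acceptedIssueCount",
]);

function toDebugLine(record) {
  const safe = {};
  for (const key of Object.keys(record)) {
    // Anything not explicitly allowed (prompt, response, issue text) is dropped.
    if (ALLOWED_KEYS_SKETCH.has(key)) safe[key] = record[key];
  }
  return JSON.stringify(safe); // one compact JSON line
}

console.log(toDebugLine({ tool: "ask-one", provider: "grok", ms: 4200, prompt: "secret text" }));
// {"tool":"ask-one","provider":"grok","ms":4200}
```

An allow-list fails closed: a field added to the record later stays out of the log until someone adds it to the set.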
core/result-cache.js is an in-memory (process-lifetime) dedup cache wired to the advisory
ask-all / ask-one paths only (NOT the consensus loop). An identical re-ask (same
provider + model + reasoning effort + temperature + developer instructions + prompt + file
refs) returns the prior SUCCESS instantly with a cached:true marker. LRU-bounded (100) +
10-minute TTL; errors are never cached; file-bearing requests skip it (file content can
change under a path); session-revisit bypasses it (a revisit is a deliberate re-run).
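An LRU-bounded, TTL-limited success cache of this kind fits in a small class. The sketch below is not core/result-cache.js: the class name `ResultCacheSketch`, the injectable `now` clock, and the result shape are assumptions, and key construction (provider + model + effort + ...) is left to the caller.

```javascript
// Hypothetical sketch of an LRU + TTL success cache.
class ResultCacheSketch {
  constructor({ max = 100, ttlMs = 10 * 60 * 1000, now = Date.now } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now;
    this.map = new Map(); // Map insertion order doubles as recency order
  }
  get(key) {
    const hit = this.map.get(key);
    if (!hit) return undefined;
    if (this.now() - hit.at > this.ttlMs) { // expired entry
      this.map.delete(key);
      return undefined;
    }
    this.map.delete(key);
    this.map.set(key, hit); // move to the most-recent position
    return { ...hit.value, cached: true };
  }
  set(key, result) {
    if (result.error) return; // errors are never cached
    this.map.delete(key);
    this.map.set(key, { value: result, at: this.now() });
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }
}
```

The injectable clock exists only to make expiry testable without waiting ten minutes.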
core/analyze.js is pure (zero-dep, unit-tested); the read-only analyze MCP tool does the
IO and calls it. It answers "is my model panel pulling its weight?" from real measured data,
in two lenses that are never joined (the debug log and the session store share no run id):
- Lens A - timing/cost (debug log): `aggregateByModel` -> per provider+model count, error rate, p50/p95/max latency, mean tokens (HTTP only), reasoning efforts + tools seen. The tool tail-reads the log (last ~1 MB by default; `limitBytes`) so a large file cannot bloat memory, and returns pre-aggregated stats, never raw lines.
- Lens B - agreement (sessions): `aggregateAgreement` -> per model, the share of its review verdicts that matched the run's FINAL verdict. Only consensus-loop records (which carry a final verdict) contribute votes; ask-all opinions are abstentions. This is the "uniqueness" proxy (a model that rarely dissents adds little), computed without reading opinion text or running an LLM pass.
detectOutliers flags slow (relative to the fastest-peer baseline, or absolute) and
high-error models; recommend turns those + the agreement signal into advisory Suggestions
naming the exact config.json key (models.<id>.askAll, models.<id>.reasoningEffort,
routing.maxFanout). Codex/Gemini reasoning is flagged as external (~/.codex/config.toml /
agy) since it is outside deliberation's config. The tool writes nothing; /deliberation:analyze
renders it for humans and prints suggested edits without applying them. Needs debug.enabled
for Lens A and sessions.persist for Lens B; when the log is empty it returns
meta.insufficientData:true instead of fabricating numbers.
If /setup does not work, register the MCP servers manually. Each command is
idempotent (safe to rerun):
# Codex (GPT) - inherits its model from ~/.codex/config.toml.
# Pin a model on the server with `-c model=<id>` (e.g. `codex mcp-server -c model=gpt-5.5`).
claude mcp remove codex >/dev/null 2>&1 || true
claude mcp add --transport stdio --scope user codex -- codex mcp-server
# Gemini
claude mcp remove gemini >/dev/null 2>&1 || true
claude mcp add --transport stdio --scope user gemini -- node ${CLAUDE_PLUGIN_ROOT}/server/gemini/index.js
# Grok (xAI) - API-based, advisory-only. Needs XAI_API_KEY.
# Default registers WITHOUT --env, so the key is NOT written to ~/.claude.json;
# export XAI_API_KEY in Claude Code's launch environment (e.g. your shell profile).
claude mcp remove grok >/dev/null 2>&1 || true
claude mcp add --transport stdio --scope user grok -- node ${CLAUDE_PLUGIN_ROOT}/server/grok/index.js
# Alternative (persists the key in ~/.claude.json in plaintext): append
# --env XAI_API_KEY="$XAI_API_KEY"
# before the `-- node ...` part of the command above.

Verify:
claude mcp list
printf '{"jsonrpc":"2.0","id":"health","method":"initialize","params":{}}\n' | node ${CLAUDE_PLUGIN_ROOT}/server/gemini/index.js
printf '{"jsonrpc":"2.0","id":"health","method":"initialize","params":{}}\n' | node ${CLAUDE_PLUGIN_ROOT}/server/grok/index.js

For chained implementation steps, an expert preserves context across turns:
Turn 1: mcp__*__* -> returns threadId
Turn 2: mcp__*__*-reply(threadId) -> expert remembers turn 1
Turn 3: mcp__*__*-reply(threadId) -> expert remembers turns 1-2
Use single-shot (codex, gemini, grok) for advisory tasks. Use multi-turn for
implementation chains and retries. Grok is advisory-only.
Implementation retries up to 3 attempts total (1 initial + 2 *-reply retries),
then escalates to you. Retries reuse the threadId so the expert remembers the
earlier attempts.
timeout is a soft deadline (default 300000ms; Gemini 3 deep prompts run
200-260s). agy -p streams its answer to stdout incrementally, so the bridge
recovers by draining that stream rather than scraping disk. When the soft timeout
fires, the bridge does not fail immediately: it keeps agy alive and keeps
buffering its streamed stdout for up to recovery-grace ms (default 120000, range
0..600000). If agy completes cleanly within the grace budget (exit 0, no Error:
sentinel on stdout), the buffered output is returned as a normal success with a
top-level "recovered": true flag and a stderr log line; content is the full
answer so response parsers keep working. If agy is still running when the grace
budget is exhausted, the call fails with errorKind: "timeout" (still
retryable).

- `"recovery-grace": 0` disables the drain (immediate legacy timeout).
- `GEMINI_DISABLE_TIMEOUT_RECOVERY=1` (env) forces full legacy behavior.
- The call resolves within `timeout + recovery-grace`. The `agy` child process is then killed with `SIGTERM`, followed by a `SIGKILL` about 1s later; that kill is async cleanup and does not delay the response.
Grok reads attached files via the files[] parameter. Each entry has EXACTLY ONE of:
- `path` - a local file. Delivery is controlled by `mode` (default `"upload"` - bridge uploads to the xAI Files API; `"inline"` embeds as `input_text`; `"auto"` picks per heuristic - see "Inline vs upload delivery" below).
- `file_id` - an already-uploaded xAI file id (passed through, no upload).
- `file_url` - a public URL (passed through).
- `dir` - a local directory expanded recursively. Same `mode` rules; the walker applies the chosen mode to every selected file (see below).
A path or dir resolves against the top-level roots[] array (absolute directories,
first-root-wins for relative entries) or, when roots is omitted, against cwd. A
path that resolves outside every declared root is refused (no exfiltration); symlinks
that escape via realpath are also refused. An oversize file (>48 MB) returns
file-too-large.
mcp__deliberation-grok__grok({
prompt: "Compare the auth strategy in these two services.",
cwd: "/Users/me/work/service-a",
roots: ["/Users/me/work/service-a", "/Users/me/work/service-b"],
files: [
{ path: "src/auth.ts" }, // resolves under service-a (first root)
{ path: "/Users/me/work/service-b/src/auth.ts" }, // absolute, must lie under one root
{ dir: "docs", include: ["**/*.md"], maxFiles: 20 }, // expands service-a/docs
],
})

The bridge bundles a zero-dep glob walker (server/grok/glob.js) so you do not have
to enumerate every file by hand:
- `include` (default `["**/*"]`).
- `exclude` is appended to the bridge's safe defaults (it does NOT replace them). Defaults cover: VCS (`.git`); JS/Node (`node_modules`, `dist`, `build`, `out`, `.next`, `.svelte-kit`, `.nuxt`, `.turbo`, `.cache`, `.parcel-cache`, `.pnpm-store`); Yarn Berry (`.yarn/cache`, `.yarn/unplugged`); lockfiles (`**/*.lock`); Python (`.venv`, `venv`, `__pycache__`, `.tox`, `.pytest_cache`, `.mypy_cache`, `.ruff_cache`, `.ipynb_checkpoints`, `.eggs`, `htmlcov`); coverage (`coverage`, `.nyc_output`); Rust/Java/Gradle (`target`, `.gradle`); Go/PHP (`vendor`); Terraform (`.terraform`, `.terragrunt-cache`); plus security: `**/*.tfstate*`, granular `.env` variants (keeps `.env.example` readable), `.ssh/**`, SSH keypairs (`id_rsa`, `id_ed25519`, `id_ecdsa`, `id_dsa` and `.pub`), and `**/*.pem` / `**/*.key`.
- To replace defaults entirely instead of appending, set `excludeReset: true` on the same `{dir}` entry. `excludeReset` is validated as a strict boolean by `validateFiles`; non-boolean values are rejected. Use it only when reviewing files the defaults would block (e.g., Terraform state in a security audit, or legitimate `.pem` public certs). The tradeoff is explicit: the bridge prefers a false positive (blocking a legitimate `.pem`) over a false negative (leaking a private key).
- `maxFiles` (default 50), `maxBytes` (default 128 MB). Exceeding either throws a hard error with counts - no silent truncation.
- The walker is symlink-safe: dirs are pruned before descent; symlinks to dirs are not followed (cycle safety); symlinks to files are followed only when `realpath` stays inside the resolved root.
- Patterns are POSIX (`/` separator). Backslash escape sequences are rejected at validation; literal `path` / `dir` values may contain backslashes (Windows OK).
xAI's input_file references are searchable attachments; for large source files
the model may enumerate them rather than read line-by-line. To force a full
line-by-line read, deliver the content as input_text instead:
files: [
{ path: "app/apps/api/routes.py", mode: "inline" }, // forced inline
{ path: "modules/web.tm.hcl", mode: "auto" }, // text + small → inline
{ path: "design.pdf", mode: "auto" }, // binary or big → upload
{ dir: "src", include: ["**/*.ts"], mode: "auto" }, // each walked file decides
]

- `"upload"` (default) - always uses the xAI Files API. Matches the v2.0 behavior.
- `"inline"` - embeds the file content directly as a separate `input_text` part with a `=== {filename} ===` header. No `/files` call, no cache row, no `uploadedFileIds` entry. Best for source code review.
- `"auto"` - inlines when the file is probably text (no NUL byte; <5% non-printable bytes in the first 4 KB) AND its size is at or below `GROK_INLINE_MAX_BYTES` (default 262144 = 256 KB). Otherwise uploads.
For {dir} entries the mode is inherited by every walked file. mode must
NOT be set on file_id / file_url entries (those bypass the upload path
entirely; setting mode on them returns -32602 from validateFiles).
Override the inline ceiling with GROK_INLINE_MAX_BYTES=<bytes> in the bridge
environment.
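The `"auto"` decision can be sketched as a pure function over the file bytes. This is an approximation, not the bridge's code: the function name `chooseDelivery` is hypothetical, and the definition of "non-printable" (control bytes other than tab, LF, and CR, plus DEL) and checking for NUL only within the first 4 KB are assumptions.

```javascript
// Hypothetical sketch of the "auto" delivery decision.
const INLINE_MAX_BYTES = Number(process.env.GROK_INLINE_MAX_BYTES) || 262144;

function chooseDelivery(buf, maxBytes = INLINE_MAX_BYTES) {
  if (buf.length > maxBytes) return "upload"; // over the inline ceiling
  const head = buf.subarray(0, 4096); // sample the first 4 KB
  let nonPrintable = 0;
  for (const byte of head) {
    if (byte === 0) return "upload"; // a NUL byte is treated as binary
    const isControl = byte < 0x20 && byte !== 0x09 && byte !== 0x0a && byte !== 0x0d;
    if (isControl || byte === 0x7f) nonPrintable += 1;
  }
  // Empty files count as text; otherwise require under 5% non-printable bytes.
  const probablyText = head.length === 0 || nonPrintable / head.length < 0.05;
  return probablyText ? "inline" : "upload";
}

console.log(chooseDelivery(Buffer.from("def handler():\n    return 1\n"))); // inline
```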
Uploads are deduplicated by SHA-256 content hash. A reuse hit requires the SAME content
plus the same API key, the same normalised apiBase, and the same effective filename
(see cache-key below); identical bytes uploaded under a different filename or a different
key produce separate cache rows:
- Cache file: `~/.cache/deliberation/grok-files.json` (canonical XDG path; Windows `%LOCALAPPDATA%\deliberation\grok-files.json`). Override with `DELIBERATION_CACHE`.
- Cache key: `sha256(bytes)@sha256(XAI_API_KEY)[:16]@normalize(apiBase)@effectiveFilename`
  - Key rotation auto-invalidates entries (different `keyFp`).
  - Different `apiBase` (including port/protocol differences) → separate rows.
  - Different effective filename (basename or `filename` override) → separate rows.
- Reuse check: hit + `expiresAt > now + 60s` + `apiBase` + `keyFp` all match.
- In-process Promise dedup (`withInflight`): concurrent uploads of the same content collapse into a single network call.
- Cross-process safety: mkdir-based lock (`server/grok/lock.js`) with token-specific owner markers + stale reclaim via atomic rename. (`lock.heartbeat()` is provided for long-running holders; cache writes complete sub-second so the 5s stale window is not at risk and the bridge does not call it.)
- Stale xAI file id mid-`/v1/responses`: when the responses call returns a 4xx whose body names a `file_*` / `file-*` id from the current refs (and the ref has a `sourcePath`), the bridge evicts the cached row, re-uploads from the original disk path, and retries the responses call once. Errors that don't name a known file id are surfaced unchanged.
- `XAI_DISABLE_FILE_CACHE=1` (env) skips the cache layer entirely (debugging).
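The cache-key layout above can be reproduced with Node's built-in `crypto` module. This is a sketch under assumptions: `cacheKey` and `normalizeApiBase` are hypothetical names, and the normalisation shown (trim plus stripping trailing slashes) is a guess at what the bridge does, not a description of it.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch of the cache-key layout; the bridge's own
// normalisation of apiBase may differ from the assumption below.
const sha256Hex = (data) => crypto.createHash("sha256").update(data).digest("hex");
const normalizeApiBase = (base) => base.trim().replace(/\/+$/, "");

function cacheKey({ bytes, apiKey, apiBase, filename }) {
  const keyFp = sha256Hex(apiKey).slice(0, 16); // a fingerprint, not the raw key
  return [sha256Hex(bytes), keyFp, normalizeApiBase(apiBase), filename].join("@");
}

// Usage: same bytes under a rotated key produce a different row.
const entry = { bytes: Buffer.from("hello"), apiBase: "https://api.x.ai/v1", filename: "notes.md" };
console.log(cacheKey({ ...entry, apiKey: "key-1" }) === cacheKey({ ...entry, apiKey: "key-2" })); // false
```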
Stored upload filenames are deliberation-{sha256[:16]}-{basename}. Uploads also
carry expires_after set by GROK_FILE_TTL_SECONDS (default 604800 = 7 days,
clamped 1h..30d).
The bundled server/grok/files-admin.js supports three subcommands:
- `list` - shows total xAI file count and every `deliberation-*` upload.
- `prune --older-than <30m|24h|7d|seconds> [--yes]` - dry run by default; deletes remote bridge-owned files matched by filename prefix + age. Works without the local cache; safe for environments where the cache was lost or never existed.
- `gc [--all-keys] [--force-local-prune]` - syncs the local cache with the remote file list via one paginated `GET /v1/files`. Prunes local rows whose `fileId` is no longer on xAI. Default scope is the current `XAI_API_KEY` + `XAI_API_BASE` rows only. `--all-keys` widens to foreign rows but leaves them in place when remote absence is ambiguous (the current key can't see foreign files). `--force-local-prune` drops ambiguous foreign rows anyway.
prune and gc are complementary: prune is the remote-side cleaner; gc keeps
the local cache aligned with remote state. The deliberation- filename prefix
is a hard safety invariant on both paths - your own xAI files are never touched.
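For reference, the `--older-than` value forms shown above (`30m`, `24h`, `7d`, or bare seconds) could be parsed as follows. This is a sketch only: `parseOlderThan` is a hypothetical name, and files-admin.js may accept other spellings or reject inputs differently.

```javascript
// Hypothetical parser for the --older-than value; a bare integer is seconds.
function parseOlderThan(value) {
  const m = /^(\d+)([mhd]?)$/.exec(String(value).trim());
  if (!m) throw new Error("invalid --older-than value: " + value);
  const unitSeconds = { "": 1, m: 60, h: 3600, d: 86400 };
  return Number(m[1]) * unitSeconds[m[2]];
}

console.log(parseOlderThan("7d")); // 604800
```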
The OpenRouter bridge (server/openrouter/index.js) is a zero-dependency Node MCP server
that calls any OpenAI-compatible POST {apiBase}/chat/completions endpoint.
It is advisory-only - it cannot edit files or run shell commands.
The bridge and the fan-out commands (/ask-all, /consensus) read
~/.config/deliberation/config.json at call time - the canonical XDG path (Windows:
%APPDATA%\deliberation\config.json). Override the path with DELIBERATION_CONFIG. The file is stat-gated: the bridge re-reads it only when
the mtime changes, so edits to models, routing, or the providers.openrouter block
take effect immediately without restarting Claude Code or re-running /setup. Toggling a
built-in provider (codex / gemini / grok) still requires /setup to re-register
or de-register the MCP server.
The config has four top-level sections, each with one job:
- `providers` - transport / connection only. Per provider: `enabled` (default true) plus auth/endpoint keys. `providers.openrouter` also carries the OpenRouter-specific connection keys (`apiBase`, `allowRawModel`, `defaultModel`, per-call `defaults`).
- `models` - named model records, keyed by id. Each record names its `provider` and `model` slug and sets routing flags. This is where you declare the models the panel uses.
- `routing` - global fan-out policy (`maxFanout`).
- `consensus` - `arbiter` (who synthesizes the consensus verdict) and `blindVote` (optional blind arbiter pre-vote; boolean, default `false`).
Config file schema (strict JSON, version must be 1):
{
"$schema": "https://raw.githubusercontent.com/antonbabenko/deliberation/master/config/config.schema.json",
"version": 1,
"providers": {
"codex": { "enabled": true },
"gemini": { "enabled": true },
"grok": { "enabled": true, "apiKeyEnv": "XAI_API_KEY" },
"openrouter": {
"enabled": true,
"apiKeyEnv": "OPENROUTER_API_KEY",
"apiBase": "https://openrouter.ai/api/v1",
"allowRawModel": false,
"defaultModel": "openai/gpt-4.1-mini",
"defaults": { "reasoningEffort": "high", "temperature": 0.2, "timeout": 120000 }
}
},
"models": {
"claude-arb": {
"provider": "openrouter",
"model": "anthropic/claude-3.7-sonnet",
"askAll": true,
"consensus": true,
"experts": ["architect"],
"reasoningEffort": "high",
"temperature": 0.2,
"timeout": 60000
}
},
"routing": { "maxFanout": 3 },
"consensus": { "arbiter": { "model": "claude-arb" }, "blindVote": true }
}

providers.openrouter fields (connection only; these are OpenRouter-specific and
are not globalized):
| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | boolean | `true` | Whether OpenRouter participates |
| `apiKeyEnv` | string | `OPENROUTER_API_KEY` | Env var holding the API key |
| `apiBase` | string | `https://openrouter.ai/api/v1` | OpenAI-compatible base URL |
| `allowRawModel` | boolean | `false` | Allow raw slugs (not just configured records) |
| `defaultModel` | string | absent | Slug for the bare `/ask-openrouter` call |
| `defaults` | object | `{}` | Per-call defaults: `reasoningEffort`, `temperature`, `timeout` |
models record fields (the map key is the record id, matching ^[a-z0-9-]+$ and not
the reserved openrouter-default):
| Field | Type | Default | Notes |
|---|---|---|---|
| `provider` | string | required | Must be `"openrouter"` in v1 (codex/gemini/grok are CLI-managed / singleton built-ins, out of scope) |
| `model` | string | required | Provider model slug (e.g. `openai/gpt-4.1`) |
| `experts` | array or absent | absent = all 7 | `[]` = none / explicit-only; array = subset of the 7 expert keys |
| `askAll` | boolean | `true` | Include this record in `/ask-all` fan-out when eligible |
| `consensus` | boolean | `false` | Include this record in `/consensus` voting |
| `reasoningEffort` | string | from `defaults` | Per-record override (maps to the wire `reasoning_effort`) |
| `timeout` | number (ms) | from `defaults` | Per-record override |
| `temperature` | number | from `defaults` | Per-record override |
| `apiBase` | string | from `providers.openrouter.apiBase` | Per-record override (use for mixing endpoints) |
On temperature: most deliberation work is analytical - code review, debugging,
security audits, architecture and plan verdicts - where you want focused, repeatable
answers. Leave temperature unset and the field is omitted, so the provider default
applies (commonly around 1.0); set a low value (roughly 0.1-0.3) when you want
that focused, repeatable behavior. Raise it (roughly 0.6-0.9) only for generative
fan-out where spread across models is the point: brainstorming, naming, "give me 20
options". Keep it low for /consensus rounds; you want the models reasoning, not
improvising.
`consensus.arbiter` names who synthesizes the verdict. Two forms:
- A shorthand string: `"auto"` (default - pick a healthy voice, preferring an OpenRouter one), `"host"` (the host arbitrates; the server runs no arbiter pass), or a built-in provider name `"codex"` / `"gemini"` / `"grok"`.
- An object `{ "model": "<id>" }` referencing a `models` record. The record can be any entry - even one with `askAll: false` and `consensus: false` - which is the dedicated-arbiter case (an out-of-panel model that adjudicates without voting). Arbiter eligibility is independent of voting-panel membership.

A cross-host recommendation: a dedicated Claude record used only as the arbiter
({ "model": "claude-arb" }) lets a non-Claude host synthesize with a model that is not
one of the voting providers. An unusable arbiter (unknown shorthand, a { model } id that
is not configured, or a disabled provider) soft-degrades to "auto" with a warning - it
never hard-fails the config.
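
The soft-degrade rule above can be sketched as follows. This is a minimal illustration, not the bridge's code: `resolveArbiter`, its return shape, and the warning wording are hypothetical, and it assumes a provider counts as disabled only when its `enabled` flag is explicitly `false`.

```javascript
// Hypothetical sketch of arbiter resolution with soft-degrade to "auto".
// Function name, return shape, and warning text are illustrative assumptions.
const SHORTHANDS = new Set(["auto", "host", "codex", "gemini", "grok"]);

function resolveArbiter(arbiter, config) {
  const warnings = [];
  const degrade = (reason) => {
    warnings.push(`consensus.arbiter: ${reason}; falling back to "auto"`);
    return { arbiter: "auto", warnings };
  };

  if (arbiter === undefined) return { arbiter: "auto", warnings };

  if (typeof arbiter === "string") {
    if (!SHORTHANDS.has(arbiter)) return degrade(`unknown shorthand "${arbiter}"`);
    const builtin = config.providers?.[arbiter];
    if (builtin && builtin.enabled === false) return degrade(`provider "${arbiter}" is disabled`);
    return { arbiter, warnings };
  }

  if (arbiter !== null && typeof arbiter === "object" && typeof arbiter.model === "string") {
    const record = config.models?.[arbiter.model];
    if (!record) return degrade(`model "${arbiter.model}" is not configured`);
    const provider = config.providers?.[record.provider];
    if (provider && provider.enabled === false) return degrade(`provider "${record.provider}" is disabled`);
    // askAll / consensus are deliberately not checked: arbiter eligibility is
    // independent of voting-panel membership.
    return { arbiter: { model: arbiter.model }, warnings };
  }

  return degrade("unrecognized value");
}
```

With the sample config, `resolveArbiter({ model: "claude-arb" }, config)` keeps the record reference, while a misspelled shorthand comes back as `"auto"` with one warning.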
consensus.blindVote is an optional boolean (default false). When true, the arbiter
ALSO answers the original question cold - with no peer opinions - to produce a
blindVerdict, fired in parallel with the peer fan-out (no extra round). The blind pass
reduces the arbiter's anchoring on the peers' framing.
Constraints and behavior:
- Concrete / server-arbiter mode only. It runs only when a real arbiter pass runs (`"auto"`, a built-in, or a `{ model }` record). In `"host"` mode the server runs no arbiter pass, so there is no blind pass either - `blindVerdict` is `null`.
- Cost. It adds one extra arbiter call (parallel, no extra round), which is why it is off by default.
- Failure-isolated. A thrown blind pass yields `blindVerdict: null` and never fails the run. `blindVerdict` is also `null` when `blindVote` is off or no arbiter exists.
- Validation. A non-boolean value soft-degrades to `false` with a warning - it never hard-fails the config.

Behavior source of truth: consensus() in core/orchestrate.js and the blindVote
validation in server/openrouter/config.js.
consensus.maxRounds is an optional positive integer (default 5) that caps the
server-side convergence loop used by the consensus and consensus-step tools (a per-call maxRounds overrides it).
The loop ends unresolved once it hits the cap without converging.
- Range. `1..50`. A value above `50` is clamped to `50` with a warning; a non-integer or non-positive value is dropped (the default `5` applies) with a warning - it never hard-fails the config.
- Scope. It governs only the multi-round loop tools. The one-shot `consensus` tool is a single arbiter pass and is unaffected.
- Validation lives in `resolveConsensus` (`server/openrouter/config.js`); the cap is enforced in `core/consensus-loop.js`.

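
The range rules can be sketched as a small validator. This is an assumption-laden sketch: `resolveMaxRounds` and its `{ maxRounds, warning }` return shape are hypothetical, not the actual `resolveConsensus` code.

```javascript
// Hypothetical sketch of the consensus.maxRounds rules: default 5, clamp above 50,
// drop non-integer / non-positive values. Names and warning text are illustrative.
const DEFAULT_MAX_ROUNDS = 5;
const MAX_ROUNDS_CEILING = 50;

function resolveMaxRounds(value) {
  if (value === undefined) {
    return { maxRounds: DEFAULT_MAX_ROUNDS, warning: null };
  }
  if (!Number.isInteger(value) || value < 1) {
    return {
      maxRounds: DEFAULT_MAX_ROUNDS,
      warning: `consensus.maxRounds ${String(value)} is invalid; using ${DEFAULT_MAX_ROUNDS}`,
    };
  }
  if (value > MAX_ROUNDS_CEILING) {
    return {
      maxRounds: MAX_ROUNDS_CEILING,
      warning: `consensus.maxRounds ${value} clamped to ${MAX_ROUNDS_CEILING}`,
    };
  }
  return { maxRounds: value, warning: null };
}
```

Every branch returns a usable value, which is what keeps a bad setting from hard-failing the config.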
consensus.maxWallMs is an optional positive integer (default 1200000, 20 min) that sets
the global wall-time budget (ms) for the server-side provider-arbiter convergence loop (the
consensus tool). When the budget is spent the loop stops BEFORE starting the next round and
returns UNRESOLVED with stopReason: "budget-exhausted"; it never aborts an in-flight
provider call. It does not apply to the host-driven `/consensus` (`consensus-step`) path.
- Default. `1200000` (20 min). A non-integer or non-positive value is ignored (no budget is applied) without failing the config.
- Scope. Provider-arbiter `consensus` tool only. The host-driven `consensus-step` path has no server-side wall clock; the host controls timing there.
- The budget is forwarded to `runToConvergence` as `opts.maxWallMs` and checked in `core/consensus-loop.js` at the start of each round (`now() - startedAt >= maxWallMs`).

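
A minimal sketch of the start-of-round budget check, written synchronously with an injectable clock so it is easy to follow. The real loop presumably awaits provider calls. `runLoopSketch`, `runRound`, and every `stopReason` other than `"budget-exhausted"` are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch: the budget is checked only BEFORE a round starts,
// so a round that is already running is never aborted.
function runLoopSketch({ maxRounds, maxWallMs, runRound, now = Date.now }) {
  const startedAt = now();
  for (let round = 1; round <= maxRounds; round++) {
    if (maxWallMs !== undefined && now() - startedAt >= maxWallMs) {
      return { verdict: "UNRESOLVED", stopReason: "budget-exhausted", rounds: round - 1 };
    }
    const { converged } = runRound(round); // stands in for fan-out + arbiter pass
    if (converged) {
      return { verdict: "CONVERGED", stopReason: "converged", rounds: round };
    }
  }
  // Cap reached without converging (stopReason value is an assumption).
  return { verdict: "UNRESOLVED", stopReason: "max-rounds", rounds: maxRounds };
}
```

With a 100 ms budget and rounds that each take 60 ms, two rounds run to completion and the third never starts.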
Config keys are camelCase: reasoningEffort, temperature, timeout. The bridge sends
reasoning_effort on the wire; the camelCase -> wire mapping happens in one place - the
resolved layer in server/openrouter/config.js, which carries reasoning_effort on each
resolved record and on defaults. temperature and timeout pass through unchanged.
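
As an illustration of that single mapping point, the sketch below merges a record over the defaults and renames only `reasoningEffort`. The helper name `resolveParams` is hypothetical; the actual resolved layer may be structured differently.

```javascript
// Hypothetical sketch of the camelCase -> wire mapping: per-record values win over
// defaults, reasoningEffort becomes reasoning_effort, and unset fields are omitted.
function resolveParams(record, defaults = {}) {
  const pick = (key) => (record[key] !== undefined ? record[key] : defaults[key]);
  const resolved = {};

  const effort = pick("reasoningEffort");
  if (effort !== undefined) resolved.reasoning_effort = effort; // the only renamed key

  const temperature = pick("temperature");
  if (temperature !== undefined) resolved.temperature = temperature; // passes through

  const timeout = pick("timeout");
  if (timeout !== undefined) resolved.timeout = timeout; // passes through

  return resolved;
}
```

Because unset fields are left out entirely, a record with no `temperature` and no default sends nothing for it and the provider default applies.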
Which params apply on which path: the unified /ask-all, /consensus, and the
{ model: <id> } arbiter path apply a record's per-model reasoningEffort, temperature,
and timeout (forwarded with arg-wins precedence by pinAlias in core/registry.js). A
record's per-model apiBase and the providers.openrouter.defaults block apply only on the
standalone /ask-openrouter bridge path, because the unified server's OpenRouter provider
fixes apiBase / apiKeyEnv at construction. That is a pre-existing limitation, not a goal
of the arbiter feature.
- `/ask-all`: includes all records where `askAll !== false` and the record is eligible for the requested expert; capped to `routing.maxFanout` records (default 3).
- `/consensus`: includes records where `consensus === true`; NOT subject to `maxFanout`. A warning is logged when more than 3 records enter a consensus round (cost).
- `openrouter-default` is the reserved id for the bare `mcp__deliberation__openrouter` call and `/ask-openrouter` with no record specified. It resolves to `defaultModel`, is the single-shot fallback only, and is never included in fan-out or consensus.
- Implementation tasks always route to Codex or Gemini, never to OpenRouter.

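
The selection rules above can be sketched as two filters over the `models` map. The function names are hypothetical, and the sketch assumes that when `maxFanout` truncates the list, records are kept in config order. The source does not say which records survive the cap.

```javascript
// Hypothetical sketch of /ask-all and /consensus record selection.
const RESERVED_ID = "openrouter-default";

// Absent experts = eligible for all experts; [] = explicit-only (never auto-selected).
function eligibleFor(record, expert) {
  return record.experts === undefined || record.experts.includes(expert);
}

function selectAskAll(models, expert, maxFanout = 3) {
  return Object.entries(models)
    .filter(([id, r]) => id !== RESERVED_ID && r.askAll !== false && eligibleFor(r, expert))
    .slice(0, maxFanout) // assumption: config order decides who survives the cap
    .map(([id]) => id);
}

function selectConsensus(models) {
  const ids = Object.entries(models)
    .filter(([id, r]) => id !== RESERVED_ID && r.consensus === true)
    .map(([id]) => id);
  // Not capped by maxFanout; more than 3 voters only produces a cost warning.
  const warnings = ids.length > 3 ? [`${ids.length} records in consensus round (cost)`] : [];
  return { ids, warnings };
}
```

Note the asymmetry: `askAll` is opt-out (`!== false`), while `consensus` is opt-in (`=== true`).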
The config carries a $schema key pointing at config/config.schema.json (JSON Schema draft
2020-12). VS Code's built-in JSON support reads that key and gives you validation,
autocomplete, and lint with no third-party extension - and it works on the user's real
config outside this repo, because the file itself carries $schema. The in-repo .vscode/
folder additionally wires a json.schemas mapping so example configs inside the repo
validate even without the $schema line.
Validation is per-entry, not all-or-nothing. A single malformed models record
(bad id characters, reserved id, non-openrouter provider, missing model, unknown
expert, or a bad per-record override) no longer rejects the whole config - the bridge keeps
every valid record and collects the bad ones into invalidModels. Only top-level/schema
problems hard-fail the whole config: malformed JSON, a non-object root, an unsupported
version, or a non-integer/< 1 routing.maxFanout.
mcp__deliberation__openrouter-list returns (each delegate keeps the alias field, equal
to the record id, so selection and the wire stay stable):
- On a hard config failure the object instead carries `error: "<message>"` with `delegates: []` (and `invalidModels` absent/empty). `/ask-all` and `/consensus` treat the `error` form as "OpenRouter set EMPTY".
- `invalidModels[].suggestedAlias` is present only when a safe deterministic repair exists: id-format errors are sanitized to `[a-z0-9-]+` (e.g. `qwen3.7-max` -> `qwen3-7-max`), and collisions get a free `-N` suffix. Suggestions are collision-checked against every existing id and the reserved `openrouter-default`. Entries with no safe fix (missing `model`, unknown expert, non-`openrouter` provider, reserved-id clash) have no `suggestedAlias`.
- The bridge never edits `config.json`. The `/ask-all` and `/consensus` commands surface `invalidModels` and offer Fix & proceed (default - apply each `suggestedAlias` to `config.json`, drop the unrepairable, re-list), Run valid only, or Skip all OpenRouter.

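
A rough sketch of how a deterministic `suggestedAlias` could be derived for an id-format error. `suggestAlias` is a hypothetical name, and starting the collision suffix at `-2` is an assumption; the source only says a free `-N` suffix is chosen.

```javascript
// Hypothetical sketch: sanitize a bad id to [a-z0-9-]+ and avoid collisions
// with existing ids and the reserved "openrouter-default".
const RESERVED_ALIAS = "openrouter-default";

function suggestAlias(badId, existingIds) {
  const taken = new Set([...existingIds, RESERVED_ALIAS]);
  const base = String(badId)
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-") // every run of disallowed characters becomes one dash
    .replace(/^-+|-+$/g, "");     // trim leading/trailing dashes
  if (base === "") return undefined; // nothing safe to suggest
  if (!taken.has(base)) return base;
  for (let n = 2; ; n++) {           // assumption: suffix numbering starts at 2
    const candidate = `${base}-${n}`;
    if (!taken.has(candidate)) return candidate;
  }
}
```

For example, `suggestAlias("qwen3.7-max", [])` gives `qwen3-7-max`, matching the repair shown above, and the same call with `qwen3-7-max` already configured gives `qwen3-7-max-2`.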
The Authorization header is sent only when the key env var resolves to a non-empty
string. Keyless local endpoints (Ollama, vLLM, LM Studio) work without a dummy key.
openrouter.ai returns HTTP 401 if the key is absent; local endpoints accept no-auth
requests.
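
The conditional header can be sketched like this. `buildHeaders` is a hypothetical helper name; the only behavior taken from the text is that `Authorization` is omitted when the key env var is unset or empty.

```javascript
// Hypothetical sketch: attach Authorization only when the configured env var
// resolves to a non-empty string, so keyless local endpoints need no dummy key.
function buildHeaders(env, apiKeyEnv = "OPENROUTER_API_KEY") {
  const headers = { "Content-Type": "application/json" };
  const key = env[apiKeyEnv];
  if (typeof key === "string" && key.length > 0) {
    headers.Authorization = `Bearer ${key}`;
  }
  return headers;
}
```

Calling `buildHeaders(process.env)` against a local Ollama endpoint with no key set yields a request with no `Authorization` header at all.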
| Endpoint | apiBase value |
|---|---|
| OpenRouter | https://openrouter.ai/api/v1 (default) |
| HuggingFace Inference | https://router.huggingface.co/v1 |
| Ollama (local) | http://localhost:11434/v1 |
| LM Studio | http://localhost:1234/v1 |
| vLLM | http://localhost:8000/v1 |
OpenRouter accepts {path} and {dir} entries only. file_id and file_url
entries are rejected (-32602). The mode field is coerced to "inline" regardless
of what is set - there is no upload path. Content is embedded as text blocks in the
request body.
Per-file cap: OPENROUTER_INLINE_MAX_BYTES (default 262144 = 256 KB).
Aggregate cap: OPENROUTER_INLINE_MAX_TOTAL_BYTES (default 1048576 = 1 MB).
Exceeding either cap returns a hard error with counts.
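
A sketch of the two caps, assuming each attachment has already been measured into a hypothetical `{ path, bytes }` pair. The function name and error wording are illustrative; only the env var names and defaults come from the text above.

```javascript
// Hypothetical sketch of the per-file and aggregate inline caps.
function checkInlineCaps(files, env = {}) {
  const perFileCap = Number(env.OPENROUTER_INLINE_MAX_BYTES) || 262144;       // 256 KB
  const totalCap = Number(env.OPENROUTER_INLINE_MAX_TOTAL_BYTES) || 1048576;  // 1 MB
  let total = 0;
  for (const file of files) {
    if (file.bytes > perFileCap) {
      throw new Error(`${file.path}: ${file.bytes} bytes exceeds per-file cap ${perFileCap}`);
    }
    total += file.bytes;
  }
  if (total > totalCap) {
    throw new Error(`attachments total ${total} bytes, exceeding aggregate cap ${totalCap}`);
  }
  return total; // bytes that will be inlined
}
```

Five files of 250000 bytes each pass the per-file check individually but fail the aggregate check at 1250000 bytes.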
A model alias is bound at the start of a session via mcp__deliberation__openrouter
and is preserved for the life of that threadId. -reply calls on the same thread
always use the same model.
A consensus run uses approximately N models x bundle tokens x rounds tokens.
When more than 3 models participate, the bridge emits a warning with an estimated
token count. There is no hard spend cap - the warning is informational only.
| Tool | Purpose |
|---|---|
| `mcp__deliberation__openrouter` | Start a new advisory session |
| `mcp__deliberation__openrouter-reply` | Continue a session (multi-turn via `threadId`) |
| `mcp__deliberation__openrouter-list` | List configured model aliases and their eligibility flags |

| errorKind | Meaning |
|---|---|
| `auth` | API key missing or rejected (HTTP 401/403) |
| `rate-limit` | HTTP 429 from upstream |
| `timeout` | Request exceeded the configured timeout |
| `network` | Connection error or DNS failure |
| `parse` | Response body could not be parsed |
| `upstream` | Non-2xx from the endpoint (other than auth/rate-limit) |
| `config` | Config file missing, invalid JSON, or schema violation |
| `model-not-allowed` | Requested alias is not in the config, or a raw model was passed with `allowRawModel:false`, or no alias/model was given and no `defaultModel` is set |
| `unknown-thread` | `-reply` called with a `threadId` that does not exist |
| `unknown` | Catch-all for unclassified errors |

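
For the HTTP-derived rows of the table, the classification can be sketched as below. `classifyStatus` is a hypothetical helper; the other kinds (`timeout`, `network`, `parse`, and so on) come from non-HTTP conditions and are not covered here.

```javascript
// Hypothetical sketch: map an HTTP status to the errorKind values in the table.
function classifyStatus(status) {
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate-limit";
  if (status < 200 || status >= 300) return "upstream"; // any other non-2xx
  return null; // 2xx: not an error
}
```

The auth and rate-limit checks run first so that `upstream` only catches what is left over.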
An opt-in mechanism that automatically attaches a small repo orientation bundle to
advisory calls targeting file-blind providers (Grok and OpenRouter), so they
reach context parity with Codex and Gemini, which already walk cwd under their
read-only sandboxes. Default OFF - nothing is attached unless orientation.enabled
is true.

    "orientation": { "enabled": false, "maxFiles": 6 }

| Key | Type | Default | Meaning |
|---|---|---|---|
| `enabled` | boolean | `false` | Attach the bundle to file-blind providers when they carry no files. |
| `maxFiles` | integer | `6` | Cap on the number of files in the bundle. |

core/orientation.js - resolveOrientationFiles(cwd, { maxFiles }) returns an
array of FileRef objects (absolute paths of EXISTING files only) in fixed priority
order:
CLAUDE.md, AGENTS.md, README.md, package.json, pyproject.toml,
Cargo.toml, go.mod, tsconfig.json, main.tf
Results are capped to maxFiles. The function is stat-only (never reads file
content), never throws, and silently skips missing files.
orientationFilesFor(config, cwd) is the public entry-point: it returns the bundle
array when orientation.enabled is true, else undefined.
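
A minimal sketch of the behavior just described, not the contents of `core/orientation.js`. It assumes a `FileRef` is simply `{ path }` and that `maxFiles` falls back to 6 when absent.

```javascript
// Sketch under stated assumptions: stat-only lookup in fixed priority order.
const fs = require("node:fs");
const path = require("node:path");

const PRIORITY = [
  "CLAUDE.md", "AGENTS.md", "README.md", "package.json", "pyproject.toml",
  "Cargo.toml", "go.mod", "tsconfig.json", "main.tf",
];

function resolveOrientationFiles(cwd, { maxFiles = 6 } = {}) {
  const refs = [];
  for (const name of PRIORITY) {
    if (refs.length >= maxFiles) break;
    try {
      const abs = path.resolve(cwd, name);
      // statSync only checks existence and type; file content is never read.
      if (fs.statSync(abs).isFile()) refs.push({ path: abs });
    } catch {
      // Missing file or bad cwd: skip silently so the function never throws.
    }
  }
  return refs;
}

function orientationFilesFor(config, cwd) {
  if (config?.orientation?.enabled !== true) return undefined;
  return resolveOrientationFiles(cwd, { maxFiles: config.orientation.maxFiles ?? 6 });
}
```

In a repo that has only `README.md`, `package.json`, and `go.mod` from the list, the bundle contains those three paths in that order.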
Provider capability: walksFilesystem - declared in core/types.js
ProviderCapabilities. true for codex and gemini (they walk the cwd under their
sandbox); false for grok and every OpenRouter per-alias wrapper (set via
registry.js pinAlias). This flag, not fileUpload, is the file-blind
discriminator.
core/orchestrate.js withOrientation gate - shared helper called in
callProvider BEFORE the dedup cache key is computed. It attaches the bundle when
two conditions both hold:
- The provider is file-blind (`walksFilesystem === false`).
- The caller passed no files of its own (the `files` array is absent or empty).

Injection before the cache-key computation matters: core/result-cache.js keyFor
excludes cwd, so injecting a bundle AFTER the cache check would risk a cross-repo
false cache hit (same prompt, different repo). Injecting BEFORE makes the now-file-bearing
request correctly skip the in-session result cache.
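
The gate reduces to a small pure function. In this sketch the `provider.capabilities.walksFilesystem` access path and the argument shape are assumptions made for illustration.

```javascript
// Hypothetical sketch of the withOrientation gate: attach the bundle only for
// file-blind providers whose call carries no files of its own.
function withOrientation(provider, args, bundle) {
  const fileBlind = provider.capabilities?.walksFilesystem === false;
  const hasOwnFiles = Array.isArray(args.files) && args.files.length > 0;
  if (!Array.isArray(bundle) || bundle.length === 0 || !fileBlind || hasOwnFiles) {
    return args; // unchanged: prompt and files pass through as given
  }
  // Only files[] changes; the prompt stays byte-identical across providers.
  return { ...args, files: bundle };
}
```

Because the function runs before the cache key is computed, the caller sees a file-bearing request for Grok and OpenRouter and the original request for Codex and Gemini.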
Scope - orientation is applied to the peer fan-out AND the arbiter blind pass. It is NOT applied to the adjudication and revision passes, which reason over peer-opinion text rather than repo files directly.
Zero cross-contamination - the bundle travels in the per-provider files[]
argument, never in the shared prompt. The prompt text is byte-identical across
every provider; only the file list differs for file-blind delegates.
Bridge caps apply - each bridge enforces its own size limits. OpenRouter inlines
files as text (256 KB/file, 1 MB aggregate); Grok delivers them as inline or uploaded
attachments per the usual mode rules. The orientation bundle is intentionally small
(up to 6 high-signal files), so it fits comfortably within both caps.
When orientation.enabled is false (the default), or when you need more than
maxFiles files for a particular query, embed the context manually in the prompt
as described in the ask-all and consensus command files. The per-command
guidance still applies and takes precedence over the auto-attach when you pass your
own files.
An opt-in, single-user local store that records each /consensus and /ask-all
run so it can be fetched, re-run, and annotated later. Default OFF - nothing is
written to disk unless sessions.persist is true. Implemented in core/sessions.js
(synchronous, zero-dep); the store directory is resolved by resolveSessionsDir in
core/paths.js; config is validated by resolveSessions in
server/openrouter/config.js; the MCP wiring lives in server/mcp/index.js.
Three paths write through one chokepoint (persistRun): the server-side consensus
tool, ask-all, and session-revisit (child record). The host-driven consensus-step
loop (the live /consensus) also persists ONE record on a terminal transition -
converged or unresolved - via persistConsensusStep. It takes the loop entry with an
atomic loopStore.take() BEFORE the synchronous write, so a concurrent/retried terminal
call finds nothing and cannot double-write (at-most-one, lock-free); the record's
question is the ORIGINAL prompt stashed at init, not the final revision. On a write
failure it emits a CONTENT-FREE persist_failed event to the debug log (fs errno or
"write_failed", plus the ephemeral loop id - never err.message) and returns
persisted:false with no sessionId.
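
The take-before-write ordering can be illustrated with a plain `Map`. This is a sketch only: `makeLoopStore`, `persistTerminal`, and the entry shape are hypothetical. It relies on the write being synchronous, so nothing can interleave between `take()` and the write on Node's single thread.

```javascript
// Hypothetical sketch of at-most-one persistence without a lock.
function makeLoopStore() {
  const entries = new Map();
  return {
    put(id, entry) { entries.set(id, entry); },
    // take() removes and returns in one synchronous step; a second caller gets undefined.
    take(id) {
      const entry = entries.get(id);
      entries.delete(id);
      return entry;
    },
  };
}

function persistTerminal(loopStore, loopId, write) {
  const entry = loopStore.take(loopId); // claim the entry BEFORE writing
  if (entry === undefined) return { persisted: false }; // retried or concurrent call
  try {
    // The stored question is the original prompt stashed at init.
    const sessionId = write({ question: entry.originalQuestion });
    return { persisted: true, sessionId };
  } catch {
    return { persisted: false }; // no sessionId on a write failure
  }
}
```

A retried terminal call finds the entry already taken and returns `persisted: false` without writing a second record.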

    "sessions": { "persist": false, "maxRecords": 200, "maxAgeDays": 30, "captureText": false }

| Key | Type | Default | Meaning |
|---|---|---|---|
| `persist` | boolean | `false` | Save each run and return a `sessionId`. Non-boolean degrades to `false` + warning. |
| `maxRecords` | integer | `200` | Keep at most this many newest records. `-1` = unlimited (never trim by count). `0`/invalid -> default + warning. |
| `maxAgeDays` | integer | `30` | Delete records older than this. `-1` = unlimited (never delete by age). `0`/invalid -> default + warning. |
| `captureText` | boolean | `false` | Also store each provider's raw RESPONSE body (`opinion.text`). OFF (default) = summaries only (question + verdict/criticalIssues); the body is dropped at `persistRun` for every path. ON (and `persist` on) stores the body, secret-scrubbed (mandatory) then best-effort PII (email) stripped, then capped. Non-boolean degrades to `false` + warning. The metrics-only debug log NEVER receives body text either way. |

Validation soft-degrades: a bad value never rejects the config, it falls back to the
default and the reason rides the same consensusWarnings channel the bridge already
surfaces.
- One JSON file per session at `<dir>/<id>.json`, where `<dir>` is `DELIBERATION_SESSIONS` if set, else `<XDG cache>/deliberation/sessions` (macOS/Linux `~/.cache/...`; Windows `%LOCALAPPDATA%\...`).
- Written atomically: the temp file is created with mode `0600` directly (no world-readable window), then renamed into place; a failed rename removes the temp.
- No global lock - each file is independent. The only read-modify-write is `session-annotate` on one file, documented last-writer-wins (fine for a local single-user stdio server).
- Retention runs after every write: delete by age, then trim by count (both honoring `-1` = unlimited). Orphaned `<id>.json.tmp.<pid>.<ts>` fragments older than an hour are also reaped.

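
The atomic-write step can be sketched with Node's synchronous `fs` calls. This is an illustration rather than the code in `core/sessions.js`; `writeSessionAtomic` is a hypothetical name, and retention is left out.

```javascript
// Sketch under stated assumptions: create the temp file with mode 0600 up front,
// then rename it into place; remove the temp if the rename fails.
const fs = require("node:fs");
const path = require("node:path");

function writeSessionAtomic(dir, id, record) {
  fs.mkdirSync(dir, { recursive: true });
  const finalPath = path.join(dir, `${id}.json`);
  const tmpPath = `${finalPath}.tmp.${process.pid}.${Date.now()}`;

  // flag "wx" refuses to overwrite an existing temp; mode applies at creation,
  // so the file is never world-readable, even briefly.
  fs.writeFileSync(tmpPath, JSON.stringify(record), { mode: 0o600, flag: "wx" });
  try {
    fs.renameSync(tmpPath, finalPath); // readers see the old file or the new one, never a partial
  } catch (err) {
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
  return finalPath;
}
```

Setting the mode at creation rather than with a later `chmod` is what closes the world-readable window the bullet mentions.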
{ id, parentId|null, schemaVersion: 1, createdAt: <ISO>,
tool: "consensus"|"ask-all", question, expert|null,
files: [{ path|dir|file_id|file_url, mode? }]|null, // attachment REFS, never bodies
opinions: [{ provider, model, text?, // text = RESPONSE body, present ONLY when sessions.captureText is on
verdict?, criticalIssues? }], // verdict/criticalIssues on consensus LOOP opinions
blindVerdict|null, verdict|null, // verdict = loop enum (null in synthesize mode)
synthesis?|null, synthesizeAlways?, // synthesis = free-text (synthesize runs)
arbiter: { mode, provider }|null, warnings: [], annotations: [{ note, at }],
converged?, confidence?, rounds? } // consensus LOOP summary
A single stamp - no dual-version support (pre-1.0, no users). readSession returns the
object as-is and nothing branches on schemaVersion. The loop fields
(verdict/criticalIssues/converged/confidence/rounds) are populated only for a
consensus LOOP run; a synthesizeAlways run carries synthesis instead; ask-all omits
all of them. (tool is "consensus" for both consensus modes; synthesizeAlways records
the mode so session-revisit replays it.)
Before writing, scrubSecrets redacts common key shapes (OpenAI sk-, OpenRouter
sk-or-, xAI xai-, GitHub gh[pousr]_, AWS AKIA, Google AIza, and Bearer
tokens) in the question, opinion/verdict text, each critical-issue description, warnings,
annotation notes, and the file path/dir strings; the question and each opinion/verdict
are capped at ~100 KB, and an opinion verdict is whitelisted to the closed enum (anything
else is coerced to null) so no free text rides the unscrubbed verdict field. Scrubbing is
best-effort - user transcript text may still carry secrets in unrecognized shapes.
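
To make the key shapes concrete, here is an illustrative regex-based scrubber. The patterns, minimum lengths, and the `[REDACTED]` placeholder are assumptions for this sketch; the real `scrubSecrets` may match differently.

```javascript
// Illustrative patterns only: one per key shape listed above.
const SECRET_PATTERNS = [
  /\bsk-[A-Za-z0-9_-]{8,}/g,               // OpenAI sk-..., also covers OpenRouter sk-or-...
  /\bxai-[A-Za-z0-9_-]{8,}/g,              // xAI
  /\bgh[pousr]_[A-Za-z0-9]{8,}/g,          // GitHub tokens
  /\bAKIA[0-9A-Z]{16}\b/g,                 // AWS access key id
  /\bAIza[0-9A-Za-z_-]{10,}/g,             // Google API key
  /\bBearer\s+[A-Za-z0-9._~+\/=-]{8,}/g,   // Bearer tokens
];

function scrubSecrets(text) {
  return SECRET_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, "[REDACTED]"),
    String(text),
  );
}
```

A scrubber of this kind only catches shapes it knows about, which is why the text above calls scrubbing best-effort.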
Opinion RESPONSE bodies (opinion.text) are persisted ONLY under the opt-in
sessions.captureText (default off); persistRun drops the field for every path when it
is off, so the default record holds only the question + verdict/issue summaries. When on,
the body is secret-scrubbed (the mandatory primary control, always run) and then passed
through a best-effort stripPII (email addresses only; RFC-bounded so it stays linear on
long provider text) as defense-in-depth - NOT a guarantee, never the gate. captureText is
forward-gating: turning it off stops new capture but does not strip records already on disk
(they age out via retention, or delete the store dir to purge).
Each session tool takes its own input schema (no prompt), and reports
"session persistence is disabled (set sessions.persist)" when off.

| Tool | Input | Effect |
|---|---|---|
| `session-get` | `{ sessionId }` | Return the record, or a not-found message. Read-only. |
| `session-revisit` | `{ sessionId, cwd? }` | Re-run the record's original question (and its file refs) with the CURRENT providers/config, write a CHILD record (`parentId` = original id), return the new `sessionId` + result. Re-run, not snapshot-replay. A consensus record replays its mode - the full multi-round LOOP, or the single synthesis pass. (Records written before the tool merge are unsupported - pre-1.0, no users.) |
| `session-annotate` | `{ sessionId, note }` | Append `{ note, at }` to the record's audit trail and rewrite the file. |

When persist is on, consensus, ask-all, and the consensus-step terminal transition
also include a top-level sessionId in their result (consensus-step adds persisted and,
on the host-driven loop, the ephemeral loopSessionId).
With sessions.persist: true, one decision flows through all three tools. Tool calls are
shown in their real arg shape; ids are illustrative.
- Record. Run a panel and the result carries a `sessionId`:

      /ask-all Should we cache provider results in-process or in Redis?
      -> { results: [...], sessionId: "019c5af2-1a2b-7c3d-8e4f-aa12" }

- Review the stored decision later:

      session-get { "sessionId": "019c5af2-1a2b-7c3d-8e4f-aa12" }
      -> { session: { tool: "ask-all", question: "Should we cache provider results ...",
                      opinions: [ { provider: "codex", verdict: null, text: "..." },
                                  { provider: "gemini", verdict: null, text: "..." } ],
                      verdict: null, annotations: [] } }

  (A `consensus` loop record also carries per-opinion `verdict`, a final `verdict`, and `converged`/`confidence`/`rounds`; a `synthesizeAlways` record carries `synthesis`.)

- Revisit - re-ask the SAME question against today's providers/config. This writes a CHILD record linked by `parentId` and returns a new id (it re-runs; it does not replay the stored answer). A `consensus` record replays its mode (the loop, or a synthesize pass):

      session-revisit { "sessionId": "019c5af2-1a2b-7c3d-8e4f-aa12" }
      -> { results: [...], sessionId: "019c6b00-3d4e-7f50-9a61-bb34" }   // parentId = 019c5af2-...

- Annotate either record with the real-world outcome - appended to the audit trail with a timestamp, never overwriting prior notes:

      session-annotate { "sessionId": "019c5af2-1a2b-7c3d-8e4f-aa12", "note": "Shipped in-process LRU; revisit after Redis lands." }
      -> { session: { ..., annotations: [ { note: "Shipped in-process LRU; ...", at: "2026-06-17T..." } ] } }

      019c5af2 (ask-all) --parentId--> 019c6b00 (revisit)
        |
        +-- annotations: [ "Shipped in-process LRU; ..." @ 2026-06-17 ]

There is no enumeration tool, so a sessionId comes from one of three places:
- The `sessionId` in the original `/ask-all` or `/consensus` result.
- The store on disk: `ls ~/.cache/deliberation/sessions/` - one `<sessionId>.json` per run. The exact dir is what `/deliberation:doctor` prints, and `DELIBERATION_SESSIONS` overrides it.
- `/deliberation:analyze` - reviews recent runs in aggregate (verdict agreement, Lens B).

Expert prompts live in prompts/. Each follows the same structure: role definition
and context, advisory vs implementation modes, response-format guidance, and when
to invoke or not invoke. Edit these to change expert behavior for your workflow.
| Issue | Solution |
|---|---|
| MCP server not found | Restart Claude Code after setup |
| Provider not authenticated | Codex: `codex login`. Gemini: run `agy` once (or set `GOOGLE_API_KEY`). Grok: export `XAI_API_KEY` (else calls return `errorKind: missing-auth`) |
| Tool not appearing | Run `claude mcp list` and verify registration |
| Expert not triggered | Ask explicitly: "Ask GPT to review...", "Ask Gemini to review...", or "Ask Grok to review..." |
| An advisory Gemini run returned `workspaceMutated: true` | The delegate wrote to the consulted repo despite read-only mode (the OS sandbox is macOS-only; Linux relies on detection). Nothing was auto-reverted - review `git status` / `git log` and discard unwanted changes. Treat that result as tainted. |
| Advisory Gemini run errors with a `sandbox-exec` failure on macOS | The Seatbelt wrapper could not start. Set `DELIBERATION_DISABLE_OS_SANDBOX=1` to fall back to prompt-guard + detection while investigating. |

agy print mode does not enforce folder trust, so there is no trust prompt to clear. Soft-timeout recovery (stdout-drain) is documented in Gemini timeout recovery.
- Advisory (read-only) Gemini enforcement is strongest on macOS, where the `agy` process is wrapped in a `sandbox-exec` Seatbelt profile that denies workspace writes. On Linux there is no OS sandbox in v1 - read-only relies on the prompt guard + post-run git mutation detection (which sets `workspaceMutated: true`), and the network is never isolated on any platform. Route deliberate implementation work to Codex (GPT) or the direct `gemini` tool with `workspace-write`.
- `agy` resolves a conversation id per cwd (in `~/.gemini/antigravity-cli/cache/last_conversations.json`). Heavy parallel calls from the same cwd (for example `/ask-all`, `/consensus`) share that single per-cwd slot, so a `gemini-reply` could attach to a sibling run's conversation. This mirrors `agy`'s own per-cwd model.

Return shape of `mcp__deliberation__openrouter-list` (see the OpenRouter bridge notes on `invalidModels`):

    {
      "delegates": [
        { "alias", "model", "experts", "askAll", "consensus", "reasoning_effort" }
      ],
      "defaultModelSet": true,
      "maxFanout": 3,
      "maxFanoutHigh": false,
      "invalidModels": [
        {
          "index": 2,
          "alias": "qwen3.7-max",
          "reason": "models id \"qwen3.7-max\" must match [a-z0-9-]+ ...",
          "suggestedAlias": "qwen3-7-max"
        }
      ]
    }